Performance of vectorized code in ifort and ifx

My code uses the matmul intrinsic to multiply two matrices, so I was looking for ways to speed this up without calling the dgemm subroutine directly, since I would like to keep the code at a basic level. I came across this excellent post by @zaikunzhang. One suggestion, given by @ivanpribec in the comments to that post, was to compile with

ifort -O3 -xHost -qopenmp -qmkl=parallel -heap-arrays 40 -qopt-matmul test_matmul.f90
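
For context, here is a minimal sketch of the kind of program such a command would build. It is not the test_matmul.f90 from the linked post: the program name, the matrix size n, and the variable names are assumptions chosen only for illustration.

```fortran
program matmul_sketch
    use, intrinsic :: iso_fortran_env, only: dp => real64, int64
    implicit none

    integer, parameter :: n = 1000          ! assumed matrix size (hypothetical)
    real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
    integer(int64) :: t0, t1, rate

    allocate(a(n,n), b(n,n), c(n,n))
    call random_number(a)
    call random_number(b)

    ! Time only the matmul call, using the wall-clock tick counter.
    call system_clock(t0, rate)
    c = matmul(a, b)                        ! the intrinsic whose performance is in question
    call system_clock(t1)

    print '(a,f10.4,a)', 'matmul time: ', real(t1 - t0, dp) / real(rate, dp), ' s'
    ! Printing a value derived from c keeps the compiler from discarding the product.
    print '(a,es14.6)', 'checksum:    ', sum(c)
end program matmul_sketch
```

Building this with and without the matmul-related option and comparing the reported times is one simple way to see whether the option has any effect.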

In my example I am compiling with these flags:

SWITCH = /O3 /fast /Qipo /Qmkl=parallel /Qopenmp /Qparallel -qopt-matmul

The problem is that ifort does not recognize -qopt-matmul and issues this warning:

ifort: command line warning #10006: ignoring unknown option '/qopt-matmul'

I am a bit puzzled by this…