Testing the performance of `matmul` under default compiler settings

I understand quite well how nontrivial this is, and I totally agree that open-source compilers need more contributions (a huge thank-you to everyone who has contributed).

On the other hand, I do believe that commercial vendors like Intel have sufficient resources to move in this direction, and I do not believe that the engineers at MathWorks are much more intelligent/capable than those at Intel.

If MathWorks can optimize MATLAB intrinsic procedures to a satisfactory level with the help of Fortran, Intel should be able to do (much) better for the corresponding Fortran intrinsic procedures with the help of Fortran itself.

However, look at the reality again.