Thanks @RonShepard .
In the kxk code, I changed `n=1000` to `n=5000`, since n=1000 seems too small to give accurate timings on my laptop.
For n=5000, with Intel oneAPI, this is what I got:
c11= 2.90899748439952 cpu_time= 8.60937500000000 GFLOPS= 29.0381125226860
c11= 2.90899748439952 cpu_time= 5.17187500000000 GFLOPS= 48.3383685800604
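As a quick sanity check on the numbers above: both reported GFLOPS values match 2·n³ / cpu_time / 1e9, i.e. assuming one multiply and one add per inner-loop step of an n×n matrix multiply. The `gflops` function below is a hypothetical helper written only to reproduce that arithmetic; it is not taken from the kxk code.

```python
# Hypothetical helper (not from the kxk code): assumes 2*n**3 flops
# for an n x n matrix multiply, one multiply plus one add per inner step.
def gflops(n: int, cpu_time: float) -> float:
    """Return GFLOP/s for an n x n matmul that took cpu_time seconds."""
    flops = 2.0 * n**3
    return flops / cpu_time / 1e9


# The two cpu_time values reported above for n=5000.
for t in (8.609375, 5.171875):
    print(f"cpu_time={t} GFLOPS={gflops(5000, t):.4f}")
```

Both cpu_time values are also exact multiples of 1/64 s (551/64 and 331/64), which suggests a coarse timer tick of about 15.6 ms. If so, a short n=1000 run would span only a handful of ticks, which would fit with its timings being unreliable.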