Thanks all!
@RonShepard @FedericoPerini
I ran the kxk code with Intel Fortran, using the flags -O3 -xHost, on my laptop (Xeon 2186M, Windows 10). I believe I was already using MKL. Other programs were running at the same time, so the timings are not precise.
Even so, the speed I got seems far slower than the numbers you listed:
c11= -2.31481358685203 cpu_time= 0.328125000000000 GFLOPS= 6.09523809523809
c11= -2.31481358685203 cpu_time= 0.390625000000000 GFLOPS= 5.12000000000000
I am a little puzzled.
Has anyone gotten similar results on a machine other than an M1 Mac?
Are there ways to improve the result on a machine other than an M1 Mac?