Dear all,
A quick question: what fraction of your CPU's peak GFLOPS do you usually achieve?
I have a single-threaded code that I profiled with Intel Advisor (screenshot below).
In particular, my CPU's peak performance is 62.825 GFLOPS, but my code achieves only 1.908 GFLOPS, about 3% of peak.
I guess that if I use MPI to run on all 12 hardware threads, I might get about 1.9 × 12 ≈ 23 GFLOPS. Even so, 23 GFLOPS is still far from the theoretical 62 GFLOPS.
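The arithmetic above, written out as a back-of-the-envelope sketch. `ideal_parallel_gflops` is a hypothetical helper that assumes perfect linear scaling across ranks, which is an optimistic upper bound: it ignores shared memory bandwidth, communication, and any difference between single-core and all-core clock speeds.

```python
def fraction_of_peak(achieved_gflops: float, peak_gflops: float) -> float:
    """Return achieved performance as a fraction of the theoretical peak."""
    return achieved_gflops / peak_gflops


def ideal_parallel_gflops(single_gflops: float, n_workers: int) -> float:
    """Optimistic estimate: assume every worker sustains the single-thread rate."""
    return single_gflops * n_workers


peak = 62.825    # GFLOPS, theoretical peak quoted in the post
single = 1.908   # GFLOPS, single-threaded rate quoted in the post
workers = 12     # MPI ranks, one per hardware thread

scaled = ideal_parallel_gflops(single, workers)
print(f"single thread: {fraction_of_peak(single, peak):.1%} of peak")
print(f"{workers} ranks (ideal): {scaled:.1f} GFLOPS, "
      f"{fraction_of_peak(scaled, peak):.1%} of peak")
```

Even under perfect scaling the estimate lands at roughly a third of peak, so the per-core efficiency is the number worth investigating first.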
I am curious: how close to peak do your own codes get?
And how can I get as close to peak performance as possible?
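One way to see what "peak" demands: theoretical peak is usually computed as cores × clock × FLOPs per cycle, and the FLOPs-per-cycle term assumes that every cycle issues full-width SIMD fused multiply-adds (FMA). The sketch below assumes a hypothetical single AVX2 core (4 doubles per vector) with two FMA units at 3.0 GHz; these figures are illustrative and not taken from the post. It shows how much of that ceiling disappears when a loop is not vectorized or does not use FMA.

```python
def flops_per_cycle(simd_lanes: int, fma_units: int, uses_fma: bool) -> int:
    """FLOPs one core can retire per cycle under the assumed pipeline.

    An FMA counts as 2 FLOPs (multiply + add); without FMA each unit does 1.
    """
    flops_per_op = 2 if uses_fma else 1
    return simd_lanes * fma_units * flops_per_op


def peak_gflops(cores: int, clock_ghz: float, per_cycle: int) -> float:
    """Theoretical peak = cores x clock (GHz) x FLOPs per cycle."""
    return cores * clock_ghz * per_cycle


# Hypothetical core: AVX2 (4 doubles per vector), 2 FMA units, 3.0 GHz.
CORES, CLOCK_GHZ, LANES, UNITS = 1, 3.0, 4, 2

cases = {
    "vectorized + FMA": flops_per_cycle(LANES, UNITS, uses_fma=True),
    "vectorized, no FMA": flops_per_cycle(LANES, UNITS, uses_fma=False),
    "scalar + FMA": flops_per_cycle(1, UNITS, uses_fma=True),
    "scalar, no FMA": flops_per_cycle(1, UNITS, uses_fma=False),
}
for name, fpc in cases.items():
    gflops = peak_gflops(CORES, CLOCK_GHZ, fpc)
    print(f"{name:20s} {fpc:2d} FLOP/cycle -> {gflops:5.1f} GFLOPS")
```

These are ceilings, not predictions. Even vectorized FMA code only approaches the top line if the loop keeps enough independent FMAs in flight to hide their latency (for example, several accumulators in a reduction) and its operands come from cache rather than DRAM.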
If the code frequently operates on large arrays (several GB in size), performance will be limited by memory bandwidth, right?
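The bandwidth limit can be made quantitative with the roofline model, which is what Intel Advisor's Roofline analysis plots: attainable performance is min(peak, arithmetic intensity × memory bandwidth), where arithmetic intensity is FLOPs per byte moved to or from DRAM. A minimal sketch using the post's 62.825 GFLOPS and ~35 GB/s, and assuming a STREAM-triad-style loop `a[i] = b[i] + s*c[i]` on doubles as a stand-in for a large-array kernel (2 FLOPs per 24 bytes, ignoring write-allocate traffic):

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte transferred to/from main memory."""
    return flops / bytes_moved


def roofline_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Attainable GFLOPS = min(compute peak, intensity x bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


PEAK = 62.825      # GFLOPS, from the post
BANDWIDTH = 35.0   # GB/s, approximate DDR4-2666 figure from the post

# Assumed kernel: a[i] = b[i] + s * c[i] on doubles.
# Per element: 1 multiply + 1 add = 2 FLOPs; read b, c and write a = 3 * 8 bytes.
triad_ai = arithmetic_intensity(2, 3 * 8)

# Intensity at which a kernel stops being memory-bound on this machine.
ridge = PEAK / BANDWIDTH
print(f"triad intensity: {triad_ai:.3f} FLOP/byte (ridge point {ridge:.2f})")
print(f"triad ceiling:   {roofline_gflops(PEAK, BANDWIDTH, triad_ai):.2f} GFLOPS")
```

Under these assumptions such a loop cannot exceed about 2.9 GFLOPS on that machine however well it is vectorized. DRAM bandwidth is also shared by all cores, so once the cores saturate the memory bus, adding MPI ranks does not raise this ceiling. Raising arithmetic intensity (cache blocking, loop fusion) is what moves a kernel toward the compute roof.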
Thanks much in advance!
PS.
I just realized that Apple's M1 Pro and M1 Max Macs have very high memory bandwidth, about 200 GB/s and 400 GB/s respectively, while my laptop uses DDR4-2666, which gives only about 35 GB/s. I guess those Macs benefit a lot from their high-bandwidth memory as well.
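For reference, theoretical DRAM bandwidth is transfer rate × bytes per transfer × channels. A sketch assuming DDR4-2666 with 64-bit (8-byte) channels in a dual-channel configuration (the channel count is an assumption; the post does not state it), then comparing the quoted ~35 GB/s with the 200 and 400 GB/s figures:

```python
def dram_bandwidth_gbs(mega_transfers_per_s: float, bus_bytes: int, channels: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# Assumption: DDR4-2666, 64-bit (8-byte) channels, dual-channel laptop.
ddr4 = dram_bandwidth_gbs(2666, 8, 2)
quoted = 35.0  # GB/s, figure quoted in the post
print(f"DDR4-2666 dual channel: {ddr4:.1f} GB/s theoretical; "
      f"{quoted:.0f} GB/s is {quoted / ddr4:.0%} of that")

# Ratio of the higher-bandwidth memory systems to the quoted DDR4 figure.
for bw in (200.0, 400.0):
    print(f"{bw:.0f} GB/s is {bw / quoted:.1f}x the quoted DDR4 bandwidth")
```

A single DDR4-2666 channel would top out near 21.3 GB/s, below the ~35 GB/s quoted, which is why dual channel is assumed here. For a purely bandwidth-bound loop, attainable GFLOPS would scale by roughly the same 5.7x to 11.4x factor.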