Just reposting the graph here from the other thread.
(Full set of slides on the topic can be found here: NHR PerfLab Seminar: A Review of Processor Advances Over the Past Thirty Years)
These are peak bandwidth numbers. Should we assume that these are level-1 cache results?
Unfortunately, the author doesn’t state the test conditions precisely in the video, nor which benchmark was used. I’m assuming the numbers were measured with likwid-bench, since the author is one of its creators. The graph above shows the peak Flops for sequential mode (no SIMD instructions).
Running the stream benchmark myself on my 8-core Intel(R) Core™ i7-11700K @ 3.6 GHz processor (Rocket Lake generation, launched 2021), I get the following numbers:
$ likwid-bench -t stream -w S0:10MB
...
MByte/s: 265655.11
...
$ likwid-bench -t stream_avx -w S0:10MB
...
MByte/s: 272272.28
...
(most output has been omitted)
The 10 MB workload fits into the (shared) L3 cache, and the measured bandwidth roughly matches the Ice Lake figures above.
With smaller workloads that fit into the L2 and L1 caches, the bandwidth is higher:
$ likwid-bench -t stream_avx -w S0:1MB # 4 MB L2 cache (total)
...
MByte/s: 1063662.52
...
$ likwid-bench -t stream_avx -w S0:320kB # 40 kB per core, each core has 48 kB L1d cache
...
MByte/s: 1969639.55
...
So my guess would be that the peak bandwidth in the graph was measured for the largest data cache (L3).
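To make the three stream_avx results easier to compare, here is a rough sketch that converts the aggregate MByte/s figures into bytes per cycle per core. It rests on several assumptions that the likwid-bench output above does not confirm: that all 8 physical cores took part, that they ran at the nominal 3.6 GHz (turbo would lower the per-cycle figure), and that MByte means 10^6 bytes. The function name and constants are my own, purely for illustration.

```python
# Aggregate stream_avx bandwidths from the likwid-bench runs above, in MByte/s.
measured_mbyte_per_s = {
    "10MB (L3)": 272272.28,
    "1MB (L2)": 1063662.52,
    "320kB (L1d)": 1969639.55,
}

CORES = 8          # assumption: all 8 physical cores participated
CLOCK_HZ = 3.6e9   # assumption: nominal 3.6 GHz clock, no turbo


def bytes_per_cycle_per_core(mbyte_per_s, cores=CORES, clock_hz=CLOCK_HZ):
    """Convert an aggregate bandwidth in MByte/s to bytes/cycle/core.

    Assumes 1 MByte = 1e6 bytes.
    """
    bytes_per_s = mbyte_per_s * 1e6
    return bytes_per_s / cores / clock_hz


if __name__ == "__main__":
    l3 = measured_mbyte_per_s["10MB (L3)"]
    for label, bw in measured_mbyte_per_s.items():
        per_core = bytes_per_cycle_per_core(bw)
        print(f"{label:12s} {bw / 1e3:8.1f} GB/s total, "
              f"{per_core:5.1f} B/cycle/core, {bw / l3:4.1f}x the L3 figure")
```

Under those assumptions the L3, L2 and L1d runs come out at roughly 9, 37 and 68 bytes per cycle per core, so the L1d figure is about 7 times the L3 one. If the clock was actually boosting during the run, the real per-cycle values would be correspondingly lower.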