I noticed recently that there are at least two threads here discussing big matrix and/or array operations.
I mean, for those operations you can either use vectorization to do the matrix operations, or write your own code and do them with big do loops. But in either case, isn't the bottleneck actually the speed of memory?
In short, my question: in code that is this heavy on memory operations, does the speed (bandwidth) of memory matter?
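To make the question concrete, here is a minimal sketch of the kind of operation I have in mind (the size n, the scalar s, and the arrays a, b, c are just numbers I made up). Whether it is written as an explicit do loop or in vectorized array syntax, each element costs one multiply-add but two loads and one store, i.e. 24 bytes of memory traffic, so the loop can only go as fast as memory feeds it:

```fortran
program triad_demo
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 20000000        ! ~160 MB per array, far larger than any cache
   real(dp), allocatable :: a(:), b(:), c(:)
   real(dp) :: s, seconds, gbytes
   integer(int64) :: t0, t1, rate
   integer :: i

   allocate(a(n), b(n), c(n))
   b = 1.0_dp
   c = 2.0_dp
   s = 3.0_dp

   call system_clock(t0, rate)
   ! Explicit do loop: one multiply-add per element, but
   ! 2 loads + 1 store = 24 bytes of memory traffic per element.
   ! The vectorized form  a = b + s*c  moves exactly the same data.
   do i = 1, n
      a(i) = b(i) + s*c(i)
   end do
   call system_clock(t1)

   seconds = real(t1 - t0, dp)/real(rate, dp)
   gbytes  = 24.0_dp*real(n, dp)/1.0e9_dp
   print '(a,f7.2,a)', 'effective bandwidth ~', gbytes/seconds, ' GB/s'
   print *, a(n)   ! use the result so the compiler cannot drop the loop
end program triad_demo
```

If the printed number comes out close to the DRAM bandwidth rather than anywhere near the CPU's arithmetic peak, that is the memory bottleneck I am asking about.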
PS.
I mean, suppose the CPU could process 200 GB/s of double-precision numbers, but the memory can only deliver 20 GB/s. If the code depends heavily on memory operations, then it seems that increasing the speed (bandwidth) of the memory would be very profitable.
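As a rough back-of-the-envelope check with those made-up numbers (they are only made-up numbers), a streaming loop can only run at the slower of the two rates, so faster memory translates almost directly into a faster loop while the CPU sits mostly idle either way:

```fortran
program bandwidth_bound
   implicit none
   ! Hypothetical rates, just the numbers from the paragraph above (bytes/s):
   real :: cpu_rate  = 200.0e9   ! rate at which the CPU could chew through doubles
   real :: mem_rate  =  20.0e9   ! rate at which memory can actually deliver them
   real :: mem_rate2 =  40.0e9   ! same CPU, but with twice the memory bandwidth

   ! A streaming loop can only run at the slower of the two rates.
   print '(a,f6.1,a)', 'loop rate now           :', min(cpu_rate, mem_rate )/1.0e9, ' GB/s'
   print '(a,f6.1,a)', 'loop rate, 2x bandwidth :', min(cpu_rate, mem_rate2)/1.0e9, ' GB/s'
   ! As long as mem_rate < cpu_rate, doubling the memory bandwidth
   ! roughly doubles the speed of a memory-bound loop.
end program bandwidth_bound
```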
That seems to be what Apple's M1 and its memory are doing. See here:
It seems the M1 family's memory bandwidth is much higher than that of some PCs, such as mine: a Xeon 2186M with DDR4-2666 ECC, which only has about 35 GB/s of bandwidth. Even the M1 in the MacBook Air is roughly 2X faster than my machine in terms of memory bandwidth.
It also reminds me that High-Bandwidth Memory (HBM) is an important topic in HPC. I remember both AMD and Intel have implemented HBM-style techniques in some way. AMD now puts a big cache on their chips. Intel had the Xeon Phi Knights Landing (KNL) with high-bandwidth on-package memory (Intel called it MCDRAM or something like that); I have had the luck of using 5120 Xeon Phi cores on a cluster I have access to. Now Intel has given up on the Phi and is putting more effort into GPU computing, I guess.