Thanks @mecej4 !
Your i7 10710U seems faster than my Xeon 2186M, LOL.
Note that this is just sample code (again, for those who do not know, the sample code is here: Memory bandwidth test code for M1 and PC), taken from a more complete sequential Monte Carlo code.
In the real code, once the Gaussian random numbers are generated, I reuse them in each of the SDE calls. The SDE part is repeated more than 20 times, instead of 5 times as here, so in the real code random number generation is not the dominant cost. However, I checked: the random number generator in Intel's MKL is three times faster than the one in the sample code.
The reason I generate the big random number array first (the drawback is that it consumes a lot of memory) is to reuse it in each of the remaining SDE calls, so those calls do not need to regenerate the random numbers, which saves time. After all, reading already generated Gaussian random numbers from memory is faster than generating them on the fly.
But anyway, the sample tests two parts:
The first is the speed of generating a large number of Gaussian random numbers and storing them in memory.
The second is solving the vectorized SDE, which seems to be dominated by memory operations.
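For readers who have not opened the linked sample, here is a minimal sketch of those two parts. It is not the actual sample code: the Box-Muller generator, the simple linear SDE with an Euler-Maruyama step, and every name and size (npath, nstep, ncall, theta, sigma, dt) are assumptions chosen only to illustrate "generate once, reuse in every SDE pass".

```fortran
program reuse_gaussian_sketch
  use, intrinsic :: iso_fortran_env, only: int64, dp => real64
  implicit none
  ! All sizes and SDE parameters below are hypothetical, for illustration only.
  integer, parameter  :: npath = 20000   ! number of simulated paths
  integer, parameter  :: nstep = 50      ! time steps per path
  integer, parameter  :: ncall = 20      ! SDE passes that reuse the same noise
  real(dp), parameter :: twopi = 6.283185307179586_dp
  real(dp), parameter :: dt = 0.01_dp, theta = 1.0_dp, sigma = 0.5_dp
  real(dp), allocatable :: z(:,:), u1(:), u2(:), x(:)
  real(dp) :: sqdt
  integer :: icall, istep
  integer(int64) :: t0, t1, rate

  allocate(z(npath, nstep), u1(npath), u2(npath), x(npath))
  sqdt = sqrt(dt)

  ! Part 1: generate all Gaussian numbers once and store them in memory.
  call system_clock(t0, rate)
  do istep = 1, nstep
     call random_number(u1)
     call random_number(u2)
     ! random_number returns values in [0,1); use 1-u1 so that log() stays finite.
     z(:, istep) = sqrt(-2.0_dp*log(1.0_dp - u1)) * cos(twopi*u2)
  end do
  call system_clock(t1)
  print '(a, f8.3, a)', 'generate : ', real(t1 - t0, dp)/real(rate, dp), ' s'

  ! Part 2: repeated SDE passes that only read the stored numbers.
  ! Assumed model: dX = -theta*X*dt + sigma*dW, Euler-Maruyama step.
  call system_clock(t0)
  do icall = 1, ncall
     x = 1.0_dp
     do istep = 1, nstep
        x = x - theta*x*dt + sigma*sqdt*z(:, istep)
     end do
  end do
  call system_clock(t1)
  print '(a, f8.3, a)', 'SDE      : ', real(t1 - t0, dp)/real(rate, dp), ' s'
  print '(a, es12.4)',  'mean x   : ', sum(x)/real(npath, dp)
end program reuse_gaussian_sketch
```

In this sketch every pass is identical, so it only shows the access pattern; in a real code each pass would use different parameters or data. The inner loop streams one contiguous column of z per time step, which is why the second part is mostly a test of memory traffic rather than arithmetic.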
About the crash, Intel confirmed that it seems to be a compiler bug. That is, if I have a do concurrent construct in the code, then when /qopenmp and /heap-arrays are enabled at the same time, the do concurrent part just crashes. See below:
Re: Why /heap-arrays0 cause /qopenmp show access violation for do concurrent? - Intel Communities
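For readers unfamiliar with the construct, a minimal do concurrent loop looks like the sketch below. This is only an illustration of the syntax, not the reproducer from the Intel thread, and it is an assumption-free toy in the sense that nothing here is claimed to trigger the crash; the program name and array size are hypothetical.

```fortran
program do_concurrent_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical array size
  real :: a(n), b(n)
  integer :: i

  call random_number(b)

  ! Each iteration is independent, so the compiler may run them in any order
  ! or in parallel (for example when OpenMP support is enabled).
  do concurrent (i = 1:n)
     a(i) = 2.0*b(i)
  end do

  print *, 'sum(a) = ', sum(a)
end program do_concurrent_sketch
```

The options mentioned above, /qopenmp and /heap-arrays, are the ones reported in this post; whether a loop this small is affected by the bug has not been checked here.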