I really appreciate your effort, @JohnCampbell! Thank you so much!
About `do concurrent`, I agree.
I use it in the hope that the compiler can really do some parallelization automatically, and perhaps even make things run on a GPU. But Intel's compiler seems to have some issues with it; there is a post about the issue, and you also replied there.
Personally, I have not found much performance advantage from `do concurrent`, other than that it can perhaps make the code look more concise.
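To make the comparison concrete, here is a minimal sketch, not taken from my code: the array names and the loop body are made up for illustration. It shows the same independent element-wise work written as an ordinary `do` loop and as `do concurrent`. Whether the second form actually gets parallelized or offloaded depends entirely on the compiler and the options used.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y_loop(n), y_conc(n)   ! hypothetical arrays, for illustration only
  integer :: i

  call random_number(x)

  ! Ordinary loop: iterations run in order.
  do i = 1, n
     y_loop(i) = exp(-0.5*x(i)**2)
  end do

  ! do concurrent: the programmer asserts that iterations are independent,
  ! so the compiler MAY run them in any order or in parallel.
  do concurrent (i = 1:n)
     y_conc(i) = exp(-0.5*x(i)**2)
  end do

  ! Both forms should give the same values.
  print *, 'max difference:', maxval(abs(y_loop - y_conc))
end program do_concurrent_demo
```

The `do concurrent` form is only valid when no iteration depends on another, which is the case here because each `y_conc(i)` uses only `x(i)`.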
Thank you for being so careful.
Yes, `mgauss_ik` is just there to dynamically adjust the number of samples (for the given i, k) used for the Monte Carlo integral, where `n_ik` is actually line 136 in samplers.f90.
`mgauss_ik` is actually not very useful. You can just comment out lines 221 to 224 in samplers.f90 and set `mgauss_ik = mgauss`, so that `mgauss_ik` is always the constant `mgauss`. Then for each `n_ik` the number of Monte Carlo samples is the same, namely `mgauss`, which is typically 1000.
The reason for `mgauss_ik` is this. Say k=2, so a mixture of 2 Gaussians. The total number of samples for n_i1 and n_i2 is a fixed number, `k*mgauss`; with mgauss=1000 and k=2, k*mgauss=2000. However, n_i1 may need more samples than n_i2, so I may put 1500 samples on n_i1 and 500 on n_i2, that is, mgauss_i1=1500 and mgauss_i2=500. In this way the 2000 samples in total are distributed more efficiently over n_i1 and n_i2, instead of just giving 1000 samples to each.
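The idea can be sketched as follows. This is only an illustration and not the actual rule in samplers.f90: the `weight` array, which says how many samples each n_ik needs relative to the others, is a hypothetical input, and `kmix` is just my name here for the number of mixture components k.

```fortran
program allocate_samples
  implicit none
  integer, parameter :: kmix   = 2      ! number of Gaussian components (k in the text)
  integer, parameter :: mgauss = 1000   ! baseline number of samples per component
  real    :: weight(kmix)               ! hypothetical relative need of each n_ik
  integer :: mgauss_ik(kmix)
  integer :: total, kbig

  total  = kmix*mgauss                  ! fixed budget, 2000 here
  weight = [3.0, 1.0]                   ! assumed: n_i1 needs 3x the samples of n_i2

  ! Proportional share of the budget, truncated to an integer.
  mgauss_ik = int(real(total)*weight/sum(weight))

  ! Hand any truncation remainder to the heaviest component,
  ! so that the total stays exactly kmix*mgauss.
  kbig = maxloc(weight, dim=1)
  mgauss_ik(kbig) = mgauss_ik(kbig) + (total - sum(mgauss_ik))

  print '(a,2i6)', 'mgauss_ik = ', mgauss_ik
  print '(a,i6)',  'total     = ', sum(mgauss_ik)
end program allocate_samples
```

With the assumed weights 3:1 this reproduces the split above, 1500 samples for n_i1 and 500 for n_i2, and setting all weights equal recovers `mgauss_ik = mgauss`.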
No worries. In short, `mgauss_ik` does not really influence the code, and the results do not depend on the seed too much. You know, if the result of a Monte Carlo simulation depends heavily on the random number seed, then something must be wrong.
By the way, how did you get the profile information below?

     #### Delta_Sec Summary ####   12
      Id  Description                      Elapsed     Calls
       1  _START                            0.0000         1
       2  # pYq_i_detail                    3.5952     10201
       3  INITIALISED Yji                   0.0006         1
       4  prep > gauss_thetas               0.0961       102
       5  prep > MC_gauss_ptheta_w_sig      0.0000       102
       6  Metroplis_gik_k_more_o_log        0.4669        50
       7  CC Metroplis_gik_all_o_log        0.0679        50
       8  CC mgauss_ik(i,k)                 0.0006        50
       9  steptest report                   0.0095        50
      10  cpu_time report                   0.0011        50
      11  ANALYSED                          0.0026         1
      12  _FINISHED                         4.2406     10659
     calls to pYq_i_detail = 10198404
     Program end normally.

I tried gprof on Windows, but it always generates an empty prof file. Perhaps I will open a new topic to ask about this.
Again, thank you so much!