I have used two builds of gfortran on Windows: MinGW-w64 and the 64-bit build from equation.com. (Each requires careful setting of the PATH environment variable, which may be part of the issue in this thread?)
What I notice is that equation.com’s version produces much larger .exe files than MinGW-w64’s, which I attribute to it linking fewer .dll’s dynamically. As a consequence, there is a slower initial startup of the .exe, but less overhead once the program starts. Because my timers are started during the run, .dll loading is not included in my measured run times, and on that measure the equation.com version has slightly faster computation performance.
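A minimal sketch of timing from inside the program, assuming Fortran 2008 and gfortran. The clock is first read only after the program has started, so time the loader spends resolving .dll dependencies is not counted. The module name `wall_timer` and the function `wall_seconds` are hypothetical.

```fortran
! Hypothetical wall-clock helper based on the standard SYSTEM_CLOCK intrinsic.
module wall_timer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: wall_seconds
contains

  ! Returns the current wall-clock reading in seconds (arbitrary origin).
  ! Only the difference between two readings is meaningful.
  function wall_seconds() result(t)
    real(real64) :: t
    integer(int64) :: ticks, rate
    call system_clock(count=ticks, count_rate=rate)
    if (rate <= 0_int64) then
      t = 0.0_real64            ! no clock available on this system
    else
      t = real(ticks, real64) / real(rate, real64)
    end if
  end function wall_seconds

end module wall_timer
```

Usage: call `t0 = wall_seconds()` just before the compute section and print `wall_seconds() - t0` after it. Passing `integer(int64)` arguments to `system_clock` typically gives gfortran a finer clock resolution than default integers.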
This is an interesting thread, as I have not found (equation.com’s) gfortran on Windows to perform poorly. Note, however, that my testing uses a different run profile: intensive computation over minutes or hours, not fractions of a second where startup time becomes significant. The .dll’s are only connected once, at load time, so this delay is not repeated.
Most of the Julia vs gfortran comparison in this thread appears to focus on startup and some intrinsics, whereas I am more focused on multi-threaded AVX computation. I can’t see Julia being faster than gfortran for the kind of computation I am doing, but there will always be types of computing that suit a particular language.
I reviewed the code in post #9 to see if I could identify coding patterns that may not suit gfortran.
- lots of tab characters in the code, which are not portable (tab is not in the standard Fortran character set) and made it difficult to test with the other compilers I use.
- I am not familiar with ishft(i, j), especially where i and j are of different kinds. I suspect “j” should be a default integer?
- automatic allocation on assignment is used, e.g. “qt1 = mu01 + sig01*gaussian(nsub1)”, although this does not account for significant CPU time.
- subroutine steptest uses “do concurrent (i=1:nsub, k=1:kmix)”, although this also does not account for significant CPU time. I am not sure why it was adopted.
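The three language features queried in the list above can be illustrated with a small sketch (Fortran 2008; the module and procedure names are hypothetical and are not taken from post #9). For `ishft`, the standard only requires SHIFT to be an integer: its kind may differ from that of I, and the result has the kind of I. Allocation on assignment lets an unallocated allocatable array take the shape of the right-hand side. `do concurrent` asserts that the iterations are independent, which permits, but does not require, the compiler to reorder, vectorise or parallelise them.

```fortran
! Hypothetical module illustrating ishft kinds, allocation on assignment
! and a two-index DO CONCURRENT.
module demo_features
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains

  ! ISHFT(I, SHIFT): SHIFT may be any integer kind; the result has the
  ! kind of I, so a default-kind shift applied to an int64 value is fine.
  function high_bits(i, nshift) result(r)
    integer(int64), intent(in) :: i
    integer, intent(in) :: nshift       ! default-kind integer
    integer(int64) :: r
    r = ishft(i, -nshift)               ! negative SHIFT = logical right shift
  end function high_bits

  ! Allocation on assignment (Fortran 2003): q is not allocated beforehand;
  ! "q = mu + sig*x" allocates it with the shape of the right-hand side.
  function scaled_copy(mu, sig, x) result(q)
    real(real64), intent(in) :: mu, sig, x(:)
    real(real64), allocatable :: q(:)
    q = mu + sig*x
  end function scaled_copy

  ! Each (i, k) iteration writes a distinct element of t, so the
  ! iterations are independent as DO CONCURRENT requires.
  subroutine pair_table(a, b, t)
    real(real64), intent(in) :: a(:), b(:)
    real(real64), intent(out) :: t(:,:)   ! expected shape (size(a), size(b))
    integer :: i, k
    do concurrent (i = 1:size(a), k = 1:size(b))
      t(i, k) = a(i) * b(k)
    end do
  end subroutine pair_table

end module demo_features
```

The result of `pair_table` is the same as with nested `do` loops; whether gfortran exploits the `do concurrent` form for speed is implementation-dependent.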
However, in my testing with equation.com’s gfortran on Win64, most of the time is spent in function pYq_i_detail (called 10 million times). This is passed as an external function argument to subroutine MC_gauss_ptheta_w_sig, which is called via subroutine prep.
Changing it from a supplied function argument to an explicit (direct) function call did not change the performance.
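For comparison, the two calling styles can be sketched as follows. The names `kernel_iface`, `gauss_kernel`, `sum_via_argument` and `sum_direct` are hypothetical, and the kernel is only a stand-in for pYq_i_detail. With a direct call the compiler can see the callee's body and may inline it; with a procedure argument it generally has less opportunity to do so.

```fortran
! Hypothetical module contrasting a procedure argument with a direct call.
module call_styles
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none

  abstract interface
    pure function kernel_iface(x) result(y)
      import :: real64
      real(real64), intent(in) :: x
      real(real64) :: y
    end function kernel_iface
  end interface

contains

  ! Stand-in kernel: an unnormalised Gaussian weight.
  pure function gauss_kernel(x) result(y)
    real(real64), intent(in) :: x
    real(real64) :: y
    y = exp(-0.5_real64 * x**2)
  end function gauss_kernel

  ! Style 1: the kernel is supplied as a procedure argument.
  function sum_via_argument(f, x) result(s)
    procedure(kernel_iface) :: f
    real(real64), intent(in) :: x(:)
    real(real64) :: s
    integer :: i
    s = 0.0_real64
    do i = 1, size(x)
      s = s + f(x(i))
    end do
  end function sum_via_argument

  ! Style 2: the kernel is called by name, so its body is visible here.
  function sum_direct(x) result(s)
    real(real64), intent(in) :: x(:)
    real(real64) :: s
    integer :: i
    s = 0.0_real64
    do i = 1, size(x)
      s = s + gauss_kernel(x(i))
    end do
  end function sum_direct

end module call_styles
```

Calling `sum_via_argument(gauss_kernel, x)` and `sum_direct(x)` on the same array gives identical results, so the two can be timed side by side.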
It uses the intrinsics exp and **2 and does not appear to utilise AVX.
Perhaps Julia has a better exp implementation?
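One way to check whether the exp work can use AVX is to write it as a simple loop over an array and ask gfortran to report on vectorisation, e.g. with `-O3 -march=native -fopt-info-vec-missed`. This is a sketch with hypothetical names (`gauss_vec`, `gauss_weights`), not a rewrite of pYq_i_detail. gfortran normally expands a constant integer power such as `**2` into a multiplication, so `z*z` below is mainly for clarity. Whether `exp` itself is vectorised depends on the flags (options such as `-ffast-math` may be needed) and on a vector math library being available for the target.

```fortran
! Hypothetical module: Gaussian weights over a whole array in one simple loop,
! with no calls other than EXP, so the compiler has the option to vectorise it.
module gauss_vec
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains

  subroutine gauss_weights(x, mu, sig, w)
    real(real64), intent(in)  :: x(:), mu, sig
    real(real64), intent(out) :: w(:)           ! same size as x
    real(real64) :: inv_sig, z
    integer :: i
    inv_sig = 1.0_real64 / sig                   ! single division, outside the loop
    do i = 1, size(x)
      z = (x(i) - mu) * inv_sig
      w(i) = exp(-0.5_real64 * z * z)           ! z*z is equivalent to z**2
    end do
  end subroutine gauss_weights

end module gauss_vec
```

Comparing the `-fopt-info-vec` report for this loop with that for the original function would show whether the loop structure, rather than exp itself, is what prevents AVX use.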