Why is my code compiled with GFortran on Windows slower than on Ubuntu?

@CRquantum you can try to group together as many of the I/O operations in your code as you can. I/O is something that slows everything down.

But I would advise you to avoid running on Windows, if you can.


Thanks @conradoat .
Eh, yeah, I have turned off I/O completely (the only write to a file happens at the very end, when the computation is done, and it writes a very small file). The timing did not improve.

Now the only I/O I have left is write(6,*), which writes to the screen.
Eh, are you suggesting turning off all the write(6,*) statements?

I am developing a package in Fortran and I hope it can be used on Win/Linux/Mac. Having users install gfortran and Make on Windows is a very convenient choice. Otherwise users need to install Intel OneAPI, which is also great but is not really optimized for the Apple M1.

So I think it would be great if gfortran worked well on Win/Linux/Mac.
But the issue I have met with gfortran on Windows is that its performance is several times slower than on Linux.
Also, gfortran/gcc still seems to have some compatibility issues with the M1 chip. For example, if a pointer is used to point to a function/subroutine, gfortran seems to work only with the -Og flag. With any higher optimization level it causes errors on the M1 chip.

“Eh, are you suggesting turn off all the write(6,*)?”

Yes if you can. But from what you say, it seems that the I/O is not the problem.

Thanks @conradoat .
The I/O or write(6,*) really does not seem to be the problem. Even with all of them turned off, the timing is the same: on Windows gfortran took 3 s, Intel took 0.5 s.

Have you looked at the assembly language produced? I strongly suspect that the answer is there.

@CRQuantum, I think that many of the statements and conclusions stated in the various posts in this thread are unfair to Gfortran. Not much has been said about which versions of Gfortran were used on Windows, and which emulation layer/DLL support infrastructure was used.

I am attempting to make amends.

A brief scan of your sources made me suspect that it uses 8-byte integers in many places where 4-byte integers would have sufficed. This choice, naturally, affects performance, but I did not want to spend any time to alter this aspect of the code.

I took your Gitlab sources, and commented out most of the WRITE statements, until the program produced just 16 lines of output. Here are the run times on my NUC (small box PC with low power laptop processor i7-10710U, laptop memory, on a ramdisk, balanced power setting, Win11-64).

Ifort 2021.5, /O2            : 0.55 s
-same-, but  /fast           : 0.38 s

Cygwin Gfortran 11.2, -O2    : 0.79 s
Eq.com Gfortran 12.0, -O2    : 2.76 s 

I strongly suspected that the difference in the Gfortran times is attributable to Cygwin using (naturally) Cygwin1.DLL rather than the MinGW DLLs. If so, most of the slowdown that you noticed is attributable not to the compiler but to the runtime (which is also used by GCC, G++, etc.). I ran a test to settle my suspicion. I used Cygwin Gfortran to produce .o files, and linked the .o files using the MinGW Gfortran. The resulting a.exe took 2.8 s. This proves that there can be drastic differences in run times of Fortran programs depending on which versions of the GCC RTL are used. We may note in this connection that MinGW was last updated in 2013, whereas Cygwin was updated just two months ago.

You also wrote that Gprof failed to give you profile output. Here is part of the output that I obtained from Cygwin Gprof:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 38.24      0.26     0.26 10198404     0.00     0.00  __samplers_MOD_pyq_i_o
 35.29      0.50     0.24                             _mcount_private
  8.82      0.56     0.06  3999800     0.00     0.00  __random_MOD_randn
  5.88      0.60     0.04                             __fentry__
  2.94      0.62     0.02      208     0.10     0.10  __random_MOD_gaussian
  2.94      0.64     0.02      100     0.20     0.50  __samplers_MOD_metroplis_gik_k_more_o_log
  2.94      0.66     0.02       51     0.39     5.87  __samplers_MOD_prep
  2.94      0.68     0.02                             exp

If you skip the line for _mcount, which is the profiling routine itself, you may note that the biggest time consumed is in function PYQ_I, and that function accounts for over a third of the run time.

You may attempt to modify your code to reduce the number of calls to PYQ_I, or to make it a vector function instead of a scalar, if that is feasible. Secondly, as I mentioned earlier, see if you can be more judicious in using 8-byte integer variables.
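To see how much of the sampled time belongs to the program itself, the flat profile can be re-normalised with the profiler's own rows left out. The following is a minimal Python sketch, not part of the original analysis. The rows are copied from the partial listing above, and treating `__fentry__` as instrumentation overhead alongside `_mcount_private` is an assumption. The helper name `shares_without_overhead` is made up for illustration.

```python
# Sketch: recompute each routine's share of self time after dropping
# the rows that belong to the profiling machinery.
# Assumption: only the rows shown in the partial gprof listing are used.

# Assumption: both of these are instrumentation, not program work.
PROFILER_OVERHEAD = {"_mcount_private", "__fentry__"}

# (self seconds, symbol name), copied from the flat profile above.
flat_profile = [
    (0.26, "__samplers_MOD_pyq_i_o"),
    (0.24, "_mcount_private"),
    (0.06, "__random_MOD_randn"),
    (0.04, "__fentry__"),
    (0.02, "__random_MOD_gaussian"),
    (0.02, "__samplers_MOD_metroplis_gik_k_more_o_log"),
    (0.02, "__samplers_MOD_prep"),
    (0.02, "exp"),
]


def shares_without_overhead(rows, overhead=PROFILER_OVERHEAD):
    """Return {name: percent of self time}, ignoring profiler overhead rows."""
    kept = [(secs, name) for secs, name in rows if name not in overhead]
    total = sum(secs for secs, _ in kept)
    return {name: 100.0 * secs / total for secs, name in kept}


shares = shares_without_overhead(flat_profile)
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{pct:5.1f}%  {name}")
```

With the rows listed, this puts pyq_i_o at roughly 65% of the non-profiler self time (0.26 s out of 0.40 s), which is consistent with it being the first place to look. The percentages are relative to the listed rows only, since the listing is partial.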


Optimize? Intel OneAPI Fortran compilers generate code for X86/X64 processors. They do not target ARM at all.

I have not used an Apple computer for years. From what I have read, Apple M computers provide an emulator/translator/converter “Rosetta-?” that allows running executables that target Apple’s older X64 computers.


Thank you very much @mecej4 !
I totally agree with you. I do believe the problem is caused by some DLL stuff on Windows. But I just do not know how to fix it. Do you know how to fix that?

I am the only one at the place where I work who insists on using Fortran.
Before I came, they had almost decided to give up Fortran and switch to Julia (I have a Julia version of the code as well, but it is 3 times slower; on Windows Julia takes 1.5 s, which is still better than gfortran’s 3 s).
They told me that, after many years of wrestling with gfortran, they are a little tired of it. Because different users have different versions of gfortran, the same code works with some versions and not with others. The performance of gfortran on Windows is not consistent with that on Mac or Linux. You know, as users, they do not care and do not know whether the problem is caused by a DLL or by gfortran. All they know is that they installed gfortran, compiled and built the code, and the code is slow. So they will just blame gfortran.
I thought their problems with gfortran were perhaps because their code was not written well enough, and I never thought gfortran itself had any problems.
I showed them fast algorithms using modern Fortran with Intel OneAPI, and now they are considering Fortran again, especially because Intel’s OneAPI is free now and works fine on Windows and Linux.
Again, I want the code to perform consistently on Win/Linux/Mac, and I see that gfortran has the potential to be a very good lightweight choice for all three platforms.
But the slowness I currently have with gfortran on Windows really bothers me very much.

The gfortran I am using is gfortran 11.2.0 from equation.com (“Fortran, C, C++ for Windows”).

I see you use cygwin64 on Windows 10, and that gfortran takes 0.79 s, which is about right.
I wonder how you achieved that.
I mean, I installed cygwin64 and installed gfortran 11.2.0 there, then compiled and ran my code there, but I still get a slow run time of about 2.9 s.

The real problem is how to make my code, built with gfortran on Windows, run at roughly the same speed as with gfortran on Ubuntu. Do you know how to fix that? Which gfortran for Windows should I use? I have installed every gfortran version for Windows that I could find, and they all seem to run about 6x slower on Windows than on Linux.
I really hope there is a way to make gfortran’s speed consistent between Windows and Ubuntu.

Thank you very much indeed!

PS

Thank you for your optimization suggestions. Eh, I believe the 8-byte integers do not really matter on modern hardware.
On Windows and Ubuntu, whether the integers are 4-byte or 8-byte, Intel Fortran gives the same run time of 0.5 s.
gfortran on Ubuntu also gives about 0.6 s with either 4-byte or 8-byte integers, which is decent enough.
It is an illustration code, so optimization is not very important, but thank you very much all the same. I highly appreciate it!

If you have multiple versions of Gfortran installed along with MinGW, Cygwin, WSL, etc., you have to be careful not to get the paths and environments mixed up. Here is how to check.

After building an EXE, run the Cygwin ldd utility on it. For the version compiled with Cygwin+Gfortran, I see:

T:\LANG\RChen>gfortran -O2 ran.f90 samplers.f90 EM_mix.f90

T:\LANG\RChen>ldd a.exe
        ntdll.dll => /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffabeee0000)
        KERNEL32.DLL => /cygdrive/c/WINDOWS/System32/KERNEL32.DLL (0x7ffabd8a0000)
        KERNELBASE.dll => /cygdrive/c/WINDOWS/System32/KERNELBASE.dll (0x7ffabc940000)
        cyggcc_s-seh-1.dll => /usr/bin/cyggcc_s-seh-1.dll (0x3f7530000)
        cygwin1.dll => /usr/bin/cygwin1.dll (0x180040000)
        cyggfortran-5.dll => /usr/bin/cyggfortran-5.dll (0x3f6e40000)
        cygquadmath-0.dll => /usr/bin/cygquadmath-0.dll (0x3f1d90000)

For the MinGW+Eq.Com.Gfortran built a.exe, ldd reports:

        ntdll.dll => /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffabeee0000)
        KERNEL32.DLL => /cygdrive/c/WINDOWS/System32/KERNEL32.DLL (0x7ffabd8a0000)
        KERNELBASE.dll => /cygdrive/c/WINDOWS/System32/KERNELBASE.dll (0x7ffabc940000)
        msvcrt.dll => /cygdrive/c/WINDOWS/System32/msvcrt.dll (0x7ffabdf40000)

If, when you run ldd on what you think is a Cygwin built EXE, you see “msvcrt” in the output, that is a sign that your paths are mixed up.
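The ldd check can be scripted so it is easy to repeat after each build. Here is a minimal Python sketch that classifies saved ldd output by the runtime DLLs it lists. The function names and the "mixed"/"unknown" labels are made up for illustration, and the rule (cygwin1.dll means a Cygwin build, msvcrt.dll means a MinGW-style build) is taken only from the two listings above.

```python
# Sketch: classify an executable's C runtime from the text printed by `ldd`.
# Assumption: the DLL names are those seen in the two listings above.


def dll_names(ldd_output):
    """Extract lower-cased DLL names from lines like 'name => path (addr)'."""
    names = []
    for line in ldd_output.splitlines():
        if "=>" in line:
            names.append(line.split("=>", 1)[0].strip().lower())
    return names


def classify_runtime(ldd_output):
    """Return 'cygwin', 'msvcrt', 'mixed', or 'unknown' for the given ldd text."""
    names = set(dll_names(ldd_output))
    uses_cygwin = "cygwin1.dll" in names
    uses_msvcrt = "msvcrt.dll" in names
    if uses_cygwin and uses_msvcrt:
        return "mixed"
    if uses_cygwin:
        return "cygwin"
    if uses_msvcrt:
        return "msvcrt"
    return "unknown"


# Shortened copies of the two ldd listings shown above.
cygwin_build = """\
        ntdll.dll => /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffabeee0000)
        cygwin1.dll => /usr/bin/cygwin1.dll (0x180040000)
        cyggfortran-5.dll => /usr/bin/cyggfortran-5.dll (0x3f6e40000)
"""
mingw_build = """\
        ntdll.dll => /cygdrive/c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffabeee0000)
        msvcrt.dll => /cygdrive/c/WINDOWS/System32/msvcrt.dll (0x7ffabdf40000)
"""

print(classify_runtime(cygwin_build))
print(classify_runtime(mingw_build))
```

In practice the text would come from running ldd on the freshly built a.exe. A result of "msvcrt" for something that was meant to be a Cygwin build points to the path mix-up described above.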


Bite your tongue! Don’t blame Gfortran without checking! That was the whole point of my long post.

I just checked the Windows Julia 1.7.2 installation using ldd, and I find that many of Julia’s DLLs are built with MinGW, as well. This leads me to suspect that, if you compare Julia run times on Windows and Linux, the Windows versions will come out as significantly slower. MinGW makes it convenient to port Linux applications to Windows, but there appears to be a price to pay.

Just as with Gfortran, it would be unfair to compare Ifort and Julia on Windows and conclude that Julia is slow compared to Fortran, without taking into account the reliance of Julia on MinGW.

Similarly, it would not be quite proper to compare the performance of a Linux application/package such as Julia to a MinGW Windows port of the same package and conclude that “Windows is three times slower than Linux.”


Thank you again @mecej4 ! Nice.

I think we are getting closer to fixing the problem.

Now here is the thing. I installed cygwin64 from
https://www.cygwin.com/
I did

Install Cygwin by running setup-x86_64.exe

as the website suggests.

As you can see from my desktop, there is a cygwin64 terminal icon. I double-clicked that icon, so it seems I should be in the cygwin64 terminal environment. However, as you said (again, thank you very much indeed for the ldd trick, it is really helpful!), perhaps my path is mixed up, because my cygwin64 build did not produce the output that you showed; I have msvcrt, as below,

  1. Do you know how to fix the environment variables?
    My path variables are,

Are you suggesting deleting the paths of all other gfortran versions, such as gcc/bin/, etc.?

  2. By the way, sorry if this is a stupid question: is there a way to call gfortran and Make from Cygwin without using the cygwin64 terminal?
    I briefly checked the cygwin64 folders, but I did not find things like gfortran.exe or make.exe.

Thanks much!

PS.
In the cygwin64 terminal, the clean target in the makefile should be the same as on pure Linux,

clean:
	rm -f $(EXEC) *.mod *.mod0 *.smod *.smod0 *.log *.o *~

The previous,

clean:
	@del /q /f $(EXEC) *.mod *.obj *~ > nul 2> nul

works for the Make in equation.com’s gcc pack for windows.

Yes, I see the problem in your screenshot that shows %PATH%. It includes the following line:

c:\gcc\libexec\gcc\x86_64-w64-mingw32\11.2.0

You do not have to use the Cygwin Terminal; I rarely use it. You can use Cygwin tools from any CMD or other command shell, if you make the Cygwin /bin and /usr/bin directories accessible through PATH. Regardless of how you go about this, it is ultimately your responsibility to arrange the environment to work correctly for the task that you are currently performing.

If you need more help with Windows %PATH% and other environment settings, you can consult web pages or manuals, or a local expert.
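One way to see which toolchain a shell will actually pick up is to list every gfortran found on PATH, in search order. Below is a minimal Python sketch of that idea. The helper name `find_on_path` is made up for illustration, and it only tries the bare name and a `.exe` suffix, which is a simplification of how Windows resolves commands (the real lookup also consults PATHEXT).

```python
import os


def find_on_path(program, path, pathsep=os.pathsep, exts=("", ".exe")):
    """List every match for `program` in PATH order; the first hit is what runs."""
    hits = []
    for directory in path.split(pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in exts:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate):
                hits.append(candidate)
                break  # at most one hit per directory
    return hits


if __name__ == "__main__":
    # Print every gfortran visible through the current PATH, first match first.
    for hit in find_on_path("gfortran", os.environ.get("PATH", "")):
        print(hit)
```

If more than one line is printed, several toolchains are reachable at once, and the first one listed is the compiler that a plain `gfortran` command will run.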


It seems that MinGW (I may have confused MingW with MinGW-W64) is constantly updated, and its last update wasn’t in 2013. See:

PS. The difference between MinGW-W64 and MinGW is that MinGW compiles only 32-bit executable programs, while MinGW-W64 compiles 64-bit or 32-bit executable programs.


I may have been a bit careless in that I simply looked up this MinGW distribution that seemed reasonable as a source of “toolchains targetting W64”. I do not know which MinGW/MinGW-w64 tools are used by Equation.com and JuliaLang.org to build their compiler distributions.

There is no Julia package available from Cygwin, but they do have distributions that contain “MinGW” in their names/descriptions.

There is a thread “Julia slower in Windows” on JuliaLang-Discourse in which slowdowns have been noticed in Windows versus Linux.

These slowdowns do not seem to be present for Linux packages run under WSL-1, which I have used sometimes. I do not know about WSL-2, and it would be interesting to hear from someone who uses Gfortran or Julia on WSL-2.

In general, the main slowdown Julia has on WSL is file system stuff. My guess for Fortran is the performance of libm. The Windows libm implementations are often subpar, which Julia fixes by not using the system libm. (This also has the advantage of better cross-platform reproducibility.)

Thanks for the comment, @oscardssmith.

Before joining this forum, I had barely heard of Julia. Some of the posts in fortran-lang.discourse named Julia as a threat to Fortran, so I searched and read some articles claiming that Julia was xxx times faster than Fortran, etc. I felt that the claims were not credible, and wanted to give Julia a test-drive. I downloaded the 1.7.2 distribution for Windows, and ran a few examples of nonlinear regression.

The Fortran version compiles and links in ~ 2 seconds, and the run takes less than a second. A second run of the EXE takes 0.04 s, since the DLLs are now cached. The Julia version took 25 seconds for the first run, of which 17 s were used to process the directive using PyPlot. A second run while staying within the Julia REPL took 0.4 s. What disappointed me was the inability to compile stable source code into object files, EXEs and libraries and the huge startup time for running any Julia program unless one stays inside the REPL.
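Because the first run of an executable can include one-off costs such as loading DLLs, it helps to time several runs and report the first one separately from the rest. What follows is a minimal Python sketch of such a harness, not the procedure used for the numbers above. The function name `time_runs` is made up for illustration, and the command shown is a placeholder to be replaced by the program under test.

```python
import subprocess
import sys
import time


def time_runs(command, repeats=5):
    """Run `command` `repeats` times; return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's output so terminal speed does not affect timing.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command: replace with the executable under test, e.g. ["./a.exe"].
    runs = time_runs([sys.executable, "-c", "pass"], repeats=3)
    print(f"first run : {runs[0]:.3f} s")
    print(f"best later: {min(runs[1:]):.3f} s")
```

Comparing the first run with the best later run separates start-up and caching effects from the steady-state run time, which is the figure that matters when comparing compilers or runtimes.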


Any idea about how the same examples compare on Linux?

Thank you @mecej4 !
I did install cygwin64, and gfortran and make in it. Indeed, I got about the same performance as yours on Windows,

Although gfortran’s performance on Windows (0.78 s) is still a bit slower than on Linux (0.6 s), that difference is acceptable for now.

I have not tested MinGW-W64 yet, but for now I can say, at least based on my experience so far, that on Windows the gfortran in cygwin64 is the fastest.
Its performance combined with openmpi within cygwin64 still needs to be checked.


On WSL-2, Hyper-V, VMware, or native Ubuntu, at least for this code and some of my other codes, my experience is that gfortran’s performance is about the same.

The only ‘issue’, I hope you do not mind me saying so, from a user’s point of view, is that on Windows alone there are many different versions of gfortran (you know, MinGW, Cygwin, equation.com, etc.), and apart from the gfortran in cygwin64, the performance of most of them is not good enough (I believe it is exactly due to what you said, the DLL issue). If users (who are not experts in Fortran) do not install the best gfortran version and find that the code runs slowly, they will just complain about gfortran.

However @mecej4, after some more checking,
I found that while the gfortran in cygwin64 performs well for this small code of mine,
I have some more complicated code (roughly, those PYQ_I results you mentioned are replaced by results from ODE solvers) and the gfortran in

performs 3X better than cygwin64’s.
:rofl: I am a little puzzled.

I mean, from a user’s point of view, the performance of the Windows versions of gfortran may not be the most consistent. However, gfortran’s performance seems to be consistent on Linux (even in a Linux virtual machine) and on Mac, when it works.

But I know that Intel OneAPI depends on Visual Studio for building and linking, while gfortran does not. Perhaps something in Visual Studio does the magic.