Why is my code compiled with GFortran on Windows slower than on Ubuntu?

I have a modern Fortran code.
With gfortran, I use the same optimization flags on both systems, namely

-Ofast -march=native

The gfortran version is also the same on both systems.

Compiled and run on Windows 10, it took 3 seconds.
Compiled and run on Ubuntu, it took 0.5 seconds.
So gfortran is 6 times slower on Windows.

Now if I use Intel Fortran (Intel oneAPI), the run takes 0.5 seconds on both Ubuntu and Windows 10. So the performance of Intel Fortran is consistent across operating systems, but that of gfortran is not.

Has anyone met a similar issue with gfortran on Windows?
If so, do you know what causes gfortran to be so slow on Windows?

The code has almost no I/O; the only I/O is at the end, where it writes the results to a tiny text file. I am sure the problem is NOT caused by I/O.

The thing is, given the same code and the same optimization flags, Intel Fortran behaves the same on both operating systems, while gfortran on Windows is 6 times slower than on Ubuntu.
The times are run times only. Compilation time is excluded.

One cannot say why gfortran is slower than Intel on Windows (and slower than gfortran on Linux) for an undisclosed code. Can you post the code or at least a fragment which demonstrates the speed differential?

Polyhedron has tables of Fortran compiler speeds on Windows and Linux for codes that can be downloaded here. On Windows

gfortran 8.1 -O3 -funroll-loops -ffast-math -o

was about half as fast (comparing geometric means) as

Intel Visual Fortran, 2019 Update 5 (AP) /fast /Qparallel /link /stack:64000000

I wonder if using the latest version of each compiler would make a difference. Intel is much faster than other compilers on the mp_pro code.

I have seen significant differences in compilation that are traceable to I/O and file system operations; and run times can be influenced heavily by virus scanning and caching. Can you separate out the compile time from the execution time, and on both platforms execute the program multiple times and see if there is a drastic change in run time? As noted, not knowing what the code is doing (I/O bound, lots of dynamic allocation of data, …) makes it hard to say.
For the run time, profiling tools such as gprof are very valuable.
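
As a rough sketch of the "execute it multiple times" suggestion, something like the following could report wall-clock and CPU time for several repeats inside one run. The `work` function and the repeat count `nrep` are hypothetical stand-ins for the real computation and are not taken from the code under discussion.

```fortran
program time_repeats
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: nrep = 5           ! hypothetical number of repeats
  integer(int64) :: t0, t1, rate
  real(real64) :: c0, c1, s
  integer :: irep

  call system_clock(count_rate=rate)       ! clock ticks per second
  do irep = 1, nrep
     call system_clock(t0)                 ! wall-clock start
     call cpu_time(c0)                     ! CPU-time start
     s = work(2000000)                     ! stand-in for the real computation
     call cpu_time(c1)
     call system_clock(t1)
     write(*,'(a,i0,a,f8.4,a,f8.4,a,es12.4)') 'run ', irep, ': wall ', &
          real(t1 - t0, real64) / real(rate, real64), ' s, cpu ', &
          c1 - c0, ' s, result ', s
  end do

contains

  ! Hypothetical workload: a simple floating-point reduction.
  function work(n) result(total)
     integer, intent(in) :: n
     real(real64) :: total
     integer :: i
     total = 0.0_real64
     do i = 1, n
        total = total + exp(-real(i, real64) * 1.0e-6_real64)
     end do
  end function work

end program time_repeats
```

If the first repeat is much slower than the rest, or wall time is much larger than CPU time, that would point toward start-up, scanning, or I/O effects rather than the generated code.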

Thank you. I have edited the post.
The time is run time only; compile time is excluded. The code has almost no I/O.

I think the problem is that some places in the code are optimized by gfortran on Linux but not by gfortran on Windows.

Speaking of I/O, it reminds me that I have noticed that the Windows file system (NTFS) is significantly slower than the Linux file systems.
It looks like the same NVMe SSD can reach much higher actual IOPS on Linux than on Windows.
I guess others may have had the same experience on Windows. That is, even if I use something like a RAM disk to do the I/O in RAM, the speed is almost the same as doing the I/O on my SSD.
For example, installing the same MiKTeX on a RAM disk and on the SSD takes almost the same time, and compiling LaTeX files on the RAM disk takes the same time as on the SSD. I was expecting compiling LaTeX files on a RAM disk to be blazingly fast. But I was wrong; there is no performance gain over running LaTeX on the SSD.
The Windows file system is the limitation, not the hardware.

30 years on, the old adage remains the same: develop on Windows when you have to, on Linux when you can.

Thank you very much! @themos
The thing is, we need to develop a package for users on Windows, Mac, and Linux. Therefore I have to consider the performance on Windows. :sweat_smile:
Although gfortran’s performance on Windows is not the same as on Linux, Intel Fortran’s performance is very consistent across the platforms it supports.

This isn’t just a problem with Fortran; I know a number of other languages appear to run less efficiently on Windows. It is hard enough to get a port; if you want performance you will likely use native windows interfaces. Perhaps Windows subsystem for Linux might help.

I second that WSL2 is a good choice. Actually, most of my Fortran codes run faster on WSL2 than in the native Windows environment.

Thank you @R_cubed @han190, yes I know.
The thing is, I am not running the program just for myself.
We are developing an R package which will call the Fortran compiler to compile the files and run the program on the user’s PC. The users just need to install the R package and a Fortran compiler. They are very regular users, and it is almost impossible to ask them to install WSL.
So the easiest and most straightforward way is to make gfortran on Windows run as fast as on Linux. Or we may simply use Intel oneAPI instead.
Also, because we are developing the package, I am looking for easy ways to install Fortran packages on users’ PCs, so that they do not need to do anything. We install the packages automatically for them, and they use the package, compile, and run the program.
You know, like developing a piece of software where the user only needs to click the installer exe file, and that is all.

Thank you all! @Beliavsky @urbanjost @themos @R_cubed @han190 .
Here is the code,

Very easy, a very small code (many lines are commented actually). On windows, if you installed gfortran and make, then you just need to download the files in one folder, then do

make

then run

rpem.exe

If I use Intel Fortran on Windows, it takes 0.5 s; gfortran takes 3 s.
On Ubuntu, both Intel and gfortran take 0.5 s.

I tried to use gprof on Windows with gfortran, but it always gives me an empty output file.

On Ubuntu I always find that gfortran’s performance is similar to Intel Fortran’s.
But on Windows, gfortran, at least for my code, is basically always several times slower than Intel Fortran. If someone is interested in having a quick look and identifying the issue, it will be greatly appreciated!

By the way, does anyone else have a similar issue?
That is, the same code is several times slower with gfortran on Windows than with gfortran on Ubuntu.

Thanks much!

You can try a trick:

1 Create a ramdisk
2 Move everything to the ramdisk and run your code there
3 Move everything back

It is trivial to create a ramdisk in Linux. I have never done it in Windows, but it seems you can as well:

How to create a ramdisk in Windows 10

Thanks @conradoat !
Eh, yeah, but I think the problem is not caused by the I/O.

I like the idea of a ramdisk, actually :slight_smile:.
I previously thought that if I installed LaTeX on a ramdisk and then compiled LaTeX files there, things would be lightning fast! But it turned out that compiling LaTeX on the ramdisk is as fast as compiling it on my SSD, lol. Because on the ramdisk the file system is still NTFS.

Therefore I suspect that the Linux file systems are much faster than Windows’s NTFS. It seems that no matter how fast the SSD is or how high its IOPS, it just cannot reach its potential on the Windows file system.

@CRquantum you can try to gather most of the I/O operations in your code together. I/O is something that slows everything down.

But I would advise you to avoid running on windows, if you can.

Thanks @conradoat .
Eh, yeah, I have turned off file I/O completely (the only write to a file is at the end, when the computation is done, and it writes a very small file). The timing did not improve.

Now, the only I/O I have is write(6,*), which writes to the screen.
Eh, are you suggesting turning off all the write(6,*)?

I am developing a package in Fortran and I hope it can be used on Win/Linux/Mac. Letting the users install gfortran and Make on Windows is a very convenient choice. Otherwise users need to install Intel oneAPI, which is also great but does not really optimize for the Apple M1.

So I think it would be great if gfortran worked fine on Win/Linux/Mac.
But the issue I have met with gfortran on Windows is that it is several times slower than on Linux.
Also, gfortran/gcc still seems to have some compatibility issues with the M1 chip. For example, if a pointer is used to point to a function/subroutine, gfortran seems to work only with the -Og flag. With any higher optimization level it causes errors on the M1 chip.

"Eh, are you suggesting turn off all the write(6,*)?"

Yes if you can. But from what you say, it seems that the I/O is not the problem.
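
One low-effort way to test this, sketched here under the assumption that the screen output sits inside the main loops, is to guard each write with a single flag so it can be switched off without deleting lines. The `verbose` flag and the loop body are hypothetical and only illustrate the pattern.

```fortran
program quiet_output
  use, intrinsic :: iso_fortran_env, only: output_unit, real64
  implicit none
  logical, parameter :: verbose = .false.   ! hypothetical switch; set .true. to print progress
  integer, parameter :: nsteps = 1000
  real(real64) :: acc
  integer :: i

  acc = 0.0_real64
  do i = 1, nsteps
     acc = acc + sqrt(real(i, real64))      ! stand-in for the real work
     ! Progress output is skipped entirely when verbose is .false.
     if (verbose) write(output_unit, '(a,i0,a,es14.6)') 'step ', i, ' acc = ', acc
  end do

  ! The final result is always printed once.
  write(output_unit, '(a,es14.6)') 'final acc = ', acc
end program quiet_output
```

Timing the program with `verbose` set both ways would show whether the screen output matters at all.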

Thanks @conradoat .
The I/O or write(6,*) really does not seem to be the problem. Even if I turn them all off, the timing is the same: on Windows gfortran takes 3 s and Intel takes 0.5 s.

Have you looked at the assembly language produced? I strongly suspect that the answer is there.

@CRQuantum, I think that many of the statements and conclusions stated in the various posts in this thread are unfair to Gfortran. Not much has been said about which versions of Gfortran were used on Windows, and which emulation layer/DLL support infrastructure was used.
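
For anyone reporting timings, a tiny program along these lines can record which compiler and flags produced an executable. It uses the standard `compiler_version` and `compiler_options` inquiry functions from `iso_fortran_env` (Fortran 2008). Note that it does not report which runtime DLLs are loaded, so that still has to be stated separately.

```fortran
program show_compiler
  use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
  implicit none

  ! Both functions return processor-dependent character strings.
  write(*,'(2a)') 'compiler: ', compiler_version()
  write(*,'(2a)') 'options : ', compiler_options()
end program show_compiler
```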

I am attempting to make amends.

A brief scan of your sources made me suspect that it uses 8-byte integers in many places where 4-byte integers would have sufficed. This choice, naturally, affects performance, but I did not want to spend any time to alter this aspect of the code.
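
To make the integer-size question easy to test, one option is to route every integer declaration through a single kind parameter and rebuild with each setting. The sketch below is not taken from the posted sources; the module name `kinds_demo`, the parameter `ik`, and the loop are all hypothetical.

```fortran
module kinds_demo
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  private
  ! Hypothetical project-wide integer kind: change int32 to int64 and rebuild to compare.
  integer, parameter, public :: ik = int32
end module kinds_demo

program int_kind_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use kinds_demo, only: ik
  implicit none
  integer(ik), parameter :: n = 5000000_ik
  integer(ik), allocatable :: idx(:)
  integer(ik) :: i
  integer(int64) :: t0, t1, rate, total

  allocate(idx(n))
  call system_clock(t0, rate)
  ! Fill an integer array and reduce it; all values stay well inside the int32 range.
  do i = 1_ik, n
     idx(i) = mod(i * 7_ik, 1000_ik)
  end do
  total = 0_int64
  do i = 1_ik, n
     total = total + int(idx(i), int64)
  end do
  call system_clock(t1)

  write(*,'(a,i0,a,i0)') 'integer kind ', ik, ', checksum ', total
  write(*,'(a,f8.4,a)') 'elapsed ', real(t1 - t0, real64) / real(rate, real64), ' s'
end program int_kind_demo
```

Whether the kind matters will depend on the compiler, the flags, and how memory-bound the real loops are, so this only shows how to set up the comparison.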

I took your Gitlab sources, and commented out most of the WRITE statements, until the program produced just 16 lines of output. Here are the run times on my NUC (small box PC with low power laptop processor i7-10710U, laptop memory, on a ramdisk, balanced power setting, Win11-64).

Ifort 2021.5, /O2            : 0.55 s
-same-, but  /fast           : 0.38 s

Cygwin Gfortran 11.2, -O2    : 0.79 s
Eq.com Gfortran 12.0, -O2    : 2.76 s 

I strongly suspected that the difference in the Gfortran times is attributable to Cygwin using (naturally) Cygwin1.DLL rather than the MinGW DLLs. If so, most of the slowdown that you noticed is attributable not to the compiler but to the runtime (which is also used by GCC, G++, etc.). I ran a test to settle my suspicion. I used Cygwin Gfortran to produce .o files, and linked the .o files using the MinGW Gfortran. The resulting a.exe took 2.8 s. This proves that there can be drastic differences in run times of Fortran programs depending on which versions of the GCC RTL are used. We may note in this connection that MinGW was last updated in 2013, whereas Cygwin was updated just two months ago.

You also wrote that Gprof failed to give you profile output. Here is part of the output that I obtained from Cygwin Gprof:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 38.24      0.26     0.26 10198404     0.00     0.00  __samplers_MOD_pyq_i_o
 35.29      0.50     0.24                             _mcount_private
  8.82      0.56     0.06  3999800     0.00     0.00  __random_MOD_randn
  5.88      0.60     0.04                             __fentry__
  2.94      0.62     0.02      208     0.10     0.10  __random_MOD_gaussian
  2.94      0.64     0.02      100     0.20     0.50  __samplers_MOD_metroplis_gik_k_more_o_log
  2.94      0.66     0.02       51     0.39     5.87  __samplers_MOD_prep
  2.94      0.68     0.02                             exp

If you skip the line for _mcount, which is the profiling routine itself, you may note that the biggest time consumed is in function PYQ_I, and that function accounts for over a third of the run time.

You may attempt to modify your code to reduce the number of calls to PYQ_I, or to make it a vector function instead of a scalar, if that is feasible. Secondly, as I mentioned earlier, see if you can be more judicious in using 8-byte integer variables.
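
As an illustration of the scalar-versus-vector idea, an `elemental` function can be called once on a whole array instead of once per element. The function below is a hypothetical Gaussian density, not the actual PYQ_I routine, whose interface I have not reproduced here.

```fortran
module demo_funcs
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Hypothetical example: normal density, usable on scalars and arrays alike.
  elemental function gauss_pdf(x, mu, sigma) result(p)
     real(real64), intent(in) :: x, mu, sigma
     real(real64) :: p
     real(real64), parameter :: inv_sqrt_2pi = 0.3989422804014327_real64
     p = inv_sqrt_2pi / sigma * exp(-0.5_real64 * ((x - mu) / sigma)**2)
  end function gauss_pdf
end module demo_funcs

program elemental_demo
  use, intrinsic :: iso_fortran_env, only: real64
  use demo_funcs, only: gauss_pdf
  implicit none
  real(real64) :: x(5), p(5)
  integer :: i

  x = [(real(i - 3, real64), i = 1, 5)]      ! -2, -1, 0, 1, 2
  ! One array-valued call replaces five scalar calls.
  p = gauss_pdf(x, 0.0_real64, 1.0_real64)
  write(*,'(5f10.6)') p
end program elemental_demo
```

Whether this helps in the real code depends on whether the calls to the hot function can be batched; if each call depends on the result of the previous one, it cannot be vectorized this way.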

Optimize? Intel OneAPI Fortran compilers generate code for X86/X64 processors. They do not target ARM at all.

I have not used an Apple computer for years. From what I have read, Apple M computers provide an emulator/translator/converter “Rosetta-?” that allows running executables that target Apple’s older X64 computers.

Thank you very much @mecej4 !
I totally agree with you. I do believe the problem is caused by some DLL issue on Windows. But I just do not know how to fix it. Do you know how to fix that?

I am the only one at my workplace who insists on using Fortran.
Before I came, they almost wanted to give up Fortran and switch to Julia (I have a Julia version of the code as well, but it is 3 times slower; on Windows Julia takes 1.5 s, which is still better than gfortran’s 3 s).
They told me that, after many years of wrestling with gfortran, they are a little tired of it. Because different users have different versions of gfortran, the same code works with some versions and not with others. The performance of gfortran on Windows is not consistent with that on Mac or Linux. You know, users do not care and do not know whether the problem is caused by a DLL or by gfortran. All they know is that they installed gfortran, compiled and built the code, and the code is slow. So they will just blame gfortran.
I thought that if they had problems with gfortran, it was perhaps because their code was not written well enough, and I never thought gfortran had any problems in any way.
I showed them a fast algorithm using modern Fortran with Intel oneAPI, and now they are considering Fortran again, especially because Intel’s oneAPI is free now and works fine on Windows and Linux.
Again, I want the code to perform consistently on Win/Linux/Mac, and I see that gfortran has the potential to be a very good lightweight choice for all three platforms.
But the slowness problem I currently have with gfortran on Windows bothers me very much.

The gfortran I am using is from equation.com, gfortran 11.2.0
Fortran, C, C++ for Windows

I see you use Cygwin64 on Windows, and that its gfortran took 0.79 s, which is about right.
I wonder how you achieved that?
I mean, I installed Cygwin64 and installed gfortran 11.2.0 there, then compiled and ran my code there, but I still get a slow time of about 2.9 s.

The real problem is how to make my code, built with gfortran on Windows, run at roughly the same speed as with gfortran on Ubuntu. Do you know how to fix that? Which gfortran for Windows should I use? I have installed all the gfortran builds for Windows that I could find, and they all seem to run 6x slower on Windows than on Linux.
I really hope there is a way to make gfortran’s speed consistent between Windows and Ubuntu.

Thank you very much indeed!

PS

Thank you for your optimization suggestions. Eh, I believe the 8-byte integers do not really matter on modern hardware.
On Windows and Ubuntu, Intel Fortran gives the same time of 0.5 s with either 4-byte or 8-byte integers.
gfortran on Ubuntu also gives about 0.6 s with either 4-byte or 8-byte integers, which is decent enough.
It is an illustrative code and optimization is not very important, but thank you very much all the same. I highly appreciate it!