Why gprof gives empty log file on Windows 10?

I have an issue with the gprof in the gfortran version from equation.com
Fortran, C, C++ for Windows
I tried gfortran 11.2.0 and the latest gfortran 12.0 there, the problems are the same.

The code is here,

It is from the thread below,

Thank @JohnCampbell very much for his help in writing a nice bat file to illustrate the usage of grof on Windows for the code.
I added a line cd /d "%~dp0" as discussed at,

on the top his bat file, so that when the bat file run with administrator permission, it will still always first locate to its folder and then run.

The bat file to do gprof is

cd /d "%~dp0"

del *.o
del *.mod
del em_mix.exe

set options=-g -fimplicit-none -fallow-argument-mismatch -march=native -pg

gfortran ran.f90 samplers.f90 em_mix.f90 %options% -o em_mix.exe

dir *.exe

em_mix

gprof -b -J -p em_mix.exe > em_mix_profile.log

notepad em_mix_profile.log

I use the equation.com version of gfortran. However the problem is that my em_mix_profile.log is always empty as below,

I placed the folders of equation.com version of gfortran (the gcc folders) on the top of the system variable path as below,

I wonder, guys, have you met such empty gprof log file issue before?

Do you know what might be the reason why the gprof on my WIndows give empty results? I ran the bat either with or without the administrator permission, the gprof log file are always empty.

Thank you very much in advance!

PS.

The cygwin64 version of gfortran works fine, with the code and the bat by @JohnCampbell, the gprof result is,

I ran the .bat file I provided in the previous post with gfortran 11.1 on Windows 10 and Ryzen 5900X and produced the following log:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 57.92      1.17     1.17                             exp
 24.75      1.67     0.50 10198404     0.00     0.00  __samplers_MOD_pyq_i_detail
  4.95      1.77     0.10                             __logl_internal
  1.98      1.81     0.04      100     0.40     0.55  __samplers_MOD_metroplis_gik_k_more_o_log
  1.98      1.85     0.04                             __sinl_internal
  1.49      1.88     0.03    10200     0.00     0.05  __samplers_MOD_mc_gauss_ptheta_w_sig
  1.49      1.91     0.03       50     0.60     0.70  __samplers_MOD_metroplis_gik_all_o_log
  1.49      1.94     0.03                             __fentry__
  0.99      1.96     0.02      208     0.10     0.10  __random_MOD_gaussian
  0.99      1.98     0.02                             log
  0.50      1.99     0.01  4000008     0.00     0.00  __random_MOD_randn
  0.50      2.00     0.01  1000000     0.00     0.00  __samplers_MOD_pyq_more_o_log
  0.50      2.01     0.01                             __cosl_internal
  0.50      2.02     0.01                             _mcount_private
  0.00      2.02     0.00  6040500     0.00     0.00  __random2_MOD_ran1
  0.00      2.02     0.00    10660     0.00     0.00  delta_sec_
  0.00      2.02     0.00      102     0.00     0.19  __samplers_MOD_gauss_thetas
  0.00      2.02     0.00       90     0.00     0.00  __samplers_MOD_corrchk_internal
  0.00      2.02     0.00       51     0.00    10.78  __samplers_MOD_prep
  0.00      2.02     0.00       50     0.00    12.58  __samplers_MOD_steptest
  0.00      2.02     0.00        9     0.00     0.00  __samplers_MOD_get_musigma
  0.00      2.02     0.00        1     0.00     0.00  __random_MOD_savern
  0.00      2.02     0.00        1     0.00     0.00  __random_MOD_setrn
  0.00      2.02     0.00        1     0.00     0.00  __samplers_MOD_get_datetime
  0.00      2.02     0.00        1     0.00     0.00  __samplers_MOD_get_musigma_maxll
  0.00      2.02     0.00        1     0.00     0.00  __samplers_MOD_push_yji
  0.00      2.02     0.00        1     0.00     0.00  __samplers_MOD_samplers_init

I generate a report. I can’t rerproduce your problem.
Did you delete gmon.out ?

I certainly don’t know why you would redirect as you indicated; cd /d “%~dp0”

Your Flat profile for cygwin64 is very different and indicates you are using function pyq_i_o (which does not use exp !!!) and NOT pyq_i_detail.
This is a very different test to what I am running so your claims about Win gfortran look questionable.

I certainly do not share your view that Ubuntu is much better than Windows. The reality should be they are similar for most things, but will definately have some differences.
My choice of windows relates to more to my work based around Microsoft Office.

I will try to link my test directory to this post.

No. The code, and the flag are exactly the same. All the difference you saw are simply caused by using equation.com version gfortran vs. cygwin64 gfortran. It is what it is.
In fact, you could install cygwin64 and check it yourself and then make conclusion. You could check @mecej4 reply as well,

This same code, cygwin64 took like 0.7-0.8s, while equation.com gfortran took 2.8-3s. So 4X difference.
However, for more complicated code, equation.com gfortran seems perform much better than cygwin64 version. That is why I claim the performance of Windows version gfortran are not consistent with each other. Again, you could check by yourself.

By the way, in the code, functoin pyq_i_detail and pyq_i_o are exactly the same. pyq_i_o is never used, so I do not know why cygwin64 show pyq_i_o in the report.

I never say Ubuntu is better than Windows, not to say ‘much better’ :sweat_smile:
I am just saying that based on my experience, gfortran on Linux performs noticeably faster than on Windows. Again, in fact, you could simply try to install Linux and run gfortran there. I believe it is very likely you will notice the performance difference.

I also want to develop some software using gfortran on windows, but again, the windows version gfortran did not perform as I expected, not to say that my application need MPI support. Fortunately, Intel OneAPI performs good enough.

Because my report are always empty, so I try to run the bat using administrator permission. If I do not add cd /d “%~dp0”, the cmd under administrator permission will locate at system32 folder which is wrong.

Yeah I know. That is why I am confused. Yes, I deleted gmon.out, etc. No matter what I try, it seems equation.com gfortran produce empty profile results :sweat_smile:

As far as I can tell, the problem is that the profile data in the gmon.out produced by Equation.com’s Gcc/GFortran is defective. This file is produced when an Exe built with -pg is run, but most of the bytes in it are null. The gmon.out files generated by this version of GCC/GFortran are many times larger than the gmon.out files generated by Exe-s built with Cygwin GCC/Gfortran.

I built and ran a test program using Cygwin GCC, then ran Equation.com’s gprof on the Cygwin Exe and gmon.out. I received a normal gprof profile output.

I then built using Equation.com tools, and ran Cygwin’s gprof on the output files. I received an empty profile.

1 Like

I performed the profile test using gfortran and gprof from equation.com’s version 11.1
The .bat file I used was:

del *.o
del *.mod
del em_mix.exe

set options=-g -fimplicit-none -fallow-argument-mismatch -march=native -pg

gfortran ran.f90 samplers.f90 em_mix.f90 %options% -o em_mix.exe

dir *.exe

em_mix

gprof -b -J -p em_mix.exe > em_mix_profile.log

notepad em_mix_profile.log
notepad em_mix.log

The gmon.out file is large but does produce an output file ?
This is the first time I have used gprof, but the results look reasonable.

I also made some modifications to the module samplers, to include monitoring of the EXP function:

    integer*8 :: calls_pYq_i_detail = 0, last_pYq_i_detail = 0
    real(kind=r8) :: huge_exp = log (huge(one))
    real(kind=r8) :: tiny_exp = log (tiny(one))
    integer(kind=i8) :: count_good = 0
    integer(kind=i8) :: count_bad  = 0

contains

    real(kind=r8) function exp_chk (val)
      real(kind=r8) :: val
      if ( abs (val) > huge_exp ) then
        count_bad = count_bad+1
        exp_chk = 0
      else
        count_good = count_good+1
        exp_chk = exp (val)
      end if
    end function exp_chk

    subroutine report_exp_chk
      write (*,*) 'Number of good exp calls =',count_good
      write (*,*) 'Number of bad  exp calls =',count_bad
    end subroutine report_exp_chk

	subroutine samplers_init(itermaxin,min_v,mgaussin,miin,nsubin,pin,kmixin &

By calling exp_chk in function function pYq_i_detail and calling report_exp_chk before STOP, I get the following report:

  calls to pYq_i_detail =             10198404
 Number of good exp calls =             60606774
 Number of bad  exp calls =               583650
 Program end normally.

I noted that the “Flat profile” I have produced reports times for intrinsic functions exp, log, sin & cos, while the profile produced by cygwin64 version of gfortran does not reference exp.
This may explain the the performance differences as there were no times for intrinsics in the cygwin64 profile.
Could it be that exp is treated in a more efficient way in the cygwin64 version ? ( which is the main difference between the exp libraries of the two versions.)

I also note there are 583,650 calls to exp where there will be overflow or underflow in the result.
This is a large number of calls where IEEE error handling may differ in performance.

There is definitely an indication the intrinsic EXP being treated differently in the two gfortran versions.
This test case EM_mix has approx 50% of elapse time in the exp function, which is very different to the test I use, not sufficient to extend to a general claim regarding windows.

I am attaching the 3 .f90 files and the 2 .bat files I have been using for testing.

EM_mix.f90 (11.6 KB)
make_gf_bat.f90 (239 Bytes)
make_prof_bat.f90 (307 Bytes)
ran.f90 (4.5 KB)
samplers.f90 (80.5 KB)

( the changes I have made were more extensive than minimal, as
I removed many of the tab characters which are/were not a part of the Fortran character set and
also
changed some layouts for my better understanding, such as double ’ with " in format statements.
I think the second argument to ISHFT could be a default integer ?
I am not recommending any of my formatting changes, but reduced use of tabs could assist portability)

1 Like

The following comments are about the various distributions of Gfortran/GCC from Equation-com and how they perform on my PC, which runs W11.

VERSION Comments
10.3 Produces profile output, 52% of time reported as spent in exp()
11.x Dead download links at Equation-com, can’t download and check
12.0 Empty profile output, as I reported earlier

1 Like

It should be a bug. A year ago, I found that gfortran and gprof of MSYS2 on Windows generated empty reports. This problem still exists (currently gfortran 11.3 still has this problem), while gfortran on WSL2 has no problem.

empty log
>> gprof -b ./main.exe gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name


                        Call graph


granularity: each sample hit covers 4 byte(s) no time propagated

index % time    self  children    called     name


Index by function name
1 Like

Thank you @JohnCampbell for your careful check.

ISHFT(I, SHIFT)

It says

ISHFT returns a value corresponding to I with all of the bits shifted SHIFT places. A value of SHIFT greater than zero corresponds to a left shift, a value of zero corresponds to no shift, and a value less than zero corresponds to a right shift. If the absolute value of SHIFT is greater than BIT_SIZE(I) , the value is undefined.

The second argument in ISHFT, whether it is integer or integer 8, it does not seem to matter, as long as absolute value of SHIFT is not greater than BIT_SIZE(I).

I tested using equation.com gfortran version 11.2.0 and 13-20220501, both give empty gprof results.
But you said you are using 11.1, and it seems this version works for gprof.

If you have big and complicated code, you may also check if there are big performance difference between the equation.com and the cygwin64 versions of gfortran.

About the speed issue, I may check,

https://www.kitware.com/fortran-for-cc-developers-made-easier-with-cmake/

See if gfortran can take advantage of Visual Studio in building exe which may then give the best performance on Windows. Visual Studio may be the tool to build applications for Windows, after all, they all belong to Microsoft.

Don’t bother, that would be a waste of time. We are talking about the performance of computational applications, not GUI programs.

For a given set of compiler options and a selected compiler, whether you build your EXE from an IDE or from a command window, whether you use Cmake or Make or your own batch file or manually typed commands, the run time of the EXE will be the same.

An IDE may hide voluminous console output and make the run a bit faster, but the same effect can be achieved by redirecting the output to a file or the null device from the command shell.

Thanks @mecej4 .
May I ask, again, then why on windows, intel fortran which uses Visual Studio to do build and link,and always give the same performance as on Linux?
I mean, I can be absolutely wrong, but I think I am almost sure that there must something in Visual Studio which is more than an IDE, and that is why Intel choose to use Visual Studio to build and link Fortran on Windows.
Therefore I suspect if gfortran can also uses Visual Studio to do build and link as intel did, then the code on windows may be the about same speed as Intel Fortran.
Again, thank you very much, and I can be completely wrong :sweat_smile:

Based on my experience, I would suggest that a reason Intel now uses Visual Studio on Windows is to manage the path settings for the compiler options and also the related .dll paths that are required to both link and run the .exe.
Prior to enforcing Visual Studio, getting the correct settings involved a complex .bat startup for a cmd.exe window.

On Windows, achieving compatible path and other environment variable settings is important for best operation of any Fortran compiler, including gfortran.
When having multiple compilers installed, it is essential to include all necessary paths for the compiler and also exclude paths for other compilers that may be installed on the PC.

Please go back to the starting post of this thread and look at the screenshot that you posted – the one that contains the gprof-Cygwin flat profile for your program. Let us assume that we can trust the profile to be authentic as well as representative of the kinds of programs that interest you.

Your program is compute bound, not I/O or network bound, or dependent on user interaction. The top seven lines account for 99.9 percent of the time. Except for the 18.64 % used up in the GCC mcount routine, the remaining 81 % is spent in your code – pyq_i(), etc. There is no place where you can usefully stick in Visual Studio or any other propellant into your EXE, as long as you use Gfortran to compile your sources and produce object code that gets linked into an EXE. The only way to produce a faster EXE is to use a compiler that generates faster object code for pyq_i(), etc. A compiler, please note. Not an IDE or build system or whiz-bang library. The best you could do, without changing the compiler, is to get a whiz-bang replacement for mcount() that consumes zero time. If you did so, your code would run 25 % faster. Turning off profiling would also make it slightly faster.

On the other hand, if you used a highly optimizing compiler such as Ifort, or a new version of Gfortran containing a new optimizer that you contributed, and then chose the right options for speed, the object code it generates for pyq_i(), etc., could run much faster, possibly twice as fast.

If you favor speculation and wishful thinking, what I wrote will not sway you.