Using gprof with gfortran -- no program output

Beliavsky · February 7, 2021, 8:40pm

I am using gprof on Windows 10 on a toy Fortran program compiled with GNU Fortran (GCC) 11.0.0 20200927 from equation.com . It runs and times the program, but the output of the program is suppressed. For the code

module stats_mod
implicit none
private
public :: rms,mean
contains
function rms(x) result(rms_x)
real, intent(in) :: x(:)
real             :: rms_x
rms_x = sqrt(mean(x**2))
end function rms
!
function mean(x) result(xmean)
real, intent(in) :: x(:)
real             :: xmean
xmean = sum(x)/max(1,size(x))
end function mean
end module stats_mod
!
program xran_stats
use stats_mod, only: rms,mean
implicit none
integer, parameter :: n = 10000000, niter = 100
real               :: x(n),xrms(niter)
integer            :: iter
do iter=1,niter
   call random_number(x)
   xrms(iter) = rms(x)
end do
write (6,*) "results: RMS mean, min, max =",sum(xrms)/niter,minval(xrms),maxval(xrms)
end program xran_stats

compiled and run with

gfortran -pg xran_stats.f90
gprof a.exe

The results of the program are not printed to the screen, although the profile information is shown:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 36.93      0.89     0.89                             _gfortran_arandom_r4
 31.95      1.66     0.77      101     7.62    15.05  __stats_mod_MOD_rms
 31.12      2.41     0.75      100     7.50     7.50  __stats_mod_MOD_mean

<snip>

		     Call graph (explanation follows)


granularity: each sample hit covers 4 byte(s) for 0.41% of 2.41 seconds

index % time    self  children    called     name
                0.01    0.01       1/101         main [5]
                0.76    0.74     100/101         MAIN__ [2]
[1]     63.1    0.77    0.75     101         __stats_mod_MOD_rms [1]
                0.75    0.00     100/100         __stats_mod_MOD_mean [4]
-----------------------------------------------
                                                 <spontaneous>
[2]     62.4    0.00    1.50                 MAIN__ [2]
                0.76    0.74     100/101         __stats_mod_MOD_rms [1]
-----------------------------------------------
                                                 <spontaneous>
[3]     36.9    0.89    0.00                 _gfortran_arandom_r4 [3]
-----------------------------------------------
                0.75    0.00     100/100         __stats_mod_MOD_rms [1]
[4]     31.1    0.75    0.00     100         __stats_mod_MOD_mean [4]
-----------------------------------------------
                                                 <spontaneous>
[5]      0.6    0.00    0.02                 main [5]
                0.01    0.01       1/101         __stats_mod_MOD_rms [1]
-----------------------------------------------

<snip>

Index by function name

   [4] __stats_mod_MOD_mean    [1] __stats_mod_MOD_rms     [3] _gfortran_arandom_r4

If I change write (6,*) to write (10,*) in my code, the file fort.10 is not written to, but if I just run.exe it is created with contents such as

results: RMS mean, min, max = 0.574540854 0.574369133 0.574764371

So it looks like program run with prof can be used for timing but not to generate program results, which is inconvenient. I get the same behavior running gfortran and gprof under Windows Subsystem for Linux.

vmagnin · February 7, 2021, 9:04pm

Under Linux I must do:

 $  gfortran -pg xran_stats.f90
 $  ./a.out
 $  gprof a.out

The second step is necessary to generate the gmon.out file used by gprof. So the second step prints the results. And gprof only prints gprof stuff.

Beliavsky · February 7, 2021, 9:44pm

The same procedure works for Windows – thanks.

Beliavsky · February 7, 2021, 11:28pm

For the code

module stats_mod
implicit none
private
public :: rms,mean
contains
function rms(x) result(rms_x)
real, intent(in) :: x(:)
real             :: rms_x
rms_x = sqrt(mean(x**2))
end function rms
!
function mean(x) result(xmean)
real, intent(in) :: x(:)
real             :: xmean
xmean = sum(x)/max(1,size(x))
end function mean
end module stats_mod
!
program xran_stats
use stats_mod, only: rms,mean
implicit none
integer, parameter :: n = 10000000, niter = 200
real               :: x(n),xrms(niter),t1,t2
integer            :: iter
call cpu_time(t1)
do iter=1,niter
   call random_number(x)
   xrms(iter) = rms(x)
end do
write (*,*) "results: RMS mean, min, max =",sum(xrms)/niter,minval(xrms),maxval(xrms)
call cpu_time(t2)
write (*,*) "time elapsed =",t2-t1
end program xran_stats

the elapsed time is 8.0s, but gprof reports execution time of 4.8s.

Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  ms/call  ms/call  name    
     37.63      1.81     1.81                             _gfortran_arandom_r4
     35.14      3.50     1.69      201     8.41    14.93  __stats_mod_MOD_rms
     27.23      4.81     1.31      200     6.55     6.55  __stats_mod_MOD_mean

I find a similar underestimate in my production program. A Stack Overflow thread explains why gprof does this. Can one assume that reported execution times for procedures are proportional to the actual times?

vmagnin · February 8, 2021, 8:11am

On a Linux machine:

vmagnin@PC2:~$ time ./a.out  
 results: RMS mean, min, max =  0.574535549      0.574328899      0.574711323    
 time elapsed =   11.3187618    

real    0m11,328s
user    0m9,032s
sys     0m2,290s
vmagnin@PC2:~$ gprof a.out   
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 70.38      5.07     5.07      200     0.03     0.04  __stats_mod_MOD_rms
 30.16      7.24     2.17      200     0.01     0.01  __stats_mod_MOD_mean
  0.00      7.24     0.00        1     0.00     7.24  MAIN__

I don’t know why there is a difference between 7.24s and 9.032s, but I guess that the sys 0m2.290s comes from the allocation/deallocation of memory for the arrays in the functions. An interesting command is:

$ strace ./a.out

You will see the syscalls:

--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 1
munmap(0x7f489b0a3000, 40001536)        = 0
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 3444257536
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 693075200
mmap(NULL, 40001536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f489b0a3000
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 993276
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---

We can see the SIGPROF signal they talk about in StackOverflow, and mmap() and munmap() syscalls: What is `mmap` in C?

Of course, arrays are passed by reference to the functions, but I guess the mmap() syscalls concerns arrays needed to make the intermediate calculations. Did you agree ?

vmagnin · February 8, 2021, 2:49pm

The sys 0m2.290s are probably due to both SIGPROF and mmap() / munmap(), but I don’t know in which proportions.
After all, your niter is only 200. OK, the arrays are quite big (40 MB), but I don’t know the algorithms behind mmap(). Reserving a block of memory seems not to take a lot of time? But I am personally already beyond my knowledge…

Topic		Replies	Views
Why gprof gives empty log file on Windows 10?	12	1843	May 6, 2022
Profiling gfortran code Help	6	434	March 15, 2024
Why is my code compiled with GFortran on Windows slower than on Ubuntu?	51	5686	May 3, 2022
Are there really any reliable ways to time Fortran code?	18	916	September 27, 2024
How to show the true wall time instead of cpu_time?	29	8505	October 29, 2023

Using gprof with gfortran -- no program output

Related topics