Using gprof with gfortran -- no program output

I am using gprof on Windows 10 on a toy Fortran program compiled with GNU Fortran (GCC) 11.0.0 20200927 from equation.com . It runs and times the program, but the output of the program is suppressed. For the code

module stats_mod
implicit none
private
public :: rms,mean
contains
function rms(x) result(rms_x)
real, intent(in) :: x(:)
real             :: rms_x
rms_x = sqrt(mean(x**2))
end function rms
!
function mean(x) result(xmean)
real, intent(in) :: x(:)
real             :: xmean
xmean = sum(x)/max(1,size(x))
end function mean
end module stats_mod
!
program xran_stats
use stats_mod, only: rms,mean
implicit none
integer, parameter :: n = 10000000, niter = 100
real               :: x(n),xrms(niter)
integer            :: iter
do iter=1,niter
   call random_number(x)
   xrms(iter) = rms(x)
end do
write (6,*) "results: RMS mean, min, max =",sum(xrms)/niter,minval(xrms),maxval(xrms)
end program xran_stats

compiled and run with

gfortran -pg xran_stats.f90
gprof a.exe

The results of the program are not printed to the screen, although the profile information is shown:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 36.93      0.89     0.89                             _gfortran_arandom_r4
 31.95      1.66     0.77      101     7.62    15.05  __stats_mod_MOD_rms
 31.12      2.41     0.75      100     7.50     7.50  __stats_mod_MOD_mean

<snip>

		     Call graph (explanation follows)


granularity: each sample hit covers 4 byte(s) for 0.41% of 2.41 seconds

index % time    self  children    called     name
                0.01    0.01       1/101         main [5]
                0.76    0.74     100/101         MAIN__ [2]
[1]     63.1    0.77    0.75     101         __stats_mod_MOD_rms [1]
                0.75    0.00     100/100         __stats_mod_MOD_mean [4]
-----------------------------------------------
                                                 <spontaneous>
[2]     62.4    0.00    1.50                 MAIN__ [2]
                0.76    0.74     100/101         __stats_mod_MOD_rms [1]
-----------------------------------------------
                                                 <spontaneous>
[3]     36.9    0.89    0.00                 _gfortran_arandom_r4 [3]
-----------------------------------------------
                0.75    0.00     100/100         __stats_mod_MOD_rms [1]
[4]     31.1    0.75    0.00     100         __stats_mod_MOD_mean [4]
-----------------------------------------------
                                                 <spontaneous>
[5]      0.6    0.00    0.02                 main [5]
                0.01    0.01       1/101         __stats_mod_MOD_rms [1]
-----------------------------------------------

<snip>

Index by function name

   [4] __stats_mod_MOD_mean    [1] __stats_mod_MOD_rms     [3] _gfortran_arandom_r4

If I change write (6,*) to write (10,*) in my code, the file fort.10 is not written to, but if I just run.exe it is created with contents such as

results: RMS mean, min, max = 0.574540854 0.574369133 0.574764371

So it looks like program run with prof can be used for timing but not to generate program results, which is inconvenient. I get the same behavior running gfortran and gprof under Windows Subsystem for Linux.

1 Like

Under Linux I must do:

 $  gfortran -pg xran_stats.f90
 $  ./a.out
 $  gprof a.out

The second step is necessary to generate the gmon.out file used by gprof. So the second step prints the results. And gprof only prints gprof stuff.

1 Like

The same procedure works for Windows – thanks.

1 Like

For the code

module stats_mod
implicit none
private
public :: rms,mean
contains
function rms(x) result(rms_x)
real, intent(in) :: x(:)
real             :: rms_x
rms_x = sqrt(mean(x**2))
end function rms
!
function mean(x) result(xmean)
real, intent(in) :: x(:)
real             :: xmean
xmean = sum(x)/max(1,size(x))
end function mean
end module stats_mod
!
program xran_stats
use stats_mod, only: rms,mean
implicit none
integer, parameter :: n = 10000000, niter = 200
real               :: x(n),xrms(niter),t1,t2
integer            :: iter
call cpu_time(t1)
do iter=1,niter
   call random_number(x)
   xrms(iter) = rms(x)
end do
write (*,*) "results: RMS mean, min, max =",sum(xrms)/niter,minval(xrms),maxval(xrms)
call cpu_time(t2)
write (*,*) "time elapsed =",t2-t1
end program xran_stats

the elapsed time is 8.0s, but gprof reports execution time of 4.8s.

Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total           
     time   seconds   seconds    calls  ms/call  ms/call  name    
     37.63      1.81     1.81                             _gfortran_arandom_r4
     35.14      3.50     1.69      201     8.41    14.93  __stats_mod_MOD_rms
     27.23      4.81     1.31      200     6.55     6.55  __stats_mod_MOD_mean

I find a similar underestimate in my production program. A Stack Overflow thread explains why gprof does this. Can one assume that reported execution times for procedures are proportional to the actual times?

On a Linux machine:

vmagnin@PC2:~$ time ./a.out  
 results: RMS mean, min, max =  0.574535549      0.574328899      0.574711323    
 time elapsed =   11.3187618    

real    0m11,328s
user    0m9,032s
sys     0m2,290s
vmagnin@PC2:~$ gprof a.out   
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 70.38      5.07     5.07      200     0.03     0.04  __stats_mod_MOD_rms
 30.16      7.24     2.17      200     0.01     0.01  __stats_mod_MOD_mean
  0.00      7.24     0.00        1     0.00     7.24  MAIN__

I don’t know why there is a difference between 7.24s and 9.032s, but I guess that the sys 0m2.290s comes from the allocation/deallocation of memory for the arrays in the functions. An interesting command is:

$ strace ./a.out

You will see the syscalls:

--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 1
munmap(0x7f489b0a3000, 40001536)        = 0
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 3444257536
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 693075200
mmap(NULL, 40001536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f489b0a3000
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 993276
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---

We can see the SIGPROF signal they talk about in StackOverflow, and mmap() and munmap() syscalls: What is `mmap` in C?

Of course, arrays are passed by reference to the functions, but I guess the mmap() syscalls concerns arrays needed to make the intermediate calculations. Did you agree ?

1 Like

The sys 0m2.290s are probably due to both SIGPROF and mmap() / munmap(), but I don’t know in which proportions.
After all, your niter is only 200. OK, the arrays are quite big (40 MB), but I don’t know the algorithms behind mmap(). Reserving a block of memory seems not to take a lot of time? But I am personally already beyond my knowledge…