I performed the profile test using gfortran and gprof from equation.com's version 11.1.
The .bat file I used was:
del *.o
del *.mod
del em_mix.exe
set options=-g -fimplicit-none -fallow-argument-mismatch -march=native -pg
gfortran ran.f90 samplers.f90 em_mix.f90 %options% -o em_mix.exe
dir *.exe
em_mix
gprof -b -J -p em_mix.exe > em_mix_profile.log
notepad em_mix_profile.log
notepad em_mix.log
The gmon.out file is large, but gprof does produce an output file from it.
This is the first time I have used gprof, but the results look reasonable.
I also made some modifications to the module samplers, to include monitoring of the EXP function:
integer(kind=i8) :: calls_pYq_i_detail = 0, last_pYq_i_detail = 0
real(kind=r8) :: huge_exp = log (huge(one))   ! approx. largest argument before exp overflows
real(kind=r8) :: tiny_exp = log (tiny(one))   ! approx. argument where exp reaches tiny(one)
integer(kind=i8) :: count_good = 0
integer(kind=i8) :: count_bad = 0
contains
   ! Guarded exp: counts out-of-range arguments and returns 0 for them
   real(kind=r8) function exp_chk (val)
      real(kind=r8), intent(in) :: val
      if ( abs (val) > huge_exp ) then
         ! exp would overflow (val > 0) or underflow (val < 0)
         count_bad = count_bad+1
         exp_chk = 0
      else
         count_good = count_good+1
         exp_chk = exp (val)
      end if
   end function exp_chk
   ! Print the counters accumulated by exp_chk
   subroutine report_exp_chk
      write (*,*) 'Number of good exp calls =',count_good
      write (*,*) 'Number of bad exp calls =',count_bad
   end subroutine report_exp_chk
subroutine samplers_init(itermaxin,min_v,mgaussin,miin,nsubin,pin,kmixin &
By calling exp_chk in function pYq_i_detail and calling report_exp_chk before STOP, I get the following report:
calls to pYq_i_detail = 10198404
Number of good exp calls = 60606774
Number of bad exp calls = 583650
Program end normally.
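For reference, here is a minimal sketch of the two limits used in exp_chk. It assumes r8 is real64 and one = 1.0_r8; those definitions are not shown in the excerpt above, so they are assumptions. The sample arguments are hypothetical. Unlike exp_chk, it counts overflow and underflow separately:

```fortran
program exp_limits
   use, intrinsic :: iso_fortran_env, only: r8 => real64, i8 => int64
   implicit none
   real(kind=r8), parameter :: one = 1.0_r8      ! assumed definition of "one"
   real(kind=r8) :: huge_exp, tiny_exp
   real(kind=r8) :: samples(5)
   integer(kind=i8) :: n_over, n_under, n_ok
   integer :: k

   huge_exp = log (huge(one))    ! about  709.78 for real64
   tiny_exp = log (tiny(one))    ! about -708.40 for real64

   ! Hypothetical arguments, chosen to fall on both sides of each limit
   samples = [ -800.0_r8, -709.0_r8, 0.0_r8, 700.0_r8, 800.0_r8 ]

   n_over = 0
   n_under = 0
   n_ok = 0
   do k = 1, size (samples)
      if ( samples(k) > huge_exp ) then
         n_over = n_over + 1       ! exp would overflow
      else if ( samples(k) < tiny_exp ) then
         n_under = n_under + 1     ! exp would fall below tiny(one)
      else
         n_ok = n_ok + 1
      end if
   end do

   write (*,*) 'huge_exp =', huge_exp
   write (*,*) 'tiny_exp =', tiny_exp
   write (*,*) 'overflow, underflow, in range =', n_over, n_under, n_ok
end program exp_limits
```

With these samples, the argument -709 is counted as underflow here, whereas the abs (val) > huge_exp test in exp_chk would count it as good. The two limits are not symmetric, so the split between good and bad calls depends slightly on which test is used.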
I noted that the "Flat profile" I have produced reports times for the intrinsic functions exp, log, sin & cos, while the profile produced by the cygwin64 version of gfortran does not reference exp.
This may explain the performance differences, as there were no times for intrinsics in the cygwin64 profile.
Could it be that exp is treated in a more efficient way in the cygwin64 version? (That would be the main difference between the exp libraries of the two versions.)
I also note there are 583,650 calls to exp where there will be overflow or underflow in the result.
This is a large number of calls where IEEE error handling may differ in performance.
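One way to see how a given build handles these cases is to query the IEEE flags after calling exp with an out-of-range argument. This is only a sketch: it assumes the compiler provides the ieee_arithmetic module, that r8 is real64, and it uses hypothetical arguments of 800 and -800. Whether the flags are actually raised depends on the math library behind exp, so I give no expected output:

```fortran
program exp_flags
   use, intrinsic :: iso_fortran_env, only: r8 => real64
   use, intrinsic :: ieee_arithmetic
   implicit none
   real(kind=r8), volatile :: x      ! volatile so exp is evaluated at run time
   real(kind=r8) :: y
   logical :: flag_over, flag_under

   ! Ask for flags to be set rather than the program being halted
   if ( ieee_support_halting (ieee_overflow) ) &
      call ieee_set_halting_mode (ieee_overflow, .false.)
   if ( ieee_support_halting (ieee_underflow) ) &
      call ieee_set_halting_mode (ieee_underflow, .false.)

   ! Clear both flags before the test
   call ieee_set_flag (ieee_overflow, .false.)
   call ieee_set_flag (ieee_underflow, .false.)

   x = 800.0_r8                      ! above log (huge(x))
   y = exp (x)
   call ieee_get_flag (ieee_overflow, flag_over)
   write (*,*) 'exp( 800) =', y, ' overflow flag  =', flag_over

   x = -800.0_r8                     ! below log (tiny(x))
   y = exp (x)
   call ieee_get_flag (ieee_underflow, flag_under)
   write (*,*) 'exp(-800) =', y, ' underflow flag =', flag_under
end program exp_flags
```

Building this with both gfortran versions and comparing the output (and timing a loop around the two exp calls) might show whether the out-of-range path is handled differently.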
There is definitely an indication that the intrinsic EXP is being treated differently in the two gfortran versions.
This test case, EM_mix, spends approx 50% of its elapsed time in the exp function, which is very different from the tests I normally use, so it is not sufficient to support a general claim regarding Windows.
I am attaching the 3 .f90 files and the 2 .bat files I have been using for testing.
EM_mix.f90 (11.6 KB)
make_gf_bat.f90 (239 Bytes)
make_prof_bat.f90 (307 Bytes)
ran.f90 (4.5 KB)
samplers.f90 (80.5 KB)
(The changes I have made were more extensive than minimal, as I removed many of the tab characters, which are/were not part of the Fortran character set, and also changed some layouts for my better understanding, such as replacing doubled ' with " in format statements.
I think the second argument to ISHFT could be a default integer ?
I am not recommending any of my formatting changes, but reduced use of tabs could assist portability)
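On the ISHFT point, here is a small sketch of what I mean. It assumes i8 is int64, and the values are hypothetical. The two arguments of ISHFT may have different integer kinds, and the result takes the kind of the first argument:

```fortran
program ishft_kinds
   use, intrinsic :: iso_fortran_env, only: i8 => int64
   implicit none
   integer(kind=i8) :: big
   integer :: shift                  ! default integer shift count

   big = 1_i8
   shift = 40
   ! Result kind follows the first argument, so 2**40 fits
   write (*,*) ishft (big, shift)    ! prints 1099511627776
end program ishft_kinds
```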