I looked at the Fortran code and noticed that the author does not use the d0 suffix for double precision real literals. Not sure if that affects correctness and performance.
I donâ€™t see why it will affect performance nor correctness; the compiler will transform an integer or real which was meant to be double precision or whatever else as needed. However, I see it as a bad programming practice (it is extremely common in C, by the way). It will also cause tons of warnings if you have the -Wconversion
or -Wconversion-extra
compilation flags enabled (which I always have). Or similar flags in compilers other than gfortran.
I have several times written example programs to demonstrate that this can affect answers. Usually it doesnâ€™t impact things in a significant way, but if you are using an algorithm that is highly sensitive it could.
The issue is that the conversion to double precision doesnâ€™t happen before the conversion to floating point representation. Thus, if the number is not exactly representable as a 32 bit floating point number, it will be â€śroundedâ€ť to the nearest 32 bit value and then converted to a 64 bit value. If you specify the suffix, then the â€śroundingâ€ť will be directly to the nearest 64 bit value.
Like I said, most of the time the difference is inconsequential, but it is there.
program main
implicit none
real(8), parameter :: tolerance1 = 0.0000001
real(8), parameter :: tolerance2 = 0.0000001d0
print '(F20.16)', tolerance1
print '(F20.16)', tolerance2
end program main
Using fpm run (GFortran 11.2.0):
0.0000001000000012
0.0000001000000000
The 15 different ways to declare real variables in fortran is a HUGE source of errors and confusion. This includes using singles and thinking you are getting doubles.
If it was up to me Fortran would depreciate all but one way: real(wp)
and x = 1.0_wp
, where wp
(or whatever you want to name it) must be declared. Away with 1.0
, 1.0d0
, real(8)
, double precision
, real*8
, etc. etc.
And yes, I know this will never happen.
Itâ€™s convenient to write
integer, parameter :: dp = kind(1.0d0)
and easier to remember and less intimidating than
integer, parameter :: dp = selected_real_kind(15, 307)
It would be irregular to prohibit a literal such as 8
where a named constant can appear. Regarding real*8
, the committee cannot remove from the language what was never put in.
real(8) :: x
x = 5d-1
This seems very straightforward and convenient to me. I do agree that real*8 and double precision should be discouraged.
But real(8)
is unfortunately not portable:
âťŻ nagfor test.f90
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7104
Questionable: test.f90, line 3: Variable X set but never referenced
Error: test.f90, line 1: KIND value (8) does not specify a valid representation method
Errors in declarations, no further processing for $main$
[NAG Fortran Compiler error termination, 1 error, 1 warning]
I am aware of that. In my area, people only use GFortran. Most of the Fortran tagged questions at Stack Overflow also use GFortran. So thatâ€™s not a problem for me.
Yep Iâ€™m well aware of it. The vendors and the committee have always prioritized the legacy codes over trying to get new people to write Fortran. In my view, itâ€™s one of the main reasons for Fortranâ€™s decline.
If it were harder for people new to Fortran to come across real*8/real(8) , then they wouldnâ€™t know that the compiler even accepts such usage. Maybe if we clean up the legacy codes, people will learn from good practice and not repeat archaic forms.
I agree with this point, and I believe this issue can be solved with better documentation. This paper and its author are highly regarded in computational economics. Even he misuses double precision real literals, no wonder so many others make this mistake.
At least itâ€™s not C++â€¦
Wish it was a little more specific about the compiler options used and so on. Since the code
was so short it was easy enough to get some numbers. On a very modest Linux laptop old enough to be the same vintage as the report, using a â€śbeforeâ€ť and â€śafterâ€ť version of the code making no other changes running each 100 times I got
100 runs
Average (secs) | Shortest | Longest
Before 1.102803080400001 | 1.08397698 | 1.24383700
After 1.1099996502 | 1.09448707 | 1.21478796
with gfortran, which is quite a bit different than the 1.6 seconds in the article, but it was a very
different platform (a MSWindows box).
so any performance difference is small. I would have to change the code or instrument it to determine it any better, but there is some chance it could be up to a 1% but I doubt it.
It does produce slightly different answers, as can be expected from the above discussion
I just used fpm(1) and copied the file over app/main.f90 and fired it up using the default debug release and then --profile release. The results shown are for --profile release. The interesting part is the two profiles ran significantly different speeds (debug mode was 1.9 seconds, where the release mode was 1.1 seconds). So not sure what compiler options were used, and the report saying ifort speed changed by something like 7 or 9 percent could be because multiple runs were not averaged and so on.
Note the test is so small, that simple changes like running to the screen versus writing to an unbuffered file give around a five percent change on my platform, so I do not think my results change the general conclusions of the paper significantly, it just points out there might be some questions about the small changes compared to the previous results and such.
The only code change was to define a â€śdpâ€ť type and change all the constants to â€śdpâ€ť and the real(8) to real(dp).
PS: There is an odd transpose of an array and then a repeated transpose back that looked odd, but I did not look into it but it caught my eye as I did not notice the array as being changed anywhere and the way it was repeated would slow down an interpreted language.
Itâ€™s even more convenient to write
integer, parameter :: dp = kind(1d0)
If the link to Fortran code provided upthread is truly reflective of their performance data in the first paper, the results mentioned in the paper for Fortran paper are unreliable. Itâ€™s rather poor instrumentation of the code, itâ€™s absolutely at a beginner level. Immediately the lack of the following is glaring:
- in situ the code can be instrumented to run a certain number of trials, say NTRIALS where it is a reasonable number such as a dozen. Discarding then the fastest and the slowest times, the CPU usage then be marked as the median of the rest. Then repeat the same with certain number of execution attempts, say NRUNS, and get the overall median,
- use
SYSTEM_CLOCK
to get the tick counts using an integer type with a range of at least 10 (colloquially a 64-bit integer or better) rather thanCPU_TIME
to measure timing, - separate in code with suitable instrumentation the measurement of strictly numerical computing instructions from those involving output (and input) involving
PRINT
(andREAD
) statements. And report them distinctly. If the times toward IO are on the same order of magnitude as numerical operations, there is little to no reason for those keen on Fortran to pay much attention to such studies, the IO time will not be that competitive; besides, that will be of little practical interest any way.
It would be worth experimenting with a BLAS call to ?gemm
for the operation:
expectedValueFunction = matmul(mValueFunction,transpose(mTransition))
I also agree with @FortranFanâ€™s observation to separate the I/O part. Currently, the printing is performed quite frequently, every 10 iterations.
The textbook Introduction to Computational Economics using Fortran also uses real*8 to declare double precision real numbers.
https://www.ce-fortran.com/source-codes/
Again, the changes do not immediately change the general conclusions, but
it seems like a good place to show how
- the warning flags of the compiler should be used to their fullest
- a few compiler options can affect speed significantly,
- the often overlooked gprof(1) and gcov(1) tools can tell you a
lot about how your code is used and where it spends time with
very little effort.
So for users with the GCC programs gfortran, gprof, and gcov you should
give something like this a try if you are interested in information for
improving performance or testing coverage â€¦
#!/bin/bash
(
exec 2>&1
# do everything in a scratch directory
SCR=scr_$(uuidgen)
mkdir -p $SCR
cp econ1.f90 $SCR/
cd $SCR
gfortran -g -pg -fprofile-arcs -ftest-coverage -O0 econ1.f90 -o econ1
./econ1
# better when you have a lot of procedures
gprof --brief econ1 gmon.out
# better for short programs
gprof --brief --line econ1 gmon.out
# coverage report
gcov --stdout econ1.f90
) >reports.out
exit
and simply compiling with -Warn -Wextra identified a few easily corrected
issues. Always a good practice to use all those optional warnings!
And I was off a bit on my guess about performance and using 64-bit
constants instead of 32-bit. Using a more rigorious timing test and 1 000,
executions it does slow it down on my platform 1.36 percent. I am still
a bit surprised, as almost all the usage of 32-bit values was combined
with 64-bit values.
By the way, for fun I built it using 128-bit values and got about the
same answer but it was 45 times slower (with gfortran). I expected a
slowdown but not of that order either. Live and learn.
So just using those example commands you can tell what lines of your
code were executed via gcov(1) (in this trivial case not so interesting
except for the counts but if building a test suite that is good info.).
And you can tell that three lines account for over 1/2 of the CPU time
consumed.
In this case the procedure report is essentially null, but in a big
program it quickly can identify which procedures are consuming the
CPU time.
gprof(1) and gcov(1) have their limitations but are often very useful. You
should read up on them and similar tools if this is interesting.
And as a footnote using unrolling and a few other switches on my platform
took ifort(1) to the top of the heap at 1`04``.
reports
program output Steady State values
Output: 0.56273143387317726 Capital: 0.17819828739139082 Consumption: 0.38453314648178644
Iteration: 1 Sup Diff: 5.2741593406882385E-002
Iteration: 10 Sup Diff: 3.1346949265582680E-002
Iteration: 20 Sup Diff: 1.8703459893196328E-002
Iteration: 30 Sup Diff: 1.1165512033874614E-002
Iteration: 40 Sup Diff: 6.6685417080749598E-003
Iteration: 50 Sup Diff: 3.9842927486826163E-003
Iteration: 60 Sup Diff: 2.3813118039122116E-003
Iteration: 70 Sup Diff: 1.4236586450859789E-003
Iteration: 80 Sup Diff: 8.5133977471318900E-004
Iteration: 90 Sup Diff: 5.0920517522479170E-004
Iteration: 100 Sup Diff: 3.0462324421209885E-004
Iteration: 110 Sup Diff: 1.8226485632144573E-004
Iteration: 120 Sup Diff: 1.0906950872535681E-004
Iteration: 130 Sup Diff: 6.5276437362649098E-005
Iteration: 140 Sup Diff: 3.9071082119535028E-005
Iteration: 150 Sup Diff: 2.3388077119768091E-005
Iteration: 160 Sup Diff: 1.4008644636964718E-005
Iteration: 170 Sup Diff: 8.3913172028715621E-006
Iteration: 180 Sup Diff: 5.0264745379280384E-006
Iteration: 190 Sup Diff: 3.0108998635425266E-006
Iteration: 200 Sup Diff: 1.8035522479920019E-006
Iteration: 210 Sup Diff: 1.0803409159487742E-006
Iteration: 220 Sup Diff: 6.4713169423136208E-007
Iteration: 230 Sup Diff: 3.8763619381043668E-007
Iteration: 240 Sup Diff: 2.3219657918627234E-007
Iteration: 250 Sup Diff: 1.3908720930544405E-007
Iteration: 257 Sup Diff: 9.7160356538061876E-008
My check: 0.14654914369569541
Elapsed time is 1.73320293
gprof by procedure
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
100.10 0.75 0.75 1 750.73 750.73 MAIN__
Call graph
granularity: each sample hit covers 2 byte(s) for 1.33% of 0.75 seconds
index % time self children called name
0.75 0.00 1/1 main [2]
[1] 100.0 0.75 0.00 1 MAIN__ [1]
-----------------------------------------------
<spontaneous>
[2] 100.0 0.00 0.75 main [2]
0.75 0.00 1/1 MAIN__ [1]
-----------------------------------------------
Index by function name
[1] MAIN__ (econ1.f90)
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
28.69 0.22 0.22 MAIN__ (econ1.f90:75 @ 2d76)
21.35 0.38 0.16 MAIN__ (econ1.f90:76 @ 2de1)
16.02 0.50 0.12 MAIN__ (econ1.f90:74 @ 2d22)
10.68 0.58 0.08 MAIN__ (econ1.f90:88 @ 2eb8)
4.67 0.61 0.04 MAIN__ (econ1.f90:73 @ 2cf5)
4.00 0.64 0.03 MAIN__ (econ1.f90:85 @ 2e71)
2.67 0.66 0.02 MAIN__ (econ1.f90:71 @ 2cba)
2.67 0.68 0.02 MAIN__ (econ1.f90:73 @ 2e21)
2.67 0.70 0.02 MAIN__ (econ1.f90:71 @ 2ea4)
2.00 0.72 0.02 MAIN__ (econ1.f90:77 @ 2def)
2.00 0.73 0.02 MAIN__ (econ1.f90:84 @ 2e3e)
1.33 0.74 0.01 MAIN__ (econ1.f90:78 @ 2df9)
0.67 0.75 0.01 MAIN__ (econ1.f90:81 @ 2ded)
0.67 0.75 0.01 MAIN__ (econ1.f90:76 @ 2e2a)
0.00 0.75 0.00 1 0.00 0.00 MAIN__ (econ1.f90:6 @ 2429)
Call graph
granularity: each sample hit covers 2 byte(s) for 1.33% of 0.75 seconds
index % time self children called name
0.00 0.00 1/1 main (econ1.f90:104 @ 3680) [70]
[15] 0.0 0.00 0.00 1 MAIN__ (econ1.f90:6 @ 2429) [15]
-----------------------------------------------
Index by function name
[15] MAIN__ (econ1.f90:6 @ 2429) [2] MAIN__ (econ1.f90:76 @ 2de1) [14] MAIN__ (econ1.f90:76 @ 2e2a)
[7] MAIN__ (econ1.f90:71 @ 2cba) [13] MAIN__ (econ1.f90:81 @ 2ded) [11] MAIN__ (econ1.f90:84 @ 2e3e)
[5] MAIN__ (econ1.f90:73 @ 2cf5) [10] MAIN__ (econ1.f90:77 @ 2def) [6] MAIN__ (econ1.f90:85 @ 2e71)
[3] MAIN__ (econ1.f90:74 @ 2d22) [12] MAIN__ (econ1.f90:78 @ 2df9) [9] MAIN__ (econ1.f90:71 @ 2ea4)
[1] MAIN__ (econ1.f90:75 @ 2d76) [8] MAIN__ (econ1.f90:73 @ 2e21) [4] MAIN__ (econ1.f90:88 @ 2eb8)
coverage report. Note the lines are numbered and match up to the gprof lines
econ1.gcno:version 'B03*', prefer 'A93*'
econ1.gcda:version 'B03*', prefer version 'A93*'
-: 0:Source:econ1.f90
-: 0:Graph:econ1.gcno
-: 0:Data:econ1.gcda
-: 0:Runs:1
-: 1:!============================================================================
-: 2:! Name : RBC_F90.f90
-: 3:! Description : Basic RBC model with full depreciation
-: 4:! Date : July 21, 2013
-: 5:!============================================================================
1: 6:program RBC_F90
-: 7:!----------------------------------------------------------------
-: 8:! 0. variables to be defined
-: 9:!----------------------------------------------------------------
-: 10:implicit none
-: 11:integer, parameter :: dp=kind(0.0d0)
-: 12:integer, parameter :: nGridCapital = 17820
-: 13:integer, parameter :: nGridProductivity = 5
-: 14:real(dp), parameter :: tolerance = 0.0000001_dp
-: 15:real :: time_begin, time_end
-: 16:integer :: nCapital, nCapitalNextPeriod, gridCapitalNextPeriod, nProductivity
-: 17:integer :: iteration
-: 18:real(dp) :: aalpha, bbeta, capitalSteadyState, outputSteadyState, consumptionSteadyState
-: 19:real(dp) :: valueHighSoFar, valueProvisional, consumption, capitalChoice
-: 20:real(dp) :: maxDifference
-: 21:real(dp), dimension(nGridProductivity) :: vProductivity
-: 22:real(dp), dimension(nGridProductivity,nGridProductivity) :: mTransition
-: 23:real(dp), dimension(nGridCapital) :: vGridCapital
-: 24:real(dp), dimension(nGridCapital,nGridProductivity) :: mOutput, mValueFunction, mValueFunctionNew, mPolicyFunction
-: 25:real(dp), dimension(nGridCapital,nGridProductivity) :: expectedValueFunction
1: 26: call CPU_TIME(time_begin)
-: 27: !----------------------------------------------------------------
-: 28: ! 1. Calibration
-: 29: !----------------------------------------------------------------
1: 30: aalpha = 0.33333333333_dp; ! Elasticity of output w.r.t
1: 31: bbeta = 0.95_dp ! Discount factor
-: 32: ! Productivity value
1: 33: vProductivity = (/0.9792_dp, 0.9896_dp, 1.0000_dp, 1.0106_dp, 1.0212_dp/)
-: 34: ! Transition matrix
-: 35: mTransition = reshape( (/ &
-: 36: 0.9727_dp, 0.0273_dp, 0._dp, 0._dp, 0._dp, &
-: 37: 0.0041_dp, 0.9806_dp, 0.0153_dp, 0.0_dp, 0.0_dp, &
-: 38: 0.0_dp, 0.0082_dp, 0.9837_dp, 0.0082_dp, 0.0_dp, &
-: 39: 0.0_dp, 0.0_dp, 0.0153_dp, 0.9806_dp, 0.0041_dp, &
1: 40: 0.0_dp, 0.0_dp, 0.0_dp, 0.0273_dp, 0.9727_dp /), (/5,5/))
-: 41: !----------------------------------------------------------------
-: 42: ! 2. Steady state
-: 43: !----------------------------------------------------------------
1: 44: capitalSteadyState = (aalpha*bbeta)**(1.0_dp/(1.0_dp-aalpha))
1: 45: outputSteadyState = capitalSteadyState**(aalpha)
1: 46: consumptionSteadyState = outputSteadyState-capitalSteadyState
1: 47: print *, 'Steady State values'
1: 48: print *, 'Output: ', outputSteadyState, 'Capital: ', capitalSteadyState, 'Consumption: ', consumptionSteadyState
-: 49: ! Grid for capital
17821: 50: do nCapital = 1, nGridCapital
17821: 51: vGridCapital(nCapital) = 0.5_dp*capitalSteadyState+0.00001_dp*(nCapital-1)
-: 52: enddo
-: 53: !----------------------------------------------------------------
-: 54: ! 3. Pre-build Output for each point in the grid
-: 55: !----------------------------------------------------------------
6: 56: do nProductivity = 1, nGridProductivity
89106: 57: do nCapital = 1, nGridCapital
89105: 58: mOutput(nCapital, nProductivity) = vProductivity(nProductivity)*(vGridCapital(nCapital)**aalpha)
-: 59: enddo
-: 60: enddo
-: 61: !----------------------------------------------------------------
-: 62: ! 4. Main Iteration
-: 63: !----------------------------------------------------------------
1: 64: maxDifference = 10.0_dp
1: 65: iteration = 0
258: 66: do while (maxDifference>tolerance)
257: 67: expectedValueFunction = matmul(mValueFunction,mTransition);
1542: 68: do nProductivity = 1,nGridProductivity
-: 69: ! We start from previous choice (monotonicity of policy function)
1285: 70: gridCapitalNextPeriod = 1
22900242: 71: do nCapital = 1,nGridCapital
22898700: 72: valueHighSoFar = -100000.0_dp
60486291: 73: do nCapitalNextPeriod = gridCapitalNextPeriod,nGridCapital
60486291: 74: consumption = mOutput(nCapital,nProductivity)-vGridCapital(nCapitalNextPeriod) !88
60486291: 75: valueProvisional = (1.0_dp-bbeta)*log(consumption)+bbeta*expectedValueFunction(nCapitalNextPeriod,nProductivity) !89
60486291*: 76: if (valueProvisional>valueHighSoFar) then ! we break when we have achieved the max !90
37587591: 77: valueHighSoFar = valueProvisional
37587591: 78: capitalChoice = vGridCapital(nCapitalNextPeriod)
37587591: 79: gridCapitalNextPeriod = nCapitalNextPeriod
-: 80: else
22898700: 81: exit
-: 82: endif
-: 83: enddo
22898700: 84: mValueFunctionNew(nCapital,nProductivity) = valueHighSoFar
22899985: 85: mPolicyFunction(nCapital,nProductivity) = capitalChoice
-: 86: enddo
-: 87: enddo
22900242*: 88: maxDifference = maxval((abs(mValueFunctionNew-mValueFunction)))
257: 89: mValueFunction = mValueFunctionNew
257: 90: iteration = iteration+1
257: 91: if (mod(iteration,10)==0 .OR. iteration==1) then
26: 92: print *, 'Iteration:', iteration, 'Sup Diff:', MaxDifference
-: 93: endif
-: 94: enddo
-: 95: !----------------------------------------------------------------
-: 96: ! 5. PRINT RESULTS
-: 97: !----------------------------------------------------------------
1: 98: print *, 'Iteration:', iteration, 'Sup Diff:', MaxDifference
1: 99: print *, ' '
1: 100: print *, 'My check:', mPolicyFunction(1000,3)
1: 101: print *, ' '
1: 102: call CPU_TIME (time_end)
1: 103: print *, 'Elapsed time is ', time_end - time_begin
1: 104:end program RBC_F90
I do find it a bit depressing that the, apparently respected, author of a textbook for no obvious reason except perhaps ignorance doesnâ€™t use Standard Fortran.
If the author had called it â€śIntroduction to Computational Fortran using a language similar to Fortran but non-standardâ€ť that might be been acceptable, but otherwise it is not. Donâ€™t textbooks go through any form of review process any more?