I have been tracking down a curious error that only occurs under certain conditions. The problem is that the code has to run for some time before the error manifests.
Anyway, I have been hunting down issues as they arise, using gfortran's debugging options and, basically, a series of print statements.
I have been using the following compiler instructions:
gfortran -O -g -fbacktrace -ffpe-trap=invalid,zero,overflow modules.f90 stelcor.f90 dummymain.f90 -o dummymain
The -g is there so that I can use GDB, if I can get my head around it.
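To make the role of the trap flags concrete, here is a minimal sketch of a stand-alone program (hypothetical, not part of my code). Built with -ffpe-trap=zero it should stop with SIGFPE at the division; built without the flag it should carry on and print an infinite value. The volatile attribute is my assumption for keeping the optimiser from evaluating the division at compile time.

```fortran
program fpe_demo
  implicit none
  ! volatile keeps the optimiser from folding the division at compile time
  real, volatile :: den
  real :: ratio

  den = 0.0
  ratio = 1.0 / den      ! division by zero happens at run time
  print *, "ratio =", ratio
end program fpe_demo
```

The point is only that the signal is raised at the statement doing the arithmetic, which is also the statement the backtrace reports.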
Anyway, I have been placing write statements around my code and have identified that the code breaks in a particular subroutine. In fact, I have been whittling it down to where in that subroutine it appears to break; yet the result makes no sense to me.
I am probably missing something obvious but here’s where the issue appears to be arising…
if (j > 800) write (lunit, *) "addsub 11", j, n
jcomp = min(j, n / 10)
if (j > 800) write (lunit, *) "addsub 12"
This is within a loop, where j is the stepper variable. I have been able to identify that the code only breaks well into the run, when j > 850. So, I set my output to generate as much successful data as it can, without making the execution time insanely long.
The output file gives me…
...
addsub 11 815 816
addsub 12
addsub 13
add 1
add 2
addsub 11 816 817
addsub 12
addsub 14
addsub 11 801 817
addsub 12
addsub 11 802 817
addsub 12
addsub 11 803 817
addsub 12
addsub 11 804 817
addsub 12
addsub 11 805 817
addsub 12
addsub 11 806 817
addsub 12
addsub 11 807 817
addsub 12
addsub 11 808 817
addsub 12
addsub 11 809 817
addsub 12
addsub 11 810 817
addsub 12
addsub 11 811 817
Those are the last 28 lines of output.
As you can see, at the end of a previous run, the code (j = 815 and n = 816) goes through my two outputs, continues to a later stage of the subroutine (addsub 13) and enters another subroutine, returning to this subroutine afterwards, where j = 816 and n = 817.
The curious thing is that, as you can see, the file ends on the addsub 11 line, where there is only one line of code being executed: a line that has run successfully multiple times, and where the values are, seemingly, perfectly fine.
The line number of this final line is 119776, whilst the size of the file is 15MB; so, this is not blowing out due to file size issues.
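One thing worth ruling out before trusting where the file ends: gfortran normally buffers writes to a file unit, so lines written shortly before a crash may never reach the disk, and the last line in the file need not be the last statement that ran. A minimal sketch of the same trace with a flush after every write, assuming a hypothetical file name (trace.log) and loop bounds chosen only to mirror the output above:

```fortran
program trace_flush
  implicit none
  integer :: lunit, j, n, jcomp

  ! Hypothetical trace file; the real code opens lunit elsewhere.
  open (newunit=lunit, file="trace.log", status="replace", action="write")

  n = 817
  do j = 801, 811
    write (lunit, *) "addsub 11", j, n
    flush (lunit)              ! force the line to disk before continuing
    jcomp = min(j, n / 10)
    write (lunit, *) "addsub 12", jcomp
    flush (lunit)
  end do

  close (lunit)
end program trace_flush
```

If the buffered and flushed runs end at different places in the file, the missing tail was sitting in the buffer rather than never having been written. Setting the GFORTRAN_UNBUFFERED_ALL environment variable is another way to get the same effect without editing the code, at the cost of slower output.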
The error being reported is…
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
And the trace states…
Backtrace for this error:
#0 0x7fb22be66960 in ???
#1 0x7fb22be65ac5 in ???
#2 0x7fb22bb5651f in ???
#3 0x55d4300ed3dc in __stelcor_module_MOD_atmos
at /mnt/c/Users/garyn/OneDrive/Stored/Programming/NEW STELCOR/STELCOR - 13.08.2023/stelcor.f90:661
#4 0x55d4300f203b in __stelcor_module_MOD_gi
at /mnt/c/Users/garyn/OneDrive/Stored/Programming/NEW STELCOR/STELCOR - 13.08.2023/stelcor.f90:1951
#5 0x55d4300f510e in __stelcor_module_MOD_henyey
at /mnt/c/Users/garyn/OneDrive/Stored/Programming/NEW STELCOR/STELCOR - 13.08.2023/stelcor.f90:2632
#6 0x55d4300fcc4e in __stelcor_module_MOD_stelcor
at /mnt/c/Users/garyn/OneDrive/Stored/Programming/NEW STELCOR/STELCOR - 13.08.2023/stelcor.f90:166
#7 0x55d4300fe028 in MAIN__
at /mnt/c/Users/garyn/OneDrive/Stored/Programming/NEW STELCOR/STELCOR - 13.08.2023/dummymain.f90:249
#8 0x55d4300fe588 in main
at /mnt/c/Users/garyn/OneDrive/Stored/Programming/NEW STELCOR/STELCOR - 13.08.2023/dummymain.f90:19
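The innermost frame with source information is atmos at stelcor.f90:661, so that statement is a natural place to inspect the operands. A minimal sketch of such a check, assuming hypothetical variable names (x and y) and a division as the operation, since line 661 itself is not shown here:

```fortran
program operand_check
  use, intrinsic :: iso_fortran_env, only: real64, output_unit
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  implicit none
  real(real64) :: x, y

  ! Hypothetical operands standing in for whatever line 661 computes with.
  x = 1.0_real64
  y = 0.0_real64

  if (.not. ieee_is_finite(x) .or. .not. ieee_is_finite(y) .or. y == 0.0_real64) then
    ! Report the suspicious values and flush so the message survives a crash.
    write (output_unit, *) "suspect operands:", x, y
    flush (output_unit)
  else
    write (output_unit, *) "x / y =", x / y
  end if
end program operand_check
```

Placed immediately before the reported line, a check of this shape prints the offending values instead of letting the trap fire first.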
I have had advice, on another question I posted, about a debugging tool I might be able to use, and I will be trying it out later. However, right now I am puzzled and wondering what I am missing.