Only my Windows compilation fails?

The model I have works fine under Linux and MacOS. Everything runs fine and the results are as expected.

However, when I compile the same code under Windows, the program crashes almost immediately.

I have set the compiler flags to give me as much feedback as possible:

gfortran -O -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow,underflow modules.f90 stelcor.f90 dummymain.f90 -o dummymain

Yet the output is not terribly useful…

At line 4187 of file stelcor.f90
Fortran runtime error: Index '-2147483647' of dimension 2 of array 'opactb' below lower bound of 1

Error termination. Backtrace:

Could not print backtrace: libbacktrace could not find executable to open
#0  0xca3d8a7a
#1  0xca35f0c1
#2  0xca31ed3a
#3  0x48eee01c
#4  0x48ef0965
#5  0x48ef5eef
#6  0x48efb0b5
#7  0x48f00e3a
#8  0x48f0bcfa
#9  0x48f0d13d
#10  0x48f0fc6f
#11  0x48ee1394
#12  0x48ee14e5
#13  0xf83b26ac
#14  0xf96caa67
#15  0xffffffff

Obviously the fault is appearing at line 4187 in stelcor.f90, and, obviously, the index value for the array opactb is negative (which is very wrong) - it has three dimensions (opactb(14, 19, 70)).

But this is a small module in a code of 5000+ lines, with lots of function calls throughout.
And I am curious why the same code, with the same compiler arguments, works under the other two OSes (though I am aware that Mac is based on Unix as well, so I would expect a little more conformity between these two).

  1. How do I get the system to print the backtrace under Windows (this works fine on my Mac and Linux)?
  2. Any ideas why the system fails under Windows and not the other two (general idea; clearly you can’t see the code itself)?
  3. Should I be bothering with Windows anyway? (basically, is this likely to be a fault that is being masked under the other two environments or are people used to Windows acting strangely when it comes to compiled Fortran code?)

Hi Gary,
I have come across this situation (code works on one OS but not another) several times with the various C/Fortran codes I come across at work. In every case I have come across so far, it is because there was a mistake in the code that went unnoticed on the ‘working’ OS. I frequently try to remind researchers that just because code compiles and runs successfully in one environment, that does not mean that it is correct or without error. It is also for this reason that I encourage researchers to frequently test their code across multiple platforms and machines.

So to answer your questions more directly:

  1. I’m not sure how to get backtraces on Windows
  2. My experience says there is something wrong with the code that has gone unnoticed
  3. I don’t generally have issues running compiled Fortran code on Windows

If you have access to Intel Fortran, try compiling with /Z7 /check:all /gen-interfaces to also check for possible use-before-define errors (which I don’t think gfortran checks).

Hope that helps :slight_smile:


If you have access to the Intel Fortran compilers, try complementing @lkedward's answer with /Od /Z7 /debug:all /Qtrapuv /RTCu /check:all /warn:all /WB /traceback (/gen-interfaces is included in /warn:all).
It looks like an uninitialised variable, though that is just a guess. Also, on Windows consider using the Visual Studio integrated debugger, which is very powerful.

The title of the thread, “…Windows compilation fails”, is incorrect and misleading. The compilation did not fail, since Gfortran compiled your code and produced an EXE file. It is the compiled program that failed to run as expected.

There are many possible errors in a Fortran program that the Fortran standard puts in the categories of errors that cause the run-time behavior of the program to be “undefined”; what many newcomers to Fortran may not expect is that “undefined” includes running and producing correct output.

Not much can be done without seeing what your 4000+ lines of code contain, what calculation it is supposed to perform, what data is input, etc.

Your error message is pretty clear:

At line 4187 of file stelcor.f90
Fortran runtime error: Index '-2147483647' of dimension 2 of array 'opactb' below lower bound of 1

you are accessing opactb(i,j) with j==-huge(0).

To debug your case further on Windows, you may consider running your code under the gdb debugger, which is shipped with several distributions (MSYS2, equation.com’s installer, …)

  1. build with debugging flags: gfortran [...] -Og -g -pg [...]
  2. run:
> gdb my_code.exe
> run

This will show you the exact line of code where the out-of-bounds access takes place, along with its backtrace. But you will need to remove the -fcheck=all flag, because that would stop the program before the error takes place.

Oh, I never had doubt that there was an error in my coding!
Only that the compilers were treating the code differently and wondered if others might have some insight.
Equally, and more importantly, I wanted to see if anyone could help identify the backtracing that Windows will not report on.

I think your interpretation of my title is incorrect and misleading.

I have three compilations of my code. Exactly the same code in all three instances but I use 3 different compilers; a Linux compiler, a MacOS compiler and a Windows compiler.

The executable fails.

Now, the only reason why the code fails is because there is an error in my code; that is a given.
However, I am curious. Since the same code does not fail under MacOS nor does it fail under Linux, does anyone have any general understanding of why this might be? Anything that might help me to identify the issue.

Equally, I am unable, under Windows, to get a backtrace from the error; maybe someone might be able to direct me in how to do that?

So, how do I refer to the problem I face?

I have a failure (my code) and the only one that fails was compiled, specifically, by Windows. So, my title is both correct and clear. ONLY my WINDOWS compilation fails!

Now, if I had said “Only Windows Compiler fails”, I think you’d have a point.

I fear it was your misinterpretation of my title that caused you issues.

And…you are incorrect in your final statement. A couple of replies in here have already helped me begin to analyse the 5000+ lines a little bit more to find MY coding error. :slight_smile:

Thank you.

I knew what the fault was. As you state, it is bleedin’ obvious in the error message. Over the course of the last 2 years of working with Fortran, I have gotten used to seeing similar errors.
Hell, after over 40 years of coding in a variety of languages, I have seen very similar errors before.

My point was that I was not getting the trace information from the Windows compilation that I have been getting elsewhere; I wondered if anyone could direct me towards better flags; which you have done (thank you).

Equally, someone might have come across something similar and been able to direct me to other flags that I could use to ensure that MacOS and/or Linux would report the same error - but with the detailed information I have been used to getting.

I will explore if I have gdb. :slight_smile:

Thank you; I will try these points.

I think this might even be an incorrectly used loop. This isn’t a random thought, I came across one a while back that defined the loop using j, but a single line of code used i instead. That was a while back but I had to take a leap backwards recently due to another issue, and there is a chance this coding flaw snuck back in. :slight_smile:
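For anyone reading along, here is a minimal sketch of that kind of slip. The names (table, total) are hypothetical and not taken from stelcor.f90. The loop runs over j, but the commented-out line indexes with i, which is never assigned; its value is whatever happens to be in memory, which can look harmless on one platform and be wildly out of range on another.

```fortran
program loop_index_slip
  implicit none
  integer :: i, j          ! i is declared but never assigned
  real :: table(3, 4)      ! hypothetical stand-in for a lookup table
  real :: total

  table = 1.0
  total = 0.0

  do j = 1, size(table, 2)
     ! Buggy form: uses i where j was intended. Reading the
     ! unassigned i is undefined behaviour; it may "work" by luck.
     ! total = total + table(1, i)

     ! Intended form: index with the loop variable.
     total = total + table(1, j)
  end do

  print *, 'total =', total
end program loop_index_slip
```

With the buggy line enabled, gfortran's -Wall may flag the uninitialised use at compile time, and -finit-integer=<n> with a deliberately out-of-range value should make -fcheck=all trip in the same place on every platform rather than only on one.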

Gary,
Have you tried valgrind (in particular memcheck) on Linux? It often detects “bad things” such as uninitialized memory when I use it on my code.

Two questions:

  1. How much memory is available on Windows?
  2. Why is index = -huge(index)?

You could run with valgrind on Linux and very likely find the error that was causing problems in Windows. Whatever it is, it is likely to cause problems in Linux eventually. It is just good luck that it works now. Valgrind is very useful like that. gfortran has some debugging aids. You appear to be using -fbounds-check already. Have you added -fbacktrace? Also gdb has a “watch” command that will stop at every instance of an assignment to a variable or a condition being satisfied and let you see the statement being executed. You could look for a change in j. I haven’t been able to master gdb (there isn’t much available to explain how to use it with Fortran) but it can be helpful. If someone would write a bit more than is available for Fortran, that would be a great help. Even without much understanding of gdb, it would give you a backtrace and you could examine other values.

That said, I usually end up with a bunch of print statements when debugging.
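For what it is worth, a print-style guard around the suspect access could be sketched as below. This is not code from stelcor.f90: only the array name and shape (opactb(14, 19, 70)) come from the thread, while the index names i, j, k and the values given to them are assumptions for illustration. It also shows that the reported index is exactly -huge(0) for a 32-bit default integer.

```fortran
program guard_index
  implicit none
  real :: opactb(14, 19, 70)   ! shape quoted in the thread
  integer :: i, j, k           ! hypothetical index names

  opactb = 0.0
  i = 1
  j = -huge(0)                 ! -2147483647 with a 32-bit default integer
  k = 1

  ! Check the second index against the declared bounds before using it,
  ! and report the other indices so the failing call can be traced.
  if (j < lbound(opactb, 2) .or. j > ubound(opactb, 2)) then
     print *, 'bad index j =', j, ' (i, k =', i, k, ')'
     print *, 'valid range:', lbound(opactb, 2), ubound(opactb, 2)
  else
     print *, opactb(i, j, k)
  end if
end program guard_index
```

Placing a check like this just before line 4187, and similar prints wherever j is assigned, narrows down where the bad value first appears without needing a working backtrace.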

  1. I have 128GB of RAM on this computer.
  2. That is what I have been trying to identify

Thank you.
I have been trying to use GDB but with little success. I will try this now.

Oh, I have used LOTS of print statements throughout.

I tried GDB but seemed to encounter more problems trying to get it working than I got simply by testing code with the print statements.

I am about to put up a curious one prompted by these very print statements.

You are giving some indications that the problem is hardware/PC dependent.
When I first tried 64 GBytes of overclocked memory on a new Ryzen processor, I was getting program crashes, so I reduced the memory overclock rate and the problem went away.
Do bad/unreliable memory pages get mapped out?

No, I am not of the opinion it is hardware related. I believe it might be due to differences within the compilers themselves. I think that the Windows compiler creates a code that highlights an issue that the other versions (Linux/MacOS) ‘mask’.
That is NOT a complaint at any of the compilers; they are impressive. It is more that my code definitely has an issue, but it is the way the compiler is interpreting the code that makes the error masked or visible. Currently, I am looking at a line of code that works effectively in most cases, but seems to fail under the Windows system certainly far earlier than it does under the other two (not sure if the later crashes are for the same issue or not, at present). This could simply be the way the data is being stored; that is something I am looking into.
How do I know that this is not a hardware issue? Because my Linux and Windows systems are running on the same hardware.

It is interesting that Index '-2147483647' is not a random value, but probably not the issue.
Is the error consistently reproducible?
I also get “Could not print backtrace”, which is annoying.

It was but now I am at another problem! hahahaha!

Seems there are a few logical errors in the code.