How to fix "Heisenbug"?

CRquantum · January 30, 2026, 10:23am

Dear all,

This is inspired by @hkvzjal’s quote in Please, No More Loops (Than Necessary): New Patterns in Fortran 2023, and by @gronki and @jwmwalrus mentioning that printing results to the screen may not always be a good way to debug code.

What @hkvzjal mentioned looks like what people call the “Heisenberg problem” in programming, or simply a “heisenbug” (Heisenbug - Wikipedia).

I wonder, how do you guys fix heisenbugs? What may cause a heisenbug, and how can heisenbugs be prevented?
Thanks!

PS.
For example, most heisenbugs I encounter are caused by accidentally accessing a memory address that should not be accessed, while the compiler gives no warning or error message. One case is accessing the 10th element of an array that contains only 9 elements. Another is defining a function with 5 arguments but calling it without supplying all 5. Sometimes I have found that using an ‘optional’ argument in a subroutine or function may cause a heisenbug too.
Some heisenbugs can be found and fixed by enabling the compiler’s routine interface checks.

Enabling additional compiler checks sometimes helps too. But for some heisenbugs, those options are not enough.
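
As an illustration of the ‘optional’ case, here is a minimal sketch; the module and function names are hypothetical and not taken from any code in this thread. Referencing an optional dummy argument that the caller did not supply is invalid and is typically not diagnosed, so the usual guard is to test it with present() first. A procedure with optional arguments also needs an explicit interface at the call site, which putting it in a module provides.

```fortran
module scale_mod
  implicit none
contains
  ! Returns x*factor when the caller supplies factor, otherwise x unchanged.
  function scaled(x, factor) result(y)
    real, intent(in) :: x
    real, intent(in), optional :: factor
    real :: y
    y = x
    ! An absent optional argument must not be referenced, so check first.
    if (present(factor)) y = x * factor
  end function scaled
end module scale_mod
```

With this in place, both scaled(2.0) and scaled(2.0, 3.0) are valid calls.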

Arjen · January 30, 2026, 10:30am

Heisenbugs, by their very nature, are difficult to find. There is no generally applicable strategy to hunt them down. The common causes you mention and the possible remedies are definitely the things to look for, but it remains hard labour.

jkd2022 · January 30, 2026, 11:13am

Here’s a Heisenbug from our code:

ei0=-1.d8; ei1=1.d8
de=1.d8
ikgap(1:3)=1
do ik=1,nkpt
  ed0=-1.d8; ed1=1.d8
  do ist=1,nstsv
    e=evalsv(ist,ik)
    if (e <= efermi) then
      if (e > ed0) ed0=e
      if (e > ei0) then
! transfer is a workaround for a bug in Intel Fortran versions 17 and 18
        ikgap(1)=transfer(ik,ik)
        ei0=e
      end if
    else
      if (e < ed1) ed1=e
      if (e < ei1) then
        ikgap(2)=ik
        ei1=e
      end if
    end if
  end do
  e=ed1-ed0
  if (e < de) then
    ikgap(3)=ik
    de=e
  end if
end do

The line after the comment should read:

        ikgap(1)=ik

However, this results in nonsensical output with Intel compiler versions 17 and 18 at optimization -O2 and higher. The bug vanishes if you put in a print statement.

I think we found this by just commenting out various lines.

How obscure is that?!

jwmwalrus · January 30, 2026, 12:26pm

I didn’t really say that. What I said was that printing is not always a valid debugging method in Fortran, since printing is a side effect and those are not allowed in pure procedures.

I actually tend to use print*, exclusively for debugging, and write (... for proper output, unless the procedure is pure, in which case I try the debugger route.
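
To make the restriction concrete, here is a minimal sketch with hypothetical names. A print statement inside a pure procedure is rejected at compile time, whereas Fortran 2018 permits error stop there, which can act as a crude check when a debugger is not at hand. This assumes a compiler with that Fortran 2018 feature enabled.

```fortran
module safe_math
  implicit none
contains
  ! A pure procedure may not contain print or write to an external unit,
  ! but Fortran 2018 allows error stop inside it.
  pure function safe_sqrt(x) result(r)
    real, intent(in) :: x
    real :: r
    ! print *, x   ! uncommenting this makes the compiler reject the procedure
    if (x < 0.0) error stop "safe_sqrt: negative argument"
    r = sqrt(x)
  end function safe_sqrt
end module safe_math
```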

(Even in Go, which has the superb delve, I tend to use fmt.Println for debugging and fmt.Printf for proper output)

Regarding heisenbugs, the kind that puzzles me the most is when compilation involves multiple libraries (with their own modules, etc.) and the bug is likely in the compiler… But as you try to create an MRE, the bug disappears.

davidpfister · January 30, 2026, 12:48pm

Sometimes a print fixes a bug. Here is a story by Lee McKeeman that sounds almost too familiar.

PierU · January 30, 2026, 1:01pm

My feeling from my experience is that heisenbugs (*) (I didn’t know the name, btw) are most of the time compiler bugs.

(*) if it means bugs that vanish in debug mode with all checks enabled, AND which do not behave the same with some inserted prints (or whatever statement that is not supposed to fix anything)

RonShepard · January 30, 2026, 4:21pm

I have found a few compiler bugs like this, but in my case the vast majority of my own heisenbugs are code errors, that is programmer errors, that are not caught during compilation or during runtime. These are usually array bounds errors, but where the error is obscured somehow from the compiler (e.g. assumed size declarations, or explicit shape declarations with incorrect array bounds, or mismatched arguments with external subprograms). With f90 and later, another type of error like this is an incorrect intent(out) declaration which should be intent(inout), or a pointer assignment that points to a compiler generated temporary instead of the expected target. These are programmer errors, not compiler errors, but they can be difficult to locate because changing compiler options can make the symptoms vanish while leaving the error still in the code.
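
The intent(out) versus intent(inout) mistake can be sketched as follows; the names are hypothetical and chosen only for illustration. The dummy argument total is read before it is written, so it has to be intent(inout). Declared as intent(out), its value on entry would be undefined, and whether the old value happens to survive can depend on the compiler and its options.

```fortran
module accumulate_mod
  implicit none
contains
  ! Adds x to total. Because total is read before it is assigned,
  ! intent(inout) is required; intent(out) would leave it undefined on entry.
  subroutine accumulate(total, x)
    real, intent(inout) :: total
    real, intent(in) :: x
    total = total + x
  end subroutine accumulate
end module accumulate_mod
```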

On the other hand, I still have relatively simple looking code that uses parameterized data types that does not compile correctly on popular compilers. I also have similar issues with some object-oriented code. These are in fact compiler errors, not programmer errors, so these certainly do exist, even after a couple of decades since they were identified.

urbanjost · January 30, 2026, 8:40pm

Pretty much the same experience. If it goes away with a print statement, it is usually because of a shift in memory, and the cause is likely an array bound or type mismatch. Modules help greatly to reduce the mismatch issues. Sometimes it goes away because the print statement causes certain optimizations to be turned off, but if the problem lingers and you do not find the error in the code, the best tool in many circumstances is multiple compilers. If you have access to three or four compilers and the error only occurs on one of them, it is a good time to check the bug reports if possible (I really like gfortran using bugzilla for that reason. It is easier to search and more accessible than most compiler bug trackers). So accessible bug trackers and multiple compilers help a lot. I used to teach an in-house Totalview class, and vendor tools and external tools like valgrind can be great, but I almost always start with print statements and deleting blocks of code. In my opinion debuggers are best used when looking for logic problems. As soon as memory is being clobbered or I am dealing with a compiler bug, I have found myself debugging the debugger more than the code I am working on if I jump to a debugger first.

As somewhat implied, reducing the complexity is almost always a good direction to go after first trying some of the low-hanging fruit like turning on array bound checks and using other compiler switches that help debugging and the aforementioned print statements. Because of the I/O restrictions on newer features like PURE and SIMPLE procedures a debugger does get more attractive as it generally lets you step through and inspect those kinds of routines; but that is a relatively recent development.

But reducing the code complexity as much as possible is not only a good approach to dealing with practically any bug, but in my experience is particularly fruitful with real compiler bugs. If it is a compiler bug, providing a 20-line reproducer is far more likely to get it worked on than providing a million-line code and saying “this does not work. Can you fix it?”. At one vendor, where I thought the guys probably hated me because I had 42 tickets open, they instead used to call and talk to me directly about the bugs when they were not supposed to. They said it was because I always verified the bug and made a small reproducer for (almost) all cases. A few years later they helped me get a job at their company, so another tip is: be nice to your compiler developers!
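
On the point that modules reduce mismatch issues, here is a minimal sketch with hypothetical names. A module procedure has an explicit interface, so the compiler can check the number, type and rank of the arguments at every call. The assumed-shape dummy a(:) also carries its own bounds, so no separately passed length can disagree with the actual array. The sketch assumes a non-empty array.

```fortran
module mean_mod
  implicit none
contains
  ! Assumed-shape dummy: the extent comes with the array itself,
  ! so the procedure does not depend on a caller-supplied length.
  function mean(a) result(m)
    real, intent(in) :: a(:)   ! assumed to be non-empty
    real :: m
    m = sum(a) / real(size(a))
  end function mean
end module mean_mod
```

Because mean lives in a module, a call such as mean(a, n) with an extra argument is rejected at compile time rather than misbehaving at run time. Run-time bounds checking (for example -fcheck=bounds with gfortran) can also use the real extent of a.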

rwmsu · January 30, 2026, 11:40pm

Exactly. Optimization issues can also be tricky. I had a problem in a code a few years back with one version of ifort where I got the wrong answers with optimization turned off but got my expected answers with -O2. I never figured out what was going on, so I just rewrote that section of the code (I was using some OO that I replaced with standard procedural code) and the problem went away. I’ve also had problems with debuggers not giving the correct point in the code where something is going wrong. In my case, the two classic flang compilers (Nvidia and AMD) refused to compile a bit of code without an internal compiler error. I couldn’t see any possible error in the section of the code the debugger pointed to. It turned out the error was in a module that compiled correctly but triggered an error when the module was USED in the routine that I thought was triggering the ICE. Again, the traceback and debugger led me to believe the error was elsewhere.

CRquantum · February 6, 2026, 10:15am

I found that on Windows, Microsoft’s WinDbg.exe, included in the Windows SDK, is actually good (although its UI reminds me of Windows XP or Vista).

I have a code with a heisenbug, and I generated the debug info when building the exe file. Intel oneAPI gdb and Visual Studio 2022’s debugger could not locate the place in the source code where the memory corruption occurs; they just showed some ??? symbols.
However, WinDbg clearly traced back to the exact subroutine where the memory corruption occurs. Thank god it helped me fix the problem, otherwise I could not sleep well.
I guess the reason WinDbg works so well is that the code is built with the Windows SDK.
