Difference between -O0 and -O1

In my thermodynamic code I have, for more than 10 years, used a reliable LAPACK routine to solve my linear set of equations. Recently, while implementing a new model in the software, I was getting spurious convergence problems which I believed were caused by errors in the code for the model. But I could not find any, and (as I am working all alone) I finally asked the AI Claude for help. Claude analyzed my code and found no errors, but it recommended that I compile my debug version of the code (which I use for development) with the compiler option -O1, because that would mean less shuffling of data between memory and registers. Normally I use no -Ox option for debug builds.

I have tested this and it seems correct, but it seems illogical to me. Claude wrote some explanation about -O0 being for safe logic and -O1 for safe arithmetic, which I did not understand.

I use double precision everywhere, and I expect to have at least 10 correct digits; my convergence criterion is 6 correct digits. I am surprised that -O0 does not give that. But at least I am back at the keyboard making some progress rather than fighting convergence daemons, and I wanted to share this experience for others less knowledgeable about compiler/computer internals.

What compiler and version? I had a problem with a version of ifort about 5 years ago where my code gave wrong answers with -O0 but correct ones with -O1. Until then I had always assumed that -O0 would be “correct” in the sense that no optimization was being done that could change the numbers drastically.

I have an Apple M4 Pro and use gfortran, gcc version 15.2.0

I am really surprised that double precision reals do not give even 6 digits of precision and, like you, I believed -O0 would avoid round-off errors due to optimization.

There are many possible reasons for this kind of behavior. The final accuracy of a calculation depends on the floating-point precision and also on how the various errors within a sequence of calculations grow or shrink. If your problem is ill conditioned, meaning that those errors grow, then you may end up with no correct significant digits, even with double or extended precision floating point. That might be one reason why your results change with optimization level. There are many examples of how a change in an algorithm can affect the accuracy of a computed result: you can use pivoting while solving a linear equation, you can use an implicit rather than an explicit solution to a differential equation, you can use reverse rather than forward recursion in a series solution, or you can change the number of terms in a Taylor expansion. Sometimes there are several “good” ways to solve a problem and only a few “bad” ways; sometimes it is the other way around. Sometimes the problem (or problems) can be easily isolated and solved, sometimes not. Maybe your problem is just a coding error (say, a reference to an undefined variable) rather than an algorithm or conditioning problem. Although usually unlikely, it could even be a compiler error and no fault of your own. There are many possibilities.

The problem occurred solving a 3x3 matrix using a well-established LAPACK subroutine which has been working well for more than 10 years inside my software. But what surprised me was the advice to compile with -O1 rather than -O0 to obtain better numerical accuracy. My basic feeling has always been that any attempt to optimize the code reduces the numerical accuracy. Maybe the Apple M4 chip has a problem; I have used the Mac for less than a year.

Just for fun try replacing your LAPACK call with the attached subroutine that solves the matrix equation using Cramer’s rule. For a 3x3, the time required should be trivial.

cramer.f90 (1.7 KB)

I “borrowed” the original Fortran IV version of this from an old textbook and converted it to modern Fortran. Also, did you compile the version of LAPACK you are using with the same compiler that gives you the error, or are you just linking in a library you downloaded via Homebrew etc.?
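The attached cramer.f90 is not reproduced here, but the idea is simple enough to sketch. This is not the posted subroutine, just an illustrative version (in Python rather than Fortran, for brevity) of solving a 3x3 system by Cramer’s rule:

```python
# Illustrative only -- not the attached cramer.f90, just the same idea in Python.

def det3(m):
    """Determinant of a 3x3 matrix (nested lists), by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cramer3(a, b):
    """Solve the 3x3 system A x = b by Cramer's rule."""
    d = det3(a)
    if d == 0.0:
        raise ValueError("singular matrix")
    x = []
    for j in range(3):
        # Replace column j of A with b and take the determinant.
        aj = [[b[i] if k == j else a[i][k] for k in range(3)] for i in range(3)]
        x.append(det3(aj) / d)
    return x

print(cramer3([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
              [2.0, 4.0, 6.0]))
# -> [1.0, 2.0, 3.0]
```

For a 3x3 this is exact in structure, but note that Cramer’s rule does no pivoting, so for badly conditioned matrices it can behave worse than an LU-based LAPACK solver.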

Don’t shackle yourself to a single compiler if you can avoid it. My reasons: 1. One may have better error messages than another, or detect bugs that another doesn’t. 2. If a given program runs with two compilers but gets different results, there is always a reason. Usually my fault, occasionally a bug in a compiler or other software, always worth investigating. 3. One compiler may run your program faster than another, and which compiler is faster on your machine may differ from program to program.


It looks like you have it narrowed down pretty well already. So at this point, you presumably know that the output from that solution is different with -O0 and -O1. Here are some of the things to look for at this point in the process. Are the inputs the same in both cases? The inputs would be the 3x3 matrix and the rhs vector. If any of those array values are different, then you haven’t really located the source of the problem but rather just one of its eventual results. Something else to look for is the condition number of that 3x3 matrix. The condition number is the ratio of the largest and smallest singular values of the matrix. There are several ways to compute that value, and also several ways to cheaply approximate it, but for a 3x3 matrix, I would suggest just computing the SVD and looking at the singular values. There are LAPACK routines to do that computation. A large condition number means that the roundoff errors within the computation can accumulate and be greatly amplified within the solution vector.
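In the real code the LAPACK route would be dgesvd for the singular values (or dgecon for a norm-based estimate). As a rough stand-in, here is a sketch (in Python, for illustration) of the 1-norm condition number of a 3x3 matrix computed from its explicit inverse; it serves the same diagnostic purpose as the 2-norm condition number from the SVD:

```python
# Rough 1-norm condition number of a 3x3 matrix, cond1(A) = ||A||_1 * ||A^-1||_1,
# with the inverse formed explicitly via the adjugate. Purely illustrative --
# use LAPACK (dgesvd/dgecon) in production code.

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(a):
    d = det3(a)
    # inv = adjugate / det; cyclic index trick gives the signed cofactors.
    return [[(a[(i + 1) % 3][(j + 1) % 3] * a[(i + 2) % 3][(j + 2) % 3]
            - a[(i + 1) % 3][(j + 2) % 3] * a[(i + 2) % 3][(j + 1) % 3]) / d
             for i in range(3)] for j in range(3)]

def norm1(m):
    # Matrix 1-norm: maximum absolute column sum.
    return max(sum(abs(m[i][j]) for i in range(3)) for j in range(3))

def cond1(a):
    return norm1(a) * norm1(inv3(a))

print(cond1([[2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 8.0]]))  # -> 4.0
```

A condition number around 10^k roughly means losing k of the ~16 significant digits of double precision in the solution.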

Yes, in addition to programmer errors and compiler errors, there can also be hardware errors. A famous one was the FDIV error in Pentium CPUs in the 1990s. When moving from the previous 486DX architecture to the Pentium architecture, a decision was made to change at the hardware level from a shift-and-subtract division algorithm to the SRT algorithm, the latter requiring about half the clock cycles to compute the result. But errors in the lithography equipment resulted in an incorrect lookup table which resulted in incorrect division results. So hardware problems can and do exist, although more rare even than compiler bugs, which in turn are more rare than programmer errors. Pentium FDIV bug - Wikipedia

I forgot to mention the fourth reason for different results from different compilers: 4. Fortran has many processor-dependent features.

A trivial counterexample is given here: Result for x*1/sqrt(x). Eliminating the division is both faster and more accurate.
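The underlying point can be illustrated directly (sketched here in Python, not Fortran): x/sqrt(x) and x*(1/sqrt(x)) are mathematically identical, but they involve different rounding steps, so the results need not be bit-identical. This is exactly the kind of change a compiler rewrite between optimization levels can introduce:

```python
import math

# Two mathematically identical expressions for sqrt(x). In floating point
# the second form has an extra rounding (the reciprocal), so the two can
# differ in the last bit for some inputs. The count below depends on the
# platform's libm, so no specific value is claimed.
differ = 0
for k in range(1, 10000):
    x = k * 0.1
    a = x / math.sqrt(x)
    b = x * (1.0 / math.sqrt(x))
    if a != b:
        differ += 1
print(f"{differ} of 9999 sample values differ in the last bit(s)")
```

Either way the relative difference stays at the level of one or two ulps; the point is only that “mathematically equivalent” does not mean “machine equivalent”.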

I also think it is useful to check whether the matrix is built from thermodynamic quantities of very different magnitudes…

I also wonder if some check options might be able to detect possible “hidden” issues? In the case of gfortran, options like

-fcheck=all -ffpe-trap=invalid,zero,overflow
-finit-real=snan -finit-integer=-999 -finit-derived
-fbacktrace -g

may be useful for checking purposes. RE the difference between -O0 and -O1, I guess floating-point comparisons like if (a == b) ... (where a and b are real) might cause some problems (if they are used in some library code etc).
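A minimal illustration of that last point (in Python for brevity, but the issue is language-independent): exact equality on computed reals is fragile, because two mathematically equal expressions need not be bit-identical, while a tolerance-based comparison survives the last-bit changes that -O0 vs -O1 can introduce.

```python
# Exact comparison of computed reals is fragile: both sides carry rounding
# error, so mathematically equal values can compare unequal.
a = 0.1 + 0.2
b = 0.3
print(a == b)                # -> False
print(abs(a - b) <= 1e-12)   # -> True: tolerance-based comparison is robust
```

In Fortran the same idea would be abs(a - b) <= tol, with tol chosen relative to the magnitude of the quantities being compared.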

RE different compilers on Mac, we can use flang via Homebrew like

$ brew info flang
$ brew install flang

which might be useful for comparison.

… but this seems illogical to me. Claude wrote some explanation about -O0 being for safe logic and -O1 for safe arithmetic, which I did not understand.

I think it is Claude that doesn’t understand (stochastic parrot). However, as others noted, if the numerics are sketchy (poor condition number, large singular values, etc.), just perturbing the operations (order, mathematically equivalent, but not machine equivalent calculations) can “improve” things (given a contemporary Apple Silicon system, I doubt register vs. memory is the issue … with the 80-bit Intel and Motorola fp units of yesteryear, that was a common problem).
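The “perturbing the operations” point fits in a one-liner (sketched in Python; the same holds in Fortran or any IEEE 754 double arithmetic): reassociating a sum is mathematically a no-op but not a floating-point no-op.

```python
# ulp(1e16) is 2.0 in double precision, so adding 1.0 to 1e16 is absorbed.
# Reordering the same three terms changes the result from 0.0 to 1.0.
left_to_right = (1e16 + 1.0) - 1e16
reordered     = (1e16 - 1e16) + 1.0
print(left_to_right, reordered)   # -> 0.0 1.0
```

An optimizer that reassociates or vectorizes a reduction is doing exactly this kind of reordering, which is why results can move at the last few digits between optimization levels.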

If the computation is ultimately solving a system of equations, you might look at alternative formulations (e.g. not inverting a matrix ;>).

Certainly there are many situations like this where speed is traded for accuracy, but there are also many other types of optimizations where this is not the case. For example, changing algorithms allows for the possibility of improving both speed and accuracy. However, changing -O0 to -O1 typically does not change algorithms, but rather uses memory bandwidth more efficiently or uses different hardware (vector instructions instead of scalar, or gpu instead of vector) to implement the same algorithm. These changes sometimes affect accuracy, maybe different rounding modes come into play, but sometimes they occur with no change in accuracy.

You could compute the determinant of the 3x3 matrix to check for singularity.
It could just be a badly conditioned edge case problem.
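A minimal sketch of that check (in Python for illustration); a determinant tiny relative to the size of the entries flags a singular or badly conditioned matrix:

```python
# Determinant-based singularity check for a 3x3 matrix. The example matrix
# below is a classic nearly singular case (exact determinant -3e-12).

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

nearly_singular = [[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0 + 1e-12]]
print(det3(nearly_singular))   # tiny compared with the O(1) entries
```

One caveat: the determinant scales with the cube of the entries for a 3x3, so it should be judged relative to the entry magnitudes (or the matrix scaled first), not against an absolute threshold.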