For more than 10 years I have been developing free software for calculating thermodynamic properties and phase diagrams for materials, available at github/sundmanbo/opencalphad.
Yesterday I got a message from someone who tried to use this software and got this error message:
At line 16 of file formatbug2.F90 (unit = 6, file = 'stdout')
Fortran runtime error: Expected REAL for item 7 in formatted transfer, got INTEGER
('set origin 0.0, 0.0 '/'set size ',F8.4', ',F8.4/'set xlabel "',a,'"'/'set ylab
This part of the code generates a gnuplot input file, and I have extracted the code used into a small program to reproduce the error.
This program does not generate any runtime error on my new Mac Pro running macOS, and neither did it on my old Dell running Windows for the last 8 years.
The error is simply corrected by introducing a comma “,” after the first F8.4 in the format statement.
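A minimal sketch of the corrected format, assuming made-up variable names and values (this is not the original formatbug2.F90, only an illustration of where the comma goes):

```fortran
program format_comma
  implicit none
  real :: xsize, ysize
  character(len=32) :: xlabel

  xsize = 1.0
  ysize = 0.75
  xlabel = 'Mole fraction'   ! hypothetical axis label

  ! Non-conforming original: no comma between F8.4 and the literal ', '
  !   ... 'set size ',F8.4', ',F8.4 ...
  ! Conforming version: list items in a format are separated by commas
  ! (the comma is optional only in a few cases, such as next to a slash).
  write(*, 100) xsize, ysize, trim(xlabel)
100 format('set origin 0.0, 0.0 '/'set size ',F8.4,', ',F8.4/'set xlabel "',a,'"')
end program format_comma
```

With the comma in place the format should be accepted both by older and by current gfortran releases, without needing -std=legacy.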
Neither program has any compilation errors. Inside my large program this piece of code has been present at least the last 8 years without any error messages.
This problem could easily be solved. It was just odd to find this kind of bug after 8 years of using the program.
But I have a more complex problem, probably some kind of initialisation error: one of my test macro files for the software runs without problem the first time, but when I just reinitiate the program and run the macro a second time it fails. If I remove the -O2 compiler option it also works the second time.
I would be grateful if anyone could provide some hints about extra options that can be added to ensure that a reinitiation really removes all results from previous calculations.
You can also skip -pedantic, as explained in the docs:
Valid Fortran programs should compile properly with or without this option. However, without this option, certain GNU extensions and traditional Fortran features are supported as well. With this option, many of them are rejected.
…
This should be used in conjunction with -std=f95, -std=f2003, -std=f2008, -std=f2018 or -std=f2023.
As I wrote, OC is quite a large program written during almost 15 years since I retired.
Before that I contributed to the commercial Thermo-Calc software.
I have a rather personal relation to writing software and do not follow many guidelines or rules.
My code is at github/sundmanbo/opencalphad
and there are several macro files in the examples/macro/ directory.
It is used by some companies and students in many different countries and I am still adding new code.
I usually test a new release by running all the macro files but recently I have found that after 5 or 10 macros
there is a failure. Each macro works well but when running several macros after each other without
restarting the program some of them eventually fail.
Each macro starts with a NEW command to reinitiate the data structure.
But I guess I have added data structures that are not reinitiated correctly, or at all.
It is a tedious task to find this so I wonder if there is some way Fortran could help.
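One way the language itself can help, sketched under assumptions: if the state that NEW must clear lives in derived types whose components all have default initialisation, then passing an object to an intent(out) dummy argument re-applies those defaults and deallocates allocatable components. A component added later is then reset automatically as long as it is given a default. The type and routine names below are hypothetical and not taken from OC.

```fortran
module state_mod
  implicit none
  ! Hypothetical container for data that a NEW command must clear.
  type :: calc_state
    integer :: nphases = 0
    double precision :: temperature = 0.0d0
    logical :: converged = .false.
    double precision, allocatable :: fractions(:)
  end type calc_state
contains
  subroutine reset_state(s)
    ! intent(out) makes the processor re-apply the default initialisation
    ! and deallocate allocatable components on entry; no body is needed.
    type(calc_state), intent(out) :: s
  end subroutine reset_state
end module state_mod

program demo_reset
  use state_mod
  implicit none
  type(calc_state) :: s

  s%nphases = 3
  s%converged = .true.
  allocate(s%fractions(3))
  s%fractions = 0.25d0

  call reset_state(s)
  ! After the reset: nphases is 0, converged is false, fractions is unallocated.
  print *, s%nphases, s%converged, allocated(s%fractions)
end program demo_reset
```

Two caveats apply. Pointer components can be given the default `=> null()`, but that only nullifies the pointer and does not free a linked list hanging off it, so lists still need an explicit clean-up routine. Module variables initialised in their declaration are set once at program start and are not touched by this mechanism.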
Gfortran has started diagnosing “missing commas” in formats by default lately. One can’t always detect them at compile time. So the run-time library also consistently reports them now too. Some mods for this have gone in just in the past few days - based on a PR I submitted to the gfortran folks the last time it was discussed on this forum.
One can specify -std=legacy to avoid the missing comma checks. But as always, it is best to simply fix the non-conforming code.
Nice that Fortran is still alive and kicking. I have programmed in Fortran since F77, when the character type became available, and I am very happy declaring TYPE variables, using pointers and creating lists. I never understood the C and C++ way to handle pointers. They are variables in their own right, not just something associated with another variable.
I do not mind getting new errors when Fortran is improved although I think in this case the compiler should have found the missing comma.
There are two approaches to extensions in legacy codes. One is to support the extensions by default indefinitely, so that code that previously compiled without warnings or errors continues to compile without warnings or errors. The other approach is to warn for anything that is nonstandard, and require the programmer to specify compiler options in order to turn off those warnings (or to modify the code, of course). I personally prefer the latter approach. I think by default, nonstandard code should produce warnings and standard code should not, and anything other than that high standard should require a compiler option.
This sounds like you are using a local variable that has not been initialised with “variable = 0” or a required initial value. Typically these local variables are dynamically allocated on the stack and can take the value from the previous use of this memory location.
Changing the -O2 compiler option can change the stack location of these variables.
Older Fortran compilers may have made these variables “static” with a fixed memory location, although this has a different set of problems.
Uninitialised local variables are a major problem when using legacy Fortran code. This is a user error, as assuming uninitialised local variables are zero is a significant coding mistake.
Many old legacy FORTRAN compilers set memory to zero, which perpetuated this bad coding practice.
Most modern Fortran compilers have an option to identify uninitialised variables, which is easily fixed by including variable = 0.
You also mentioned “running the macro a second time it fails”, which was a typical result for old legacy FORTRAN compilers that had static allocation of memory to local variables.
The bottom line is “uninitialised local variables” is a lazy coding mistake that is waiting to explode in many legacy codes.
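A minimal sketch of the fix described above, with a made-up function name. For finding such variables with gfortran, options that are commonly combined are -Wall (which includes -Wuninitialized and works best together with optimisation), -fcheck=all, and -finit-real=snan with -ffpe-trap=invalid so that a use of an unset real traps; how much they catch in a given code is not guaranteed.

```fortran
program init_demo
  implicit none
  ! Calling twice: with the explicit assignment both calls give the same sum.
  print *, total([1.0, 2.0, 3.0])
  print *, total([1.0, 2.0, 3.0])
contains
  function total(x) result(s)
    real, intent(in) :: x(:)
    real :: s
    integer :: i
    ! Executed on every call. Without this line s is undefined on entry
    ! and may hold whatever the stack location contained before.
    s = 0.0
    do i = 1, size(x)
      s = s + x(i)
    end do
  end function total
end program init_demo
```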
My excuse is that scientists acting as non-professional programmers are expected to develop a few thousand lines of code per month to earn their degree. It is dangerous but also quite creative. I have seen professional programmers turn promising software into a total disaster by enforcing strict programming practices.
The best is of course a good balance but it is not easy to achieve. What does this -finit-local-zero do?
Compilers that did auto-initialise to zero have created this lazy mess.
It is much better with most modern compilers that identify uninitialised variables as an error.
As for “enforcing strict programming practice”, perhaps these need to be reviewed first. It is also important that these approaches are enforced throughout the project and not just in updated code.
When I learnt Fortran, most of us were scientists and engineers. Those of us who persisted realised that a sensible and consistent programming practice was required. There always were some senior staff and phd’s that were recognised for writing dodgy code that had to be checked before use.
Could I also point out that there is a difference between:
using the compiler option to zero all variables;
using a “data” statement, “data aa /0/”, which is applied once at compile time; and
the executable statement “aa = 0”, which is executed on each entry to the routine.
This is a surprising error, as the most basic code checking should identify the problems associated with the first 2 cases.
(The data statement must be understood for its implicit save, which can be problematic in an OpenMP scope, as in Questions on variable scope in parallel computing)
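The difference between the data statement and the executable assignment can be seen in a small sketch (function names are made up). The variable set by a data statement is initialised once and keeps its value between calls because of the implicit save, while the executable assignment runs on every entry.

```fortran
program save_demo
  implicit none
  integer :: k
  do k = 1, 3
    ! First column grows 1, 2, 3; second column stays at 1.
    print *, count_saved(), count_fresh()
  end do
contains
  integer function count_saved()
    integer :: n
    data n /0/       ! applied once; n gets the SAVE attribute implicitly
    n = n + 1
    count_saved = n
  end function count_saved

  integer function count_fresh()
    integer :: n
    n = 0            ! executed on each entry to the routine
    n = n + 1
    count_fresh = n
  end function count_fresh
end program save_demo
```

The same implicit save applies to initialisation in a declaration such as `integer :: n = 0`, which is a frequent source of "works the first time, fails the second time" behaviour.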
When I started writing OpenCalphad (OC) around 2011 I was happy to discover the TYPE facility in F90. In the original Thermo-Calc (TC) software written with F77 we had created our own “workspace” facility using a large integer array with subroutines to allocate “records” where we could store reals and characters using other subroutines and linking the “records” using the indices in the integer array. I think this was inspired from reading Knuth and maybe learning a bit of the Simula language. I translated the “record” feature in TC using TYPE structures in F90 with real pointers and it has worked quite well.
There was a great hurry to finish the first steps: reading a database with thermodynamic model parameters, calculating equilibria, finding numerical routines to invert matrices, handling external conditions to calculate phase diagrams, learning how to use gnuplot for plotting, writing a command line user interface, etc.
There was not much time to think about structured programming. I invented a minimal documentation strategy, and in 2020 I managed to write rather complete documentation of the code as it was at that time, plus a user guide. However, nobody reads documentation or user guides, but they are available at the github site.
One great feature of the integer array storage in TC was the possibility to write the array on a file and later read this back into the program and continue calculating exactly where one had saved the integer workspace.
I have implemented a way to store the TYPE records in such an integer array in OC, and it is possible to save this to a file. I am thinking of using this facility to “dump” the current state when the calculation fails and then try to analyse the data structure for variables which are obviously wrong.
I write this on my own; obviously, if there is a team one has to have better documentation, and in a way I am quite surprised anything works. What is the problem with using -finit-local-zero?
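The dump-and-reload idea can be sketched as follows, assuming a plain integer workspace and a hypothetical file name; the actual packing of TYPE records into the array in OC is not shown here.

```fortran
program dump_demo
  implicit none
  integer, allocatable :: work(:), back(:)
  integer :: u, n, i

  ! Stand-in for the packed integer workspace.
  work = [(i*i, i = 1, 10)]

  ! Write the length first, then the data, as an unformatted stream.
  open(newunit=u, file='oc_state.bin', form='unformatted', access='stream', &
       status='replace', action='write')
  write(u) size(work)
  write(u) work
  close(u)

  ! Read it back into a fresh array.
  open(newunit=u, file='oc_state.bin', form='unformatted', access='stream', &
       status='old', action='read')
  read(u) n
  allocate(back(n))
  read(u) back
  close(u)

  print *, all(back == work)   ! T if the round trip preserved the data
end program dump_demo
```

Dumping the state after the first, successful run and again just before the failing second run, then comparing the two files, might narrow down which records differ after NEW.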
Technology progresses by factorization: solution = (your innovation) x (someone else’s innovation). The problem here is you misinterpreted the second factor, the Fortran system, because a) you were not made aware of its limitations and assumptions by the tool and b) whatever you did in the first factor seemed to “work”. Now, you need to modify the first factor so that it can use the second factor properly and get the right answer. There are tools to help you fix the first factor but you might need to spend money (about the same as equipping a small home workshop), or a lot of time.
Not every compiler may have an option equivalent to that, and even if a compiler does have that option, I would not use it, because it masks errors. I should be able to read a code and understand what it does without considering whether there are special compiler options that change its meaning, so
real :: x
x = 0.0
! use x
is clearer than
real :: x
! use x, which is 0.0 because of a special compiler option
The Fortran Standards, pick any of them from 66 onward, have never guaranteed that variables would be initialized to zero (or .false. or all blanks) by default. Code which makes such assumptions has bugs waiting to happen.
Another problem I hit years ago with -finit-local-zero or its equivalent was that scalars were set to zero but arrays were not. I can’t now remember which compiler it was.
Also, F8.4', ' was never valid in a format. F77 was the first standard allowing ', '. It and all later standards would require F8.4,', '. See F77 13.2 or F2023 13.3.1 C1302.
Fortran is in an unusual position, being first a program (the first compiler), then a flavour of dozens of programs (other compilers), then a specification with many revisions (the Standard). Currently, a conforming compiler is required to document and report any behaviour it implements that is not contained in the Standard but this is “more honoured in the breach than the observance”. Compilers tend to NOT tell you in sufficient detail what exactly they have done with your source code when you go off-Standard. The problem might be tolerable when you use just one of these extensions, but a combinatorial explosion happens when you consider interactions between obscure extensions. Suppose extension A is precisely equivalent to source transformation X and extension B similarly to transformation Y. Which is applied first, X or Y? Some extensions cannot even be considered as source transformations. Even a single extension might be considered as a sequence of transformations, but in which order is this sequence implemented? The irony is that the most common extensions are the ones that have been around the longest, meaning that the Standard guardians kept them out of the specification for decades. Please assume that this is, usually, for good reasons, even when these reasons might not seem relevant to you.
To proceed, take that option off and see what breaks. Consult this list to find which compilers would give you the most help in finding and correcting non-conformances. When you have a mostly conforming program, you will not need to document quite as much because the Standard is already half your documentation, checked by experts for consistency. Good luck.
I am grateful for this discussion with this well informed community, which has inspired me to try to get rid of this problem. I have added some code to ensure that there are no uninitialised variables when reading the database.
However, this did not help so I have also tested compiling without -O2 and then there are no problems.
Thus it seems that -O2 is the source of the problem. How to debug that? Incidentally using -O1 has the same problem. I think -O2 is necessary as some users have simulations running for several days using the OC software.
I also found that I have a compiler option -fPIC. I had forgotten what it means, but I found that it evidently stands for “position independent code”, which seems to be related to the old SPARC hardware. I am not sure whether this option is still needed. The option is accepted but not listed among the normal options.
I do not like to use interactive debuggers and I guess they are not really reliable together with -O2. I usually rely on adding output inside the sensitive subroutines, but with -O2 it seems the write statements are sometimes also eliminated. So any hints on how to proceed are welcome. As I have written, the whole code is probably some 100000 lines and the error occurs after having made quite a number of calculations.
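If the missing debug output is a buffering effect rather than the optimiser removing code (an assumption; output that is still sitting in a buffer can be lost when the program stops abnormally), a small trace routine that writes to the error unit and flushes immediately may help. The module and routine names are hypothetical.

```fortran
module debug_mod
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
contains
  subroutine trace(tag, value)
    character(len=*), intent(in) :: tag
    double precision, intent(in) :: value
    ! Write to the error unit and flush so the line is not left in a buffer.
    write(error_unit, '(a,": ",es23.15)') tag, value
    flush(error_unit)
  end subroutine trace
end module debug_mod

program trace_demo
  use debug_mod
  implicit none
  double precision :: t
  t = 1273.15d0                 ! made-up value for illustration
  call trace('temperature before NEW', t)
end program trace_demo
```

Printing with full precision, as here, also makes it easier to compare the first and second run of the same macro line by line.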
Roughly, position-independent code means that dynamic libraries can be loaded into memory at random addresses (it’s a security-related measure). There’s also -fpie for executables… but I’m not sure how relevant that is for Windows (e.g., Intel Fortran only supports those flags under Linux and Intel macOS).
As for the -O1 and -O2 issues, and assuming you’re using gfortran, see if you can compile your code with a different compiler.
(Since ifort/ifx inherited a million extensions from DEC, MS-FPS, etc., there’s a good chance you can compile your code with those.)