I have a code that compiles and runs fine with -O0, -O2, -O3, and -Ofast on Ubuntu and Windows.
However, on an M1 chip with gfortran 11.2.0, the same code works at -O0; with -O2, -O3, or -Ofast it compiles and builds fine, but soon after it starts executing it dies with a segmentation fault (signal 11).
Since I do not have an M1 chip to test on myself, I just wanted to see if anyone has run into a similar issue.
If so, what kinds of issues in the code might cause -O2, -O3, or -Ofast to lead to a segmentation fault on the M1 chip?
By the way, will the release of GCC 12 fully support the M1 chip?
Intel oneAPI seems to install and run on the M1 chip. However, the optimization flags do not seem to have much effect; the speed is about the same as (slightly faster than) gfortran’s -O0.
There is a good chance that there is an error in your program even when it “works fine”. Use valgrind and a debugger, turn on all available error checking in your compiler, or invest in a commercial compiler (which will let you send such problems to their support people, who get paid to fix them).
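With gfortran, for example, a debug build with the usual runtime checks switched on would look something like this (main.f90 standing in for your source file):

gfortran -g -O0 -Wall -fcheck=all -fbacktrace -ffpe-trap=invalid,zero,overflow main.f90 -o main

Out-of-bounds accesses, uninitialised variables and floating-point exceptions that stay silent at -O0 often show up immediately under these checks.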
It is just a little strange that, using the same flags and the same gfortran 11.2.0, the code runs fine on Ubuntu and Windows. But I think what you said is highly likely, and I will check my code.
I don’t have gfortran recommendations, but other people have suggested some on this forum. If you search for some of the options you mentioned, you should find them.
And no, I don’t think it is strange at all. Fortran compilers have traditionally been built for runtime performance, which means no runtime checks. An error that happens in the first million instructions generated might not matter during the run of the program, or it might matter only when optimised code is run, or, if you are lucky, be caught by the operating system, or, if you are unlucky, screw up your results. One time out of fifty, it is a compiler bug. Remember, the compiler writer has (through the compiler) seen more code than any programmer has ever written (at least for commonly used features).
Another question: the same code, compiled with Intel Fortran versus gfortran, should give the same computational results, right?
I mean, assuming the code has no “random part” (e.g., even if it contains random numbers, they are all generated from a repeatable generator, so the code should have a “fixed” result).
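By “repeatable generator” I mean something along these lines (a minimal sketch with an arbitrary fixed seed):

program fixed_seed
  implicit none
  integer :: n
  integer, allocatable :: seed(:)
  real :: x(5)
  call random_seed(size=n)     ! seed length is compiler dependent, so query it first
  allocate(seed(n))
  seed = 12345                 ! any fixed value gives a repeatable stream
  call random_seed(put=seed)
  call random_number(x)
  print *, x                   ! same values on every run with the same compiler
end program fixed_seed

Even so, the stream itself differs between compilers, so this only makes runs repeatable within one compiler.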
I found my Intel Fortran result is a little different from gfortran’s. The difference is big enough that it perhaps indicates some issue in my code. I know that in some places I use the intrinsic function
tiny(1.0)
to represent things slightly bigger than zero but not exactly zero, such as the normal distribution’s pdf, which in principle should be bigger than zero. But I should probably just set it to zero once it falls below some threshold, instead of using tiny().
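For example, instead of relying on tiny(1.0) I could clamp the pdf with an explicit cutoff, something like this (just a sketch; the cutoff value and the function name are made up for illustration):

module pdf_utils
  implicit none
contains
  ! flush very small pdf values to exactly zero instead of substituting tiny(1.0)
  elemental function clamp_pdf(p) result(q)
    real, intent(in) :: p
    real, parameter :: cutoff = 1.0e-30   ! illustrative threshold, not tiny(1.0)
    real :: q
    q = merge(0.0, p, p < cutoff)         ! zero below the cutoff, unchanged above
  end function clamp_pdf
end module pdf_utils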
Thank you very much @wclodius for pointing that out.
How did you increase the stack size?
I will probably add a flag when compiling. When I turned on many debug flags for my code, I found the -frecursive flag seems good for that purpose.
Consider increasing the ‘-fmax-stack-var-size=’ limit (or use ‘-frecursive’, which implies unlimited ‘-fmax-stack-var-size’) - or change the code to use an ALLOCATABLE array. If the variable is never accessed concurrently, this warning can be ignored, and the variable could also be declared with the SAVE attribute. [-Wsurprising]
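For what it is worth, the change the warning suggests, replacing a large automatic (stack) array with an ALLOCATABLE one, looks roughly like this (a sketch with made-up names):

subroutine do_work(n)
  implicit none
  integer, intent(in) :: n
  ! real :: work(n)               ! automatic array: allocated on the stack
  real, allocatable :: work(:)    ! allocatable array: allocated on the heap
  allocate(work(n))
  work = 0.0
  ! ... computations using work ...
end subroutine do_work            ! work is deallocated automatically on return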
I guess the issue may be coming from a modern Fortran ODE solver I am using; perhaps it uses many fancy modern features which the current gfortran may not optimize well on the M1 chip. I will turn off this modern ODE solver, use an old F77 solver instead, and have a try.
No, you cannot expect results to be the same. Even the SUM intrinsic, used to sum millions of numbers stored in a file, will return different answers. Scientists using computer floating-point numbers should take a course in numerical analysis. People who won’t or can’t will be forced to look for “bit reproducible” software and miss out on performance (and insight/understanding).
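A tiny illustration of the point (a sketch; the exact digits depend on compiler and flags, but summing the same numbers forwards and backwards will in general not agree to the last bit):

program sum_order
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  real, allocatable :: x(:)
  real :: s_fwd, s_bwd
  allocate(x(n))
  do i = 1, n
     x(i) = 1.0 / real(i)          ! values spanning many orders of magnitude
  end do
  s_fwd = 0.0
  do i = 1, n                      ! accumulate in increasing index order
     s_fwd = s_fwd + x(i)
  end do
  s_bwd = 0.0
  do i = n, 1, -1                  ! same numbers, opposite order
     s_bwd = s_bwd + x(i)
  end do
  print *, s_fwd, s_bwd, s_fwd - s_bwd   ! the difference is typically nonzero
end program sum_order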