I have a code that works fine with -O0, -O2, -O3, and -Ofast on Ubuntu and Windows.
However, on an M1 chip with gfortran 11.2.0, the same code works only with -O0. With -O2, -O3, or -Ofast it compiles and builds fine, but soon after it starts running it fails with a segmentation fault (signal 11).
I do not have an M1 chip to test on myself, so I just wanted to see if anyone has had a similar issue.
If so, what kinds of issues in the code may cause -O2, -O3, or -Ofast to lead to a segmentation fault on the M1 chip?
By the way, will the release of GCC 12 fully support the M1 chip?
Intel oneAPI seems to install and run on the M1 chip. However, the optimization flags do not seem to have much effect; the speed is about the same as (slightly faster than) gfortran’s -O0.
Thank you very much in advance.
There is a good chance that there is an error in your program even when it “works fine”. Use valgrind and a debugger, turn all available error checking on in your compiler or invest in a commercial compiler (which will let you send such problems to their support people who get paid to fix them).
Thank you very much @themos !
For gfortran, which debug flags do you recommend?
-O0 -fbacktrace -fcheck=all
or like, as mentioned in
-g -Wall -Wextra -Warray-temporaries -Wconversion -fimplicit-none -fbacktrace -ffree-line-length-0 -fcheck=all -ffpe-trap=invalid,zero,overflow,underflow -finit-real=nan
It is just a little strange that, with the same flags and the same gfortran 11.2.0, the code runs fine on Ubuntu and Windows. But I think what you said is highly likely. I will check my code.
I don’t have gfortran recommendations, but other people have suggested some on this forum. If you search for some of the options you mentioned, you should find them.
And no, I don’t think it is strange at all. Fortran compilers have traditionally been built for runtime performance, which means no runtime checks. An error that happens in the first million instructions generated might not matter during the run of the program, or it might matter only when optimised code is run, or, if you are lucky, be caught by the operating system, or, if you are unlucky, screw up your results. One time out of fifty, it is a compiler bug. Remember, the compiler writer has (through the compiler) seen more code than any programmer has ever written (at least for commonly used features).
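To illustrate the point about runtime checks, here is a minimal sketch, assuming gfortran is on the PATH. The scratch directory, the file name oob.f90, and the little program itself are made up for this example and are not the original poster's code. The program writes one element past the end of an array, which is the kind of error that can go unnoticed in one build and crash in another.

```shell
# Hypothetical scratch directory and file names, used only for this sketch.
workdir=$(mktemp -d)

# A deliberately broken program: it writes one element past the end of a(10).
cat > "$workdir/oob.f90" <<'EOF'
program oob
  implicit none
  integer :: n
  real :: a(10)
  n = 10
  a = 0.0
  a(n + 1) = 1.0   ! out-of-bounds write
  print *, sum(a)
end program oob
EOF

if command -v gfortran >/dev/null 2>&1; then
  # Debug build: no optimisation, backtrace on failure, all runtime checks on.
  gfortran -g -O0 -fbacktrace -fcheck=all "$workdir/oob.f90" -o "$workdir/oob_debug"
  # With bounds checking enabled, the run is expected to abort at the bad write.
  "$workdir/oob_debug" || echo "debug build stopped with status $?"
else
  echo "gfortran not found; skipping the build"
fi
```

Rebuilding the same file with -O2 and without -fcheck=all removes the check, so the stray write may pass silently or crash, depending on the platform and on how the compiler lays out memory.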
Thank you @themos .
Another question: for the same code, Intel Fortran and gfortran should give the same computational results, right?
I mean, assuming the code has no ‘random part’ (e.g., even if it uses random numbers, they all come from a repeatable generator, so the code should have a ‘fixed’ result).
I found that my Intel Fortran result is a little different from gfortran’s. The difference is big enough that it perhaps means there are some issues in my code. I know that in some places I use the intrinsic function tiny()
to represent things slightly bigger than zero but not exactly zero, such as the normal distribution’s pdf, which in principle should be bigger than zero. But I should probably just set it to zero once it falls below some threshold, instead of using tiny().
I have had problems with the Mac OS X stack limit. By default it is small, and even unlimited is not unlimited.
Thank you very much @wclodius for pointing out that.
How did you increase the stack size?
I will probably add a flag when compiling. When I turned on many debug flags for my code, I found that the
-frecursive flag seems good for that purpose, according to this gfortran warning:
Consider increasing the ‘-fmax-stack-var-size=’ limit (or use ‘-frecursive’, which implies unlimited ‘-fmax-stack-var-size’) - or change the code to use an ALLOCATABLE array. If the variable is never accessed concurrently, this warning can be ignored, and the variable could also be declared with the SAVE attribute. [-Wsurprising]
I guess the issue may come from a modern Fortran ODE solver I am using; perhaps it uses many fancy modern features that the current gfortran does not optimize well on the M1 chip. I will turn off this modern ODE solver, use some old F77 solver instead, and give it a try.
No, you cannot expect results to be the same. Even the SUM intrinsic, used to sum millions of numbers stored in a file, can return different answers with different compilers. Scientists using computer floating point numbers should take a course in numerical analysis. People who won’t or can’t will be forced to look for “bit reproducible” software and miss out on performance (and insight/understanding).
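A small sketch of why the order of additions matters. It uses awk purely as a convenient double-precision calculator (an assumption of this example; any language with IEEE doubles would show the same thing). A compiler that vectorises or reorders a sum is effectively changing the grouping shown here.

```shell
# Same three numbers, two different groupings of the additions.
left=$(awk 'BEGIN { printf "%.17g", (0.1 + 0.2) + 0.3 }')
right=$(awk 'BEGIN { printf "%.17g", 0.1 + (0.2 + 0.3) }')

echo "(0.1 + 0.2) + 0.3 = $left"
echo "0.1 + (0.2 + 0.3) = $right"

# Floating-point addition is not associative, so the two can differ in the last digits.
if [ "$left" != "$right" ]; then
  echo "the two groupings give different results"
fi
```

With millions of terms, such last-digit differences accumulate, which is why two compilers (or two optimization levels) need not agree bit for bit.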
You can find out your current stack size in kilobytes with
ulimit -s
I believe the default should be 8192. You can increase the stack size with
ulimit -s new_size_in_kb
The maximum stack size can be set using
ulimit -s unlimited
In current versions of Mac OS X the maximum stack size is 65532 KB, i.e., roughly eight times the default.
The stack size can be a problem if you allocate arrays on the stack and not on the heap, or if processing the data requires deep recursion.
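The commands above can be combined into a short check; a minimal sketch, assuming a POSIX-style shell such as bash or zsh. The variable names soft_kb and hard_kb are made up for this example. The change applies only to the current shell and the programs started from it.

```shell
# Soft limit: what programs started from this shell get right now.
soft_kb=$(ulimit -S -s)
# Hard limit: the ceiling the soft limit may be raised to without extra privileges.
hard_kb=$(ulimit -H -s)
echo "soft stack limit: $soft_kb"
echo "hard stack limit: $hard_kb"

# Raise the soft limit to the hard limit for this session.
ulimit -S -s "$hard_kb"
echo "soft stack limit now: $(ulimit -S -s)"
```

Run the program from the same shell afterwards. If it still overflows the stack at the hard limit, moving large arrays to the heap (ALLOCATABLE) is the more robust fix.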