Compilation flags advice for production and distribution

The choice of compiler flags depends a lot on the goal. Is it a shareable library? Is it a final main program on a supercomputer?
These are the flags I use to build libraries:

  • Intel ifort debug on UNIX:
    -debug full           # generate full debug information
    -g3                   # generate full debug information
    -O0                   # disable optimizations
    -CB                   # Perform run-time bound-checks on array subscript and substring references (same as the -check bounds option)
    -init=snan,arrays     # initialize real scalars and arrays to signaling NaN
    -warn all             # enable all warnings
    -gen-interfaces       # generate interface block for each routine in the source files
    -traceback            # trace back for debugging
    -check all            # check all
    -check bounds         # check array bounds
    #-fpe-all=0            # Floating-point invalid, divide-by-zero, and overflow exceptions are enabled
    -fpe0                 # Ignore underflow (yield 0.0); Abort on other IEEE exceptions.
    -diag-error-limit=10  # max diagnostic error count
    -diag-disable=5268    # Extension to standard: The text exceeds right hand column allowed on the line.
    -diag-disable=7025    # This directive is not standard Fxx.
    -diag-disable=10346   # optimization reporting will be enabled at link time when performing interprocedural optimizations.
    -ftrapuv              # Initializes stack local variables to an unusual value to aid error detection.
    
  • Intel ifort debug on Windows:
     /debug:full
     /stand:f18      # issue compile-time messages for nonstandard language elements.
     /Zi
     /CB
     /Od
     /Qinit:snan,arrays
     /warn:all
     /gen-interfaces
     /traceback
     /check:all
     /check:bounds
     #/fpe-all:0
     /fpe:0
     /Qdiag-error-limit:10
     /Qdiag-disable:5268
     /Qdiag-disable:7025
     /Qtrapuv
    
  • Intel ifort release on Windows:
     /O3                             # Enable O3 optimization.
     /Qvec                           # enable vectorization.
     /Qunroll                        # [:n] set the maximum number of times to unroll loops (no number n means automatic).
     /Qunroll-aggressive             # use more aggressive unrolling for certain loops.
     /Qinline-forceinline            # Instructs the compiler to force inlining of functions suggested for inlining whenever the compiler is capable of doing so.
     #/Qguide-vec:4                   # enable guidance for auto-vectorization, causing the compiler to generate messages suggesting ways to improve optimization (default=4, highest).
     #/Qparallel                      # generate multithreaded code for loops that can be safely executed in parallel.
     #/Qipo-c                         # Tells the compiler to optimize across multiple files and generate a single object file ipo_out.obj without linking.
                                     # info at: https://software.intel.com/en-us/Fortran-compiler-developer-guide-and-reference-ipo-c-qipo-c
     /Qftz
     /Qipo               # enable interprocedural optimization between files.
     /Qip                # determines whether additional interprocedural optimizations for single-file compilation are enabled.
    
  • Intel ifort release on UNIX:
     -stand f18                      # issue compile-time messages for nonstandard language elements.
     -O3                             # set the optimizations level
     -unroll                         # [=n] set the maximum number of times to unroll loops (no number n means automatic).
     -unroll-aggressive              # use more aggressive unrolling for certain loops.
     -diag-disable=10346             # optimization reporting will be enabled at link time when performing interprocedural optimizations.
     -diag-disable=10397             # optimization reports are generated in *.optrpt files in the output location.
     #-guide-vec=4                    # enable guidance for auto-vectorization, causing the compiler to generate messages suggesting ways to improve optimization (default=4, highest).
     #-parallel                       # generate multithreaded code for loops that can be safely executed in parallel. This option requires MKL libraries.
     #-qopt-subscript-in-range        # assumes there are no "large" integers being used or being computed inside loops. A "large" integer is typically > 2^31.
     -ftz                            # Flushes subnormal results to zero.
     -inline-forceinline # Instructs the compiler to force inlining of functions suggested for inlining whenever the compiler is capable of doing so.
     -finline-functions  # enables function inlining for single file compilation.
     -ipo                # enable interprocedural optimization between files.
     -ip                 # determines whether additional interprocedural optimizations for single-file compilation are enabled.
    
  • gfortran debug
      -g3                                 # generate full debug information
      -O0                                 # disable optimizations
     #-fsanitize=undefined                # enable UndefinedBehaviorSanitizer for undefined behavior detection.
     #-fsanitize=address                  # enable AddressSanitizer, for memory error detection, like out-of-bounds and use-after-free bugs.
     #-fsanitize=leak                     # enable LeakSanitizer for memory leak detection.
      -fcheck=all                         # enable the generation of run-time checks
      -ffpe-trap=invalid,zero,overflow    # ,underflow : Floating-point invalid, divide-by-zero, and overflow exceptions are enabled
      -ffpe-summary=all                   # Specify a list of floating-point exceptions, whose flag status is printed to ERROR_UNIT when invoking STOP and ERROR STOP.
                                          # Can be either 'none', 'all' or a comma-separated list of the following exceptions:
                                          # 'invalid', 'zero', 'overflow', 'underflow', 'inexact' and 'denormal'
      -finit-integer=-2147483647          # initialize all integers to a large negative value (near the most negative 32-bit integer)
      -finit-real=snan                    # initialize REAL and COMPLEX variables with a signaling NaN
      -fbacktrace                         # trace back for debugging
     #-pedantic                           # issue warnings for uses of extensions to the Fortran standard. Gfortran10 with MPICH 3.2 in debug mode crashes with this flag at mpi_bcast. Excluded until MPICH upgraded.
      -fmax-errors=10                     # max diagnostic error count
      -Wno-maybe-uninitialized            # avoid warning of no array pre-allocation.
      -Wall                               # enable all warnings:
                                          # -Waliasing, -Wampersand, -Wconversion, -Wsurprising, -Wc-binding-type, -Wintrinsics-std, -Wtabs, -Wintrinsic-shadow,
                                          # -Wline-truncation, -Wtarget-lifetime, -Winteger-division, -Wreal-q-constant, -Wunused, -Wundefined-do-loop
                                          # gfortran 10 crashes and cannot compile MPI ParaMonte with MPICH in debug mode. Therefore -Wall is disabled for now, until MPICH upgrades its interface.
     #-Wconversion-extra                  # Warn about implicit conversions between different types and kinds. This option does not imply -Wconversion.
     #-Werror=conversion                  # Turn all implicit conversions into an error. This is important to avoid inadvertent implicit change of precision in generic procedures of various kinds, due to the use of `RK` to represent different kinds.
     #-Werror=conversion-extra            # Turn all implicit conversions into an error. This is too aggressive and as such currently deactivated. For example, it yields an error on the multiplication of integer with real.
      -fno-unsafe-math-optimizations
      -fsignaling-nans
      -frounding-math
      -Wno-surprising                     # -Wsurprising yields many false positives like "Array x at (1) is larger than limit set by '-fmax-stack-var-size='".
    
  • gfortran release on macOS:
     -fauto-inc-dec
     -fbranch-count-reg
     -fcombine-stack-adjustments
     -fcompare-elim
     -fcprop-registers
     -fdce
     -fdefer-pop
     #-fdelayed-branch
     -fdse
     -fforward-propagate
     -fguess-branch-probability
     -fif-conversion
     -fif-conversion2
     -finline-functions-called-once
     -fipa-profile
     -fipa-pure-const
     -fipa-reference
     -fipa-reference-addressable
     -fmerge-constants
     -fmove-loop-invariants
     -fomit-frame-pointer
     -freorder-blocks
     -fshrink-wrap
     -fshrink-wrap-separate
     -fsplit-wide-types
     -fssa-backprop
     -fssa-phiopt
     -ftree-bit-ccp
     -ftree-ccp
     -ftree-ch
     -ftree-coalesce-vars
     -ftree-copy-prop
     -ftree-dce
     -ftree-dominator-opts
     -ftree-dse
     -ftree-forwprop
     -ftree-fre
     -ftree-phiprop
     -ftree-pta
     -ftree-scev-cprop
     -ftree-sink
     -ftree-slsr
     -ftree-sra
     -ftree-ter
     -funit-at-a-time
     -falign-functions  -falign-jumps
     -falign-labels  -falign-loops
     -fcaller-saves
     -fcode-hoisting
     -fcrossjumping
     -fcse-follow-jumps  -fcse-skip-blocks
     -fdelete-null-pointer-checks
     -fdevirtualize  -fdevirtualize-speculatively
     -fexpensive-optimizations
     -fgcse  -fgcse-lm
     -fhoist-adjacent-loads
     -finline-functions
     -finline-small-functions
     -findirect-inlining
     -fipa-bit-cp  -fipa-cp  -fipa-icf
     -fipa-ra  -fipa-sra  -fipa-vrp
     -fisolate-erroneous-paths-dereference
     -flra-remat
     -foptimize-sibling-calls
     -foptimize-strlen
     -fpartial-inlining
     -fpeephole2
     -freorder-blocks-algorithm=stc
     -freorder-blocks-and-partition  -freorder-functions
     -frerun-cse-after-loop
     -fschedule-insns  -fschedule-insns2
     -fsched-interblock  -fsched-spec
     -fstore-merging
     -fstrict-aliasing
     -fthread-jumps
     -ftree-builtin-call-dce
     -ftree-pre
     -ftree-switch-conversion  -ftree-tail-merge
     -ftree-vrp
     -fgcse-after-reload
     -fipa-cp-clone
     -floop-interchange
     -floop-unroll-and-jam
     -fpeel-loops
     -fpredictive-commoning
     -fsplit-paths
     -ftree-loop-distribute-patterns
     -ftree-loop-distribution
     -ftree-loop-vectorize
     -ftree-partial-pre
     -ftree-slp-vectorize
     -funswitch-loops
     -fvect-cost-model
     -fversion-loops-for-strides
    
  • gfortran release on Linux/Windows:
     -ftree-vectorize        # perform vectorization on trees. enables -ftree-loop-vectorize and -ftree-slp-vectorize.
     -funroll-loops          # unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
     -O3                     # set the optimizations level
     -finline-functions      # consider all functions for inlining, even if they are not declared inline.
     #-fwhole-program         # allow the compiler to make assumptions on the visibility of the symbols leading to more aggressive optimization decisions.
     -flto=3                 # enable interprocedural optimization between files in parallel on 3 processors.
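
As a rough illustration of how flag sets like the ones above can be kept per compiler and build type, here is a minimal Python sketch. The table `FLAGS` and the function `compile_command` are hypothetical names for this example only; they are not how ParaMonte's CMake files are organized, and only a small subset of the flags above is included. The command is assembled but never executed.

```python
# Hypothetical lookup table: a small subset of the flags listed above,
# keyed by (compiler, build type). Multi-word options such as "-check all"
# are split into separate tokens so the list can be passed to a process API.
FLAGS = {
    ("gfortran", "debug"): [
        "-g3", "-O0", "-fcheck=all",
        "-ffpe-trap=invalid,zero,overflow", "-fbacktrace",
    ],
    ("gfortran", "release"): [
        "-O3", "-ftree-vectorize", "-funroll-loops",
        "-finline-functions", "-flto=3",
    ],
    ("ifort", "debug"): [
        "-g3", "-O0", "-check", "all", "-warn", "all", "-traceback", "-fpe0",
    ],
    ("ifort", "release"): ["-O3", "-unroll", "-ftz", "-ipo"],
}


def compile_command(compiler, build, sources):
    """Assemble (but do not run) a compile-only command line."""
    try:
        flags = FLAGS[(compiler, build)]
    except KeyError:
        raise ValueError(f"no flag set for {compiler!r} / {build!r}") from None
    return [compiler, *flags, "-c", *sources]


# Show the command that would be used for a gfortran debug build.
print(" ".join(compile_command("gfortran", "debug", ["main.f90"])))
```

Keeping the sets in one table makes it easy to compare debug and release builds side by side, at the cost of duplicating what a build system such as CMake would normally manage.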
    

I don’t remember the details, but I ended up separating the GNU release flags for macOS from those for Linux/Windows because some flag that I never identified, switched on by -O3 (or -flto), led to segfaults on macOS. This happened several years ago with GNU 7/8/9(?). The bugs may have been resolved in newer releases of the GNU compilers.
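
With the optimizations spelled out flag by flag, a problem like the macOS segfault can in principle be narrowed down by bisection. The sketch below assumes that exactly one flag is responsible and that a caller-supplied callback `fails(flags)` (hypothetical) builds the library with the given flags, runs a test, and returns True on a crash. The demonstration uses a made-up stand-in for that callback; it says nothing about which flag actually caused the segfault.

```python
from typing import Callable, Sequence


def find_culprit(flags: Sequence[str],
                 fails: Callable[[Sequence[str]], bool]) -> str:
    """Return the single flag whose presence makes `fails` return True.

    Assumes exactly one flag is responsible and that flags do not interact.
    """
    known_good = []            # flags already shown to be harmless
    candidates = list(flags)   # the culprit is somewhere in here
    if not fails(candidates):
        raise ValueError("the full flag set does not reproduce the failure")
    while len(candidates) > 1:
        middle = len(candidates) // 2
        half = candidates[:middle]
        if fails(known_good + half):
            candidates = half                 # culprit is in the first half
        else:
            known_good += half                # first half is harmless
            candidates = candidates[middle:]  # culprit is in the second half
    return candidates[0]


# Demonstration with a fake callback. The "bad" flag here is chosen
# arbitrarily for illustration; it is NOT the real cause of the segfault.
o3_subset = [
    "-fgcse-after-reload", "-fipa-cp-clone", "-floop-interchange",
    "-fpeel-loops", "-ftree-loop-vectorize",
]
pretend_bad = "-fpeel-loops"
print(find_culprit(o3_subset, lambda fs: pretend_bad in fs))
```

Each round halves the candidate list, so about 100 flags need roughly seven builds. Note that the GCC documentation states that most optimizations stay disabled at -O0 even when individual flags are given, so a real callback would likely have to start from an -O level and toggle flags with their -fno- forms.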
These flags are taken from the CMake files of the ParaMonte library. They deliberately leave out the architecture-specific optimization flags that Steve Kargl listed, to keep the generated library portable.
Perhaps a compilation of flags similar to the above should appear on the FortranLang website, if there isn't one there already.

Intel has an excellent summary of its compiler flags.

One final note: enabling interprocedural optimizations (e.g., with -flto or -ipo) can significantly lengthen compilation, possibly turning a 30-second build into a 30-minute one.
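
Given that cost, one possible approach (my assumption, not something the ParaMonte build does as far as stated here) is to drop the interprocedural flags for day-to-day builds and keep them only for the final release build. A minimal sketch, with `without_ipo` as a hypothetical helper name:

```python
# Prefixes of the interprocedural-optimization flags mentioned in this post.
# "-ip" and "/Qip" (single-file IPO) are intentionally not matched.
IPO_PREFIXES = ("-flto", "-ipo", "/Qipo")


def without_ipo(flags):
    """Return a copy of `flags` with cross-file IPO/LTO options removed."""
    return [flag for flag in flags if not flag.startswith(IPO_PREFIXES)]


release = ["-O3", "-ftree-vectorize", "-funroll-loops", "-flto=3"]
print(without_ipo(release))
```

The resulting binaries may differ in performance from the full release build, so timings taken without these flags are only indicative.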

I mentioned only gfortran and ifort because I have experience primarily with these two.
