Fortran syntax for pragmas

In C/C++, the standard syntax for pragmas is:

#pragma ...

In Fortran, what is the most common syntax for this? The Wikipedia article on pragmas (directives) does not mention Fortran: Directive (programming) - Wikipedia

The only relevant discussion at fortran-proposals github that I was able to find: Option to have interoperable types packed · Issue #256 · j3-fortran/fortran_proposals · GitHub

Here are the styles of pragmas I was able to find on the internet:

It seems the established syntax is to use a comment character (C in fixed-form, or ! in free-form) followed by $, sometimes with a compiler vendor name in between.
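To make the pattern concrete, here is a sketch collecting the sentinel styles that came up in my research (the specific directives are only illustrative, and free-form and fixed-form lines are shown together for comparison; they would not appear in the same source file):

```fortran
! Free-form: "!" plus an optional vendor tag, then "$"
!$omp simd        ! OpenMP
!DIR$ IVDEP       ! Cray / Intel
!GCC$ unroll 4    ! GFortran

C Fixed-form: "C" or "*" in column 1, then a vendor tag and "$"
CDIR$ IVDEP
*VOCL LOOP,NOVREC
```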

In particular, what would be a good way to add new attributes to a variable declaration as language extensions? Let’s say you wanted to add a simd extension to LFortran (SIMD backend · Issue #2293 · lfortran/lfortran · GitHub). The obvious choice is:

real(sp), simd :: x(8)

And I think this is the way to do it, if you are ok to only use LFortran. However, it is often useful to be able to compile the same code with multiple compilers. So in addition to the above, we should also support pragma directives, and users have a choice. Given my research above, what would be the most natural way?

Here is the best idea I have so far, in order to be as compatible as possible with established usage:

!LF$ attributes simd :: x
real(sp) :: x(8)

Here LF stands for LFortran, and the syntax otherwise follows GFortran or Intel Fortran conventions for adding custom attributes. If, say, GFortran later supports it as well, we could do:

!GCC$ attributes simd :: x
!LF$ attributes simd :: x
real(sp) :: x(8)

What syntax would you recommend?

@certik,

I personally recommend taking a two-pronged approach:

  1. Immediately proceed with syntax as close to GCC as viable but decorated using !LF$ as you have already recognized,

  2. However, work closely with the J3 committee (perhaps you and/or a representative can join the committee on behalf of the LFortran org?) and influence the worklist for Fortran 202Y toward US10: define a standard Fortran preprocessor with syntax and semantics that also cover the aspects of interest to LFortran, in a manner that will be both elegant and workable within the LFortran framework.


@certik Your idea makes sense. I would stick to it.


In Fortran these are typically called compiler directives, and they date back to before #pragma was introduced in C (with ANSI C in 1989). In the 1980s, when all the RISC workstations were popular, each with their own Fortran compilers, they all used different syntax for compiler directives. In some ways that was a good choice; in other ways it meant that a lot of (near-)duplication was required. If I scan through some of my legacy code from that time, here are some of the directives:

*vocl loop,repeat(maxvl)
*vocl loop,novrec
cdir$       ivdep
c$doit ivdep
cvd$   permutation (indexx)

These were all fixed-source f77 code, so the *, c, or C had to appear in column 1. This was before inline comments (!) were added in f90.

I have read that there were also compiler directives in early forms of fortran in the late 1950s and early 1960s. These directives would tell the compiler which branch of an arithmetic if statement was most likely to occur or what was a typical value for a do loop range.


The language should be expressive enough to achieve the goals. If not, then the language syntax should be fixed instead of relying on non-portable compiler-specific directives. That’s an area where Fortran has traditionally shone. If directives are essential, they should be standardized and added to the language. That’s my perspective. I reemphasize FortranFan’s suggestion: “… Define a standard Fortran preprocessor…”. The standard committee’s counterarguments are fair and robust in that FPP may promote bad coding habits. But ignoring, or failing to respond in a timely manner to, a critical demand is even worse than having an inferior solution.


The simd annotations that I am working on right now are a middle step; they are fundamentally platform dependent. Ultimately the Fortran compiler must be able to compile regular high-level loops into these low-level simd-annotated loops. But to get there, I am starting with hand-written, platform-dependent simd annotations. So it’s unclear whether it has to be added to the language, but I need some mechanism to control the compiler from the source code.


If I remember correctly, Cray used CDIR$ (and I guess !DIR$ later) for their compiler directives (i.e., things like CDIR$ IVDEP to control vectorization at the loop level). If the focus is compiler directives, maybe just standardizing on !DIR$ or maybe !LFDIR$ is the path of least resistance.


SIMD annotation is one of the areas where the OpenMP standard has come through:

!$omp simd [clause[ [,] clause] ... ]
   do-loops
[!$omp end simd]
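As a concrete (hedged) sketch of the syntax above, a simple saxpy-style loop could be annotated like this:

```fortran
! sketch: request SIMD vectorization of a saxpy loop
!$omp simd
do i = 1, n
   y(i) = a*x(i) + y(i)
end do
```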

In the past, one would typically use the VECTOR and IVDEP directives to assist the compiler in generating vector instructions:

! gfortran
!GCC$ ivdep
!GCC$ vector
!GCC$ novector
! Intel Fortran compiler
!DIR$ IVDEP [: option]
!DIR$ VECTOR [clause[[,] clause]...]
!DIR$ NOVECTOR

I’m guessing your attributes simd :: x would be a means of aligning memory for optimal SIMD access. With the Intel compiler it is done this way:

real, allocatable :: a(:), b(:)
real              :: c(1000)
!dir$ attributes align:64 :: a
!dir$ attributes align:64 :: b
!dir$ attributes align:64 :: c

which would then allow using the !DIR$ vector aligned directive. However it appears Intel is deprecating the SIMD- and vectorization-related directives in favor of OpenMP SIMD. The directives will be supported, but OpenMP is the recommended way going forward.
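Putting the two together, a sketch of what that might look like (assuming the arrays declared above have been allocated):

```fortran
! sketch: the aligned declarations permit asserting alignment here
!dir$ vector aligned
do i = 1, 1000
   c(i) = a(i) + b(i)
end do
```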

From the documentation of gfortran ivdep directive:

For new code it is recommended to consider OpenMP SIMD directives as potential alternative.

If you are interested in what directives compilers support:


Most compilers support loop transformation directives. To unroll a loop by 4 iterations, one would write:

! gfortran
!GCC$ unroll 4

! Intel Fortran
!DIR$ UNROLL=4

! XL Fortran (IBM)
!IBM* UNROLL(4)

! Cray Fortran
!DIR$ UNROLL 4

! Solaris Studio (Oracle)
!$PRAGMA SUN UNROLL=4

Five compilers, each with its own directive syntax… To make things worse, the directives can have slightly differing semantics. The !DIR$ prefix is shared, and often incompatible, between compilers. For example, to write the correct unroll directive for both Intel and Cray, one needs to use a preprocessor fence:

#if defined(__INTEL_COMPILER)
!DIR$ UNROLL=4
#elif defined(_CRAYFTN)
!DIR$ UNROLL 4
#endif

The preprocessor and directive code quickly grows beyond control…

For these reasons, the OpenMP committee has decided to standardize loop transformation directives. In OpenMP 5.1, the following new syntax is supported:

!$OMP UNROLL [clause]
   loop-nest
[!$OMP END UNROLL]

where clause can be either FULL or PARTIAL[(unroll-factor)].
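For example, a hedged sketch of a partial unroll by a factor of 4:

```fortran
! sketch: ask the compiler to unroll the loop body four times
!$omp unroll partial(4)
do i = 1, n
   a(i) = b(i) + c(i)
end do
```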

So my two cents would be,

  • stick to OpenMP when the directives already exist.
  • if you must introduce your own directives, use the !LF$ prefix and not the generic !DIR$ one.

What about portable compiler-agnostic directives? Quoting an article by Michael Kruse,

Directives for the compiler such as pragmas can help programmers to separate an algorithm’s semantics from its optimization. This keeps the code understandable and easier to optimize for different platforms.

Most programs spend the majority of their time in loops. On cache-based architectures at least, one often needs to optimize for cache locality by means of blocking, tiling loops, and other loop transformation tricks…

For example a vanilla double loop (say in matrix multiplication, or a stencil),

do i = 1, 128
  do j = 1, 128
     ! ...
  end do
end do

will in some cases fail to achieve optimal performance.

Instead one has to tile the loop for better cache locality:

do i1 = 1, 128, 8
  do j1 = 1, 128, 8
     do i2 = i1, i1 + 7
       do j2 = j1, j1 + 7
          ! ...
       end do
     end do
  end do
end do

However, this makes the code obscure, and the nature of the algorithm is no longer immediately recognizable.
OpenMP 5.1 attempts to solve this by means of the !$omp tile directive:

!$omp tile sizes(8,8) 
do i = 1, 128
  do j = 1, 128
     ! ...
  end do
end do

The algorithm remains clear and succinct, just like in the original version. If you need to change the tile size when moving to a different CPU architecture, only one line has to be changed instead of four.

Here are a few resources on the topic:


Excellent option. I reemphasize the portable part of it. That, in my opinion, should include identical decorations. Effectively implicitly standardized.


As an example, the fact that Intel compilers have followed the GNU compilers conventions or at least made an effort to be as compatible as possible with it over the years has been a tremendous help to developers. I recall the Intel compiler developers joking about their efforts to reproduce GNU bugs identically in the Intel compilers for consistency and portability.

One of the earlier directives as far as I’m aware was, CDIR$ IVDEP (ignore vector dependencies). It is described in the manual “Vectorization and Conversion of Fortran Programs for the CRAY-1 (CFT) Compiler” (meaning it dates back to the 1976-1979 period). According to Wikipedia, this was the first auto-vectorizing compiler.

I first learned about the history of IVDEP in a talk by John M. Levesque (a former Cray employee and author of the book “A Guide to Fortran on Supercomputers”). There is also a lecture from James Reinders (of Intel) - Vectorization (SIMD) and Scaling - which talks of ivdep and OpenMP SIMD (already part of OpenMP 4.0). Here is a screenshot from the video, referring to the CFT compiler:

Recently I read that only 10% of C++ programs manage to exploit vectorization (and this sounds like an optimistic estimate to me; I will try to find the source). Since auto-vectorization still fails sometimes, Reinders’ attitude is that it remains the responsibility of programmers to assert themselves by adding the omp simd directive when beneficial.

In OpenMP the ivdep directive is replaced by the safelen clause,

!$omp simd safelen(4)
do i = 5, n
  a(i) = a(i-4) + b(i)
end do

Which was a jewel…

Isn’t simd itself the equivalent of ivdep ?

My own understanding was that !$omp simd is similar to !dir$ vector in that it prescribes vectorization instead of leaving it to the compiler auto-vectorizer. ivdep instructs the compiler to ignore assumed dependencies, allowing the auto-vectorizer to kick in. But I could be wrong here.

Edit: As a corollary, the equivalent to !dir$ novector is !$omp simd if(simd: .false.) according to this thread: vectorization - OpenMP pragma with a meaning: don't vectorize - Stack Overflow, with the answer coming from a reputable source
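For illustration, a sketch of what that equivalent would look like applied to a loop (based on the thread linked above):

```fortran
! sketch: disable vectorization for this loop via OpenMP
!$omp simd if(simd: .false.)
do i = 1, n
   a(i) = a(i) + b(i)
end do
```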

!DIR$ as a directive introducer seems to be supported widely, but of course, as others note, directives are not part of the standard. While the standard COULD specify what the introducer should be, it can’t plausibly specify what comes after that, so I am uncertain how beneficial this would be. We certainly don’t want to end up in this situation: xkcd: Standards



As for IVDEP, it does not mean “vectorize”. It stands for “Ignore Vector DEPendencies” and provides information that could allow vectorization but doesn’t require it. Sort of like DO CONCURRENT.
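A minimal sketch of the analogy:

```fortran
! do concurrent asserts the iterations are order-independent,
! much like IVDEP asserts there are no vector dependencies;
! neither construct requires the compiler to actually vectorize
do concurrent (i = 1:n)
   a(i) = b(i) + c(i)
end do
```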

Isn’t it the same with !$OMP SIMD ? A hint to the compiler: “you can safely vectorize this one if you like”…

Perhaps - I just know there was a lot of user confusion over IVDEP in the past.

There is a collection of benchmarks for vectorization known as the Livermore loops. They are available from netlib.

The first kernel is

c*******************************************************************************
c***  KERNEL 1      HYDRO FRAGMENT
c*******************************************************************************
c
cdir$ ivdep
 1001    DO 1 k = 1,n
    1       X(k)= Q + Y(k) * (R * ZX(k+10) + T * ZX(k+11))

I put this into godbolt as follows:

subroutine kernel1(n,q,r,t,y,zx,x)
    integer, intent(in) :: n
    real, intent(in) :: q,r,t
    real, intent(in) :: y(n), zx(n+11)  ! zx must extend to n+11 to avoid out-of-bounds access
    real, intent(out) :: x(n)
    integer :: k

#if defined(IVDEP)
!gcc$ ivdep
#elif defined(VECTOR)
!gcc$ vector
#elif defined(SIMD)
!$omp simd
#endif
do k = 1,n
   X(k)= Q + Y(k) * (R * ZX(k+10) + T * ZX(k+11))
end do

end subroutine

I checked for the presence of packed vector instructions, such as vmulps (vector multiply packed single), and used the -fopt-info flag to check whether vectorization was successful.

The results I got with gfortran v13.2:

(Flags: -c -cpp -O3 -march=skylake -fopenmp-simd -fopt-info-vec -DSIMD)

Directive   -O1   -O2   -O3
None        no    no    yes
ivdep       no    no    yes
vector      yes   yes   yes
omp simd    yes   yes   yes

And the results with ifort v2021.10 (replacing !GCC$ with !DIR$):

(Flags: -c -fpp -O3 -xskylake -qopenmp-simd -qopt-report-phase=vec -qopt-report-file:stdout -DSIMD)

Directive   -O1   -O2   -O3
None        no    yes   yes
ivdep       no    yes   yes
vector      no    yes   yes
omp simd    no    yes   yes

It appears that with ifort vector optimizations are disabled at level -O1.

Finally, the results for ifx v2023.2.1:

(Flags: -c -fpp -O3 -xskylake -qopenmp-simd -qopt-report-phase=vec -qopt-report-file:stdout -DSIMD)

Directive   -O1   -O2   -O3
None        no    yes   yes
ivdep       no    yes   yes
vector      no    yes   yes
omp simd    yes   yes   yes

Edit: one caveat: when vectorization was successful, I didn’t check whether the generated instructions with and without directives were the same.


!$omp simd appears to be more than just a hint. At least in gfortran and ifx it appears to work starting from level -O1.

I believe that in ifx the ifort directives are not fully implemented yet. But in this particular loop the auto-vectorizer does the job already.

I think all of these directives are meant to be used as a performance tuning tool. After finding the hotspots of your code, you look at the results of -fopt-info-missed (gfortran) or -qopt-report (ifort). After finding the loops which failed to vectorize, either because the compiler couldn’t perform a full analysis or the heuristics made it believe vectorization was not profitable, you go in, add the directives, and verify that the optimization worked. Ideally with a measurement on a representative workload.