Fortran Best Practice Minibook

We just added a new minibook to the webpage’s learn category: Fortran Best Practice.

This book was originally part of fortran90.org and has been almost completely reworked for addition to the webpage. Many thanks go to Vincent @vmagnin for translating the reStructuredText files to markdown and of course all the reviewers that helped getting the minibook into shape.

There is still work to be done, I already opened a few follow-up issues (contributions are welcome):

Please let us know what you think and feel free to leave comments here or at the issue tracker of the webpage.

21 Likes

Some practices I suggest are
(1) Don’t use SAVed variables unless necessary.
(2) Use the explicit SAVE attribute when such variables are necessary. Especially programmers coming from other languages will thank you.
(3) If a variable will not change during a procedure, declare it as a PARAMETER
(4) Declare procedures PURE or ELEMENTAL if possible

I recently looked at a subroutine of Alan Miller

SUBROUTINE f1max(ndf, nf, f1, f5, f10, ier)
! Calculates approximations to the 1%, 5% & 10% points of the distribution
! of the maximum of NF values of an F-ratio with 1 d.f. for the numerator
! and NDF d.f. for the denominator.
IMPLICIT NONE
INTEGER, INTENT(IN)  :: ndf, nf
REAL, INTENT(OUT)    :: f1, f5, f10
INTEGER, INTENT(OUT) :: ier

! Local variables
REAL :: a1(6) = (/ 1.67649, 6.94330,  1.22627, 0.25319,  0.06136, -2.41097 /)
REAL :: a5(6) = (/ 1.28152, 4.93268, -0.29583, 0.28518, -0.23566, -1.60581 /)
REAL :: a10(6) = (/1.06642, 3.96276, -0.62483, 0.30228, -0.52843, -1.04499 /)

REAL :: log_nf, one = 1.0

IF (ndf < 4 .OR. nf < 1) THEN
  ier = 1
  RETURN
END IF
ier = 0

log_nf = LOG(DBLE(nf))

f1 = one + EXP(a1(1) + (a1(3)/ndf + a1(2))/ndf + a1(4)*log_nf          &
               + (a1(6)/ndf + a1(5))/nf)

f5 = one + EXP(a5(1) + (a5(3)/ndf + a5(2))/ndf + a5(4)*log_nf          &
               + (a5(6)/ndf + a5(5))/nf)

f10 = one + EXP(a10(1) + (a10(3)/ndf + a10(2))/ndf + a10(4)*log_nf     &
                + (a10(6)/ndf + a10(5))/nf)

RETURN
END SUBROUTINE f1max

Incorporating the guidelines above, I think it’s better to write it like this:

elemental subroutine f1max(ndf, nf, f1, f5, f10, ier)
! Calculates approximations to the 1%, 5% & 10% points of the distribution
! of the maximum of NF values of an F-ratio with 1 d.f. for the numerator
! and NDF d.f. for the denominator.
implicit none
integer      , intent(in)  :: ndf, nf
real(kind=dp), intent(out) :: f1, f5, f10
integer      , intent(out) :: ier
! Local variables
real(kind=dp), parameter ::  a1(6) =  [ 1.67649, 6.94330,  1.22627, 0.25319,  0.06136, -2.41097], &
                             a5(6) =  [ 1.28152, 4.93268, -0.29583, 0.28518, -0.23566, -1.60581], &
                            a10(6) =  [ 1.06642, 3.96276, -0.62483, 0.30228, -0.52843, -1.04499], &
                            one = 1.0_dp
real(kind=dp) :: log_nf
if (ndf < 4 .or. nf < 1) then
  ier = 1
  return
end if
ier = 0
log_nf = log(real(nf,kind=dp))
f1  = one + exp(a1(1) + (a1(3)/ndf + a1(2))/ndf + a1(4)*log_nf + (a1(6)/ndf + a1(5))/nf)
f5  = one + exp(a5(1) + (a5(3)/ndf + a5(2))/ndf + a5(4)*log_nf + (a5(6)/ndf + a5(5))/nf)
f10 = one + exp(a10(1) + (a10(3)/ndf + a10(2))/ndf + a10(4)*log_nf + (a10(6)/ndf + a10(5))/nf)
end subroutine f1max

The original subroutine could not be declared ELEMENTAL because of the implicitly SAVE’d variables.

5 Likes

Just for my own information, what’s wrong with using the pattern

subroutine whatever(n,A)
    integer :: n
    real :: A(n,n)

as opposed to using assumed shape arrays? A lot of the code I write has to be C interoperable, and I find the above form more convenient than using the CFI_ function set from ISO_Fortran_binding.h. I also like how if you have several array dummy arguments with related shapes, that form makes it explicit how the shapes are supposed to relate to each other.

I definitely agree that in many cases it’s more convenient to use assumed-shape, but I’m curious if I’m missing something about the “old-school” approach that makes it categorically worse?

4 Likes

@hsnyder I don’t think there’s anything wrong with it. Using assumed shape arrays allows for a bit more concise (and arguably, less error-prone code), but you seem to have good reasons to prefer this approach, and that is fine.

I haven’t yet read what the Best Practices Minibook says about this, but perhaps we can improve it by listing the pros and cons of each apprpach.

3 Likes

I don’t have enough experience passing arrays to C to know which I would prefer, but I suspect you are correct that the CFI stuff is not convenient enough. The reason I don’t like explicit shape is that most people (myself included on occasion) at first glance expect the compiler to check that the actual argument has the right shape, but it doesn’t. It leads to hidden bugs more often than I would like.

1 Like

In Fortran Best Practices the Arrays section has an example that includes

do i = 1, n
r(i) = 1.0_dp / i**2
end do

If such things are written as one-liners such as

r(1:n) = [( 1.0_dp / i**2, i = 1,n )]

then a whole scoping unit is more likely to be all visible on one screen, making it easier to understand and debug. Is there a disadvantage in using such array constructors?

3 Likes

Nothing wrong with that at all. I find array statements preferable to loops for multiple reasons.

  1. In theory, it should be easier for a compiler to optimize (although more time and effort has been spent detecting the ability to optimize the former, since it’s been around longer).
  2. I also find it easier to read later. A loop could be used to do all kinds of things, and thus I have to read the whole the loop to know what it’s for. A single statement must only do one thing.
  3. It is less likely to accumulate cruft over time. I’ve seen it happen, where developers will take the easy route and add more statements into a loop rather than write a new one or devise the equivalent single statement version of whatever they are adding. Over time, the loop accumulates enough things to make it really hard to understand.

There are some who disagree with me, but IMO prefer array statements to explicit loops.

1 Like

This is the same question I asked 10 years ago. Here are the two relevant parts:

I use the assumed-shape in my codes following this recommendation. However, I wonder if explicit-shape is not better after all:

  • It specifies the exact dimensions of all arrays (in and out) at compile time using runtime variables n, m, l, etc.
  • This helps the reader (me) to see what the dimensions are right away.
  • It eliminates the need to manually check that a square matrix was passed into a subroutine that expects A(:,:) (and give an error message otherwise), since one can declare it as X(n,n), and then the compiler itself can simply give a runtime error if an incompatible array is passed in.
  • I also wonder if this might allow some potential compiler optimizations.
  • And it definitely would allow compiler checks for things like matmul (as well as all my functions that use explicit-shape). You do not get any such compile time checks if you pass arrays to matmul with incompatible dimensions, because the compiler does not know until runtime.

The only downside is that an array temporary is created if you pass in non-continguous data. However, in practice I always end up with contiguous arrays. Except when prototyping, but in that case I don’t mind the array temporary.

I wonder if explicit-shape and assumed-shape arrays cannot simply be united:

  • Potentially allow non-contiguous implementation of explicit-shape (thus no array temporaries), unless bind(c), or if interfacing legacy Fortran such as the Lapack library (via interface, possibly one could require the contiguous keyword for that)
  • Allow to allocate explicit-shape

The last point would be very helpful, consider this code:

integer :: n
real(dp), allocatable :: X(n,n), Y(n,n)
...
integer :: m
read (*,*) m
allocate(X(m,m), Y(m,m))

This would check at runtime (with bounds checking enabled) that the array dimensions are conforming, i.e., that X and Y have exactly the same dimensions, and are square matrices.

One issue is what to do with the variable n, such as what happens if one assigns to it. One possible solution:

real(dp), allocatable :: X(size(X,1),size(X,1)), Y(size(X,1),size(X,1))

However, it is a lot more readable if the variable n is used. Another solution is:

integer, dim :: n
real(dp), allocatable :: X(n,n), Y(n,n)

which can also be used in subroutines as:

subroutine sub(X, Y)
integer, dim :: n = size(X,1)
real(dp), intent(in) :: X(n,n)
real(dp), intent(out) :: Y(n,n)
! No need for these checks:
! if (size(X,1) /= size(X,2)) error stop
! if (size(Y,1) /= size(Y,2)) error stop
! if (size(X,1) /= size(Y,1)) error stop
end subroutine

Also, not everything can be checked even in matmul at compile time, such as when the two arrays declared as:

integer :: n, m
real(dp), allocatable :: X(n,n), Y(m,m)
Z = matmul(X, Y)

But I think a lot more things could be checked. And even in this case, the compiler now knows the runtime constrain n == m, and so can use it for example to give an error message much earlier (say when the n and m is read from a file and it happens that n /= m).

2 Likes

If the user passes an incorrect array size to a procedure, it can return incorrect results. Here is an example:

module m_mod
implicit none
integer, parameter :: dp = kind(1.0d0)
contains
function mean(n,x) result(y)
integer, intent(in) :: n
real   , intent(in) :: x(n)
real                :: y
y = sum(x)/n
end function mean
!
function size_vec(n,x) result(nsize)
integer, intent(in) :: n
real                :: x(n)
integer             :: nsize
nsize = size(x)
end function size_vec
!
end module m_mod

program main
use m_mod, only: mean,size_vec
implicit none
real :: x(2)
x = [10,20]
print*,mean(2,x)
print*,size_vec(2,x)
! wrong size passed below
print*,mean(3,x) 
print*,size_vec(3,x)
end program main

For this program, gfortran and Intel Fortran give output:

   15.0000000    
           2
   10.0000000    
           3

so they silently pad a 2-element array with an extra element of value zero. G95 helpfully gives output

 15.
 2
 NaN
 3
1 Like

I get:

$ gfortran -Wall -Wextra -Wimplicit-interface -fPIC -fmax-errors=1 -g -fcheck=all -fbacktrace a.f90
$ ./a.out
   15.0000000    
           2
  -2.51098233E+23
           3

I would expect a runtime bounds check error.

This is not true. If you pass a matrix that is not square, and give the larger dimension, you may get a run-time error if bounds checking is turned on (note that a standards conforming compiler is not required to do bounds checking for you), but if you give the smaller dimension, you will not get any bounds checking error, just erroneous results. In fact, at that point you can’t check that the shape of the actual argument was correct.

There is no constraint or requirement for checking that the shape of an actual argument matches the explicit shape of a dummy. Explicit shape dummy arguments are a giant hole waiting to let bugs in, and I’ve seen lots of them creep in that way. Please do not use explicit shape dummy arguments in new code. It implies something to the reader that the language does not actually promise.

3 Likes

The compiler (with bounds checking enabled) can check at the caller’s site that all the dimensions are actually correct. My understanding is that there is nothing preventing the compiler to check all usage.

1 Like

I also like the one-liner form and use it small things (e.g. testing codes). But I wonder if it might be slower than an explicit loop when n is large (if the right-hand side is evaluated using a temporary array). I’ve never compared the performance, so not sure at all though…

Btw, I am a bit worried about i**2, because it can integer-overflow rather easily for relatively small i (I remember having such an error pattern much ago). So I guess it might be safer (?) as a tutorial just to use 1.0_dp / i etc (so that a beginner will not get used to i**2).

program main
    implicit none
    integer, parameter :: dp = kind(0.0d0)
    integer :: i

    do i = 10000, 100000, 10000
        print *, i, 1.0_dp / i**2, 1.0_dp / real(i, dp)**2
    enddo
end

$ gfortran-10 test.f90 && ./a.out
       10000   1.0000000000000000E-008   1.0000000000000000E-008
       20000   2.5000000000000001E-009   2.5000000000000001E-009
       30000   1.1111111111111111E-009   1.1111111111111111E-009
       40000   6.2500000000000001E-010   6.2500000000000001E-010
       50000  -5.5711321439028600E-010   4.0000000000000001E-010
       60000  -1.4389166306899138E-009   2.7777777777777777E-010
       70000   1.6528032177249051E-009   2.0408163265306123E-010
       80000   4.7505200185241395E-010   1.5625000000000000E-010
       90000  -2.0410887827246949E-009   1.2345679012345679E-010
      100000   7.0918695992859933E-010   1.0000000000000000E-010

Key word there is can. I could accept that as an argument if any of them actually did, even for some of the more common error checking options. I’ve been turning on more and more error checking flags in a code modernization effort I’ve been working on for almost a year now, and it wasn’t until recently when I turned on -C=undefined with nagfor that I actually started to find places where the shapes of actual arguments didn’t match the dummy arguments.

Thus I conclude that only the most extensive and rarely used options will catch these types of errors for the majority of compilers, and that people have likely been introducing these kinds of bugs without realizing for a long time. That’s why I say just don’t use explicit shape arguments. It’s the kind of mistake that’s just too easy to make and that most people will never catch. It’s just too big of a foot-gun with too sensitive a trigger. It’s just not safe.

2 Likes

I agree. That is why there is can in the original quote to which you replied that it is not true. I highlighted it here:

So the way it is written, I think it is actually true. But even if I forgot to put the word “can” in it, all I am asking is for readers to try to see the intent of my (or anybody’s) posts.

The reason I am posting this is that I would like LFortran to simply check this when bounds checking is enabled. I still don’t know if there is some issue that prevents this.

However, a practical solution is to simply enable it with an option, and allow to disable it in any possible rare case when this would break some (technically standard conforming) code.

I don’t use explicit-shape much, as I said. But I think they might have a huge unrealized potential and actually allow the compiler to catch all kinds of bugs.

1 Like

A small typo in the “Arrays” section:

Finally, there are assumed-size arrays, which provide the least compile-time and run-time checking and can be found be found frequently in legacy code.

“be found” is repteated.

@hsnyder , so what is your prototype in C for the Fortran whatever subroutine you list above?

@certik @everythingfunctional what’s misleading here is the bounds checking. This specific example doesn’t have an out-of-bounds state, but an undefined one, which Brad mentioned. I agree that compilers should be able to report dummy-actual argument shape mismatch at runtime, at least as an opt-in. I don’t see why not.

@FortranFan it would be something like

void whatever(int * n, float * A)

though obviously the subroutine needs to be marked bind(c), and I would likely make n have the value attribute.

An equivalent using ISO_Fortran_binding.h isn’t too bad in this simple case, but when you have a lot of functions with a lot of arguments it can get tiresome.

It was common in Fortran 77 to pass a single array element, effectively a pointer, to a procedure expecting an array argument, which could be considered “dummy-actual argument shape mismatch”, for example

function sum_vec(n,ivec) result(isum)
integer, intent(in) :: n,ivec(n)
integer             :: isum
isum = sum(ivec)
end function sum_vec

program main
integer :: ivec(3) = [10,20,30]
integer :: sum_vec
print*,sum_vec(2,ivec(2))
end program main

for which all compilers give output 50.

1 Like