Fortran Best Practice Minibook

@themos okay, great, thanks a lot! I don’t won’t to get into how different compilers name things, but I think this may get at the crux of my confusion:

That surprised me. Do you know which requirement in the standard does it violate?

Edit: The only way I can tell it’s not conforming is the reference of an undefined array element. I’m curious if it’s not conforming for any other reason.

What I meant is even with C99 and variable-length arrays, there isn’t explicit-shape enforcement at compile-time or run-time with most compilers and if I recall correctly with that standard either. And

void whatever( int n, float A[n][n] );

can be seen as equivalent to

void whatever( int, float (*A)[] );

which is more like the assumed-size array facility in Fortran rather than explicit-shape.

(by the way, array sizes are preferably declared using size_t in C than int)

1 Like

Correct. Its only utility is to help communicate how to use the API to whomever is using it. And, much like explicit shape arrays, it could potentially be deceiving in that the reader might assume it offers some safety.

9.5.3.1 “Syntax” para 2

  1. The value of a subscript in an array element shall be within the bounds for its dimension.

All the array elements are defined, how can there be an undefined value?

But shape(a) there is [2, 2]. :slight_smile: I don’t see how that requirement is violated.

All the array elements are defined, how can there be an undefined value?

Does a(2,2) have a defined value when you call s(x(8:),2)? What is its value?

Not rejecting old code. Checking old code. The code you posted can be fully checked at the call site:

allocate(A(5,5), B(5,5))
...
call sub(4, A, B)

The compiler knows the sub’s declaration:

subroutine sub(n, X, Y)
integer, intent(in) :: n
real, intent(in) :: X(n,n)
real, intent(out) :: Y(n,n)

And at runtime it knows (or could know) that the size of A is 5, but the subroutine sub only expects 4 due to n being 4.

Regarding @themos’s example, I believe that can also be checked as follows: the subroutine signature is:

  subroutine s(a,n)
    integer, intent(in)::n
    real :: a(n,n)

So when calling it as call s(x(3), 2), the compiler knows the subroutine accepts an a(2,2) array (in this case), so the call is equivalent to call s(reshape(x(3:6), [2, 2]), 2), and then it can be properly bounds checked at runtime. I.e. if x(3:6) is out of bounds at runtime, then an error should happen.

The same with call s(x(8:),2), which would be equivalent to call s(reshape(x(8:11), [2,2]), 2), and since x is only allocated to 10, it gives a bounds check right away.

What if s had an intent(inout) argument? Then one can’t use reshape, but I believe one can use a pointer association and the similar bounds checking argument applies.

(The compiler can, but does not need to internally rewrite it like I’ve shown above; It can just insert the appropriate runtime bounds checks “as if” the code was written that way.)

This is not some theoretical exercise — when a function returns an array, I’ve seen all kinds of segfaults if the bounds are incorrect and it does not always get caught either. But the same algorithm applies and I think it can all be checked.

I think that would be fine as long as the compiler uses this information to output a meaningful warning message, for example:

Warning: Shape mismatch between actual argument x(8,:) and dummy argument x(n,n) in call to subroutine sub on line ...

or similar. But if it printed:

Error: Subscript 11 is out of bounds for array x(8:11) on line...

I would be very confused as a user and file a bug report with the compiler :).

1 Like

While I was unaware of the absence of runtime bounds checking with explicit shape arrays and agree it gives a false sense of guarantees, I believe the correct way to use explicit-shape arrays is using variables/parameters and not “magic constants”, i.e.

subroutine sub(n, X, Y)
integer, intent(in) :: n  
  ! Document any requirements on shape n
real, intent(in) :: X(n,n)
real, intent(out) :: Y(n,n)
...
end subroutine

integer, parameter :: sz  = 5  
  ! Document why this size was needed

allocate(A(sz,sz), B(sz,sz))
...
call sub(sz, A, B)

Another example is a subroutine to process a point-cloud. The point coordinates could be stored either as a 3×N array (x1,y1,z1,…,xN,yN,zN) or an N×3 array (x1…xN,y1…yN,z1…zN).

If my subroutine has the interface

subroutine process_points(xyz)
real, intent(in) :: xyz(:,:)

then I need to read the documentation or more likely the routine itself (since undocumented) to figure which storage order is used.

On the other hand the interface

subroutine process_points(n,xyz)
  integer, intent(in) :: n
  real, intent(in) :: xyz(3,n)

communicates immediately that the routine expects a 3×N array and I can allocate the storage accordingly.

1 Like

I have a better answer, and closer to the way the check is implemented in nagfor, by requesting the -C=calls option. The ultimate cause is the same but the runtime error happens at the CALL statement. Thanks for making me look deeper (I have a sense of deja vu).

15.5.2.11 Sequence association, para 5

5 An actual argument that represents an element sequence and corresponds to a dummy argument that is an array is sequence associated with the dummy argument. The rank and shape of the actual argument need not agree with the rank and shape of the dummy argument, but the number of elements in the dummy argument shall not exceed the number of elements in the element sequence of the actual argument. If the dummy argument is assumed-size, the number of elements in the dummy argument is exactly the number of elements in the element sequence.

4 Likes

That’s it, perfect, thanks a lot. I didn’t know about this rule. Even more reason for compilers to catch this and throw a run-time error, at least as an opt-in.

3 Likes

That’s exactly right! This is very common in my codes also, and indeed the explicit-shape arrays communicate the intent much better to the programmer. I currently use:

subroutine process_points(xyz)
  real, intent(in) :: xyz(:,:)

which is unfortunate. Here is my actual code: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/md.f90#L13

interface
    subroutine forces_func(X, f)
    ! Calculate forces for particles at positions X
    import :: dp
    implicit none
    ! (i, j) is the i-th particle, j-component (j=1, 2, 3)
    real(dp), intent(in) :: X(:, :) ! positions
    real(dp), intent(out) :: f(:, :) ! forces
    end subroutine
end interface

Notice how I have to document it. If instead, I write it like this:

interface
    subroutine forces_func(n, X, f)
    ! Calculate forces for particles at positions X
    integer, intent(in) :: n
    real(dp), intent(in) :: X(n, 3) ! positions
    real(dp), intent(out) :: f(n, 3) ! forces
    end subroutine
end interface

or even like this with the above extension:

interface
    subroutine forces_func(X, f)
    ! Calculate forces for particles at positions X
    integer, dim :: n
    real(dp), intent(in) :: X(n, 3) ! positions
    real(dp), intent(out) :: f(n, 3) ! forces
    end subroutine
end interface

Then this is so much more readable.

Update 1: of course I swapped n and 3 when writing this, but now it’s fixed… proving the point I am trying to make.

Update 2: Originally I wrote integer, dim :: n = size(X,2), then realized my mistake and rewrote it to integer, dim :: n = size(X,1), and then realized that you don’t need this size initialization at all, you can just declare it as integer, dim :: n. It’s kind of like a symbolic dimension index.

Here is why:

  • In the assume-shape, we give up compile time expressions for the dimensions and make them runtime.
  • As a consequence, the array itself truly has to have a runtime array descriptor which contains information about the actual runtime dimensions
  • In explicit-shape, we retain the information about runtime dimension as a compile time expression
  • As a consequence, the array does not require a runtime descriptor

Relegating things to runtime almost always results in more general code, thus more opaque intent.

Essentially, yes, with assumed-shape you also get out of bounds checks, but much later inside possibly very nested set of routines that eventually something is out of bounds. With explicit-shape, you can possibly get this error message much earlier: sometimes even at compile time, and definitely at the call site.

So this corresponds to “strong concepts” for templates in C++. You get the error message early, instead of late. If you don’t have concepts, then you still get an error message, just a worse error message and much later. It’s the same principle.

The explicit-shape arrays are a truly nice feature of Fortran, quite unique too I believe. Most other languages just have assume-shape arrays, including NumPy, Kokkos in C++ (and Julia too I believe). Strengthening the compiler support around them would be cool.

Here is another advantage of explicit-shape:

  • Automatic Python wrappers generators (such as f2py) can possibly use this information to insert checks into the Python code to ensure the NumPy arrays have the correct dimensions before passing them to Fortran.
  • With assumed-shape, the f2py can’t do anything (currently this is not supported anyway I believe). Then inside the Fortran code you have to manually add checks (unfortunate), or you always have to run with bounds checking on (performance hit). But the problem is that Python is interactive, so users can pass in the wrong arrays by a mistake and they will get a segfault in Release build. With explicit-shape, the f2py tool can possibly ensure that no segfault can happen even in Release mode. Again, related to the early vs late bounds checking.

I apologize for these long posts. It’s just a recent realization for me, and the more I think about it, the more sense it makes. So thank you all for providing this platform for discussion. This has been very helpful to me.

3 Likes

Well, as already observed up-thread, you can add one more line

subroutine process_points(xyz)
  real, intent(in) :: xyz(3,:)
  integer :: n
  
  n = size(xyz,2)

and you get

  • the same level of clarity
  • no bound check problems
  • (most importantly, IMO) a simpler interface without the need to pass n.

Explicitly passing array dimensions feels more 90s than a Motorola_DynaTAC :grin:

On the other hand, I find more compelling the case of a generic interface as proposed by @certik, where you are supposed to scroll down to the implementation or check the documentation if there are no comments to help the user.

I agree that in the future explicit-shaped arrays can be considered the best option (and also give an edge to Fortran compared to other languages), but “here and now” I think that assumed-shaped arrays are “best practice”.

2 Likes

Can you post the whole code? I can’t get this to compile:

program A

contains

    subroutine forces_func3(X, f)
    ! Calculate forces for particles at positions X
    real, intent(in) :: X(3, :) ! positions
    real, intent(out) :: f(3, :) ! forces
    end subroutine

end

I get with GFortran 9.3.0:

$ gfortran b.f90
b.f90:7:30:

    7 |     real, intent(in) :: X(3, :) ! positions
      |                              1
Error: Bad array specification for an explicitly shaped array at (1)
b.f90:8:31:

    8 |     real, intent(out) :: f(3, :) ! forces
      |                               1
Error: Bad array specification for an explicitly shaped array at (1)

@epagone, as you can see from Ondrej’s example you cannot combine explicit- and assumed-shape dimensions. I suppose what you meant was:

subroutine process_points(xyz)
  real, intent(in) :: xyz(:,:)
  integer :: d, n
  
  d = size(xyz,1)
  n = size(xyz,2)

The reasons I don’t like this in practice:

  • It is not immediately obvious that d is supposed to be 3, meaning I need to introduce a runtime check and decide how to handle the error
  • The declaration section will usually contain several more variables (and also comments), pushing the size retrieval statements further down.

The nice thing about explicit-shape is that the semantic information about the array size requirements is given in the dummy variable declaration itself and not the comments or program statements.

The way I see explicit-shape arrays is as a form of design by contract, in the sense that it is the responsibility of the caller that invokes the subroutine to meet the preconditions specified.

On the other hand with assumed shape arrays the library developer is forced to do defensive programming, adding runtime checks which adds run-time and maintenance costs. To avoid the run-time checks you can play around with preprocessor statements (debug and release modes), which is again a whole new can of worms.

There are of course situations where explicit-shape arrays don’t cut it, for example in a subroutine which is designed to process both arrays (contiguous) or array slices (non-contiguous). Here assumed-shape is the way to go.

2 Likes

@certik, @ivanpribec good catch: in this specific case you need to declare one additional variable (d in the case shown)… but this happens to be exactly what the relevant “real world”, production code required regardless:

As often happens, there is no “one size fits all” solution.

To be fair, as I said in my previous message, the “real world” (i.e. not toy programs) is not that simple and if I had to provide “real world best practices”, I would start distinguishing between layers of private and public procedures. In the former, you are not really forced to insert runtime checks pocketing all the advantages with “assumed-shape” arrays. I wish there was a simple answer for these choices that is valid in every case.

1 Like

Such needs are custom and as such, custom solutions make sense.

Given the nature of the particular computational formulation of a (possibly chemical process) simulation here as indicated with 3D point coordinates, one can empathize with the need to work with explicit-shape arrays in compute instructions. But such a desire need not extend necessarily to dummy arguments that are explicit-shape arrays though.

A better practice toward such cases (in a world of processors that are all compliant with current Fortran standard) can be parameterized derived types (PDTs):

..
   type :: point_t(n)
       integer, len :: n
       real :: xyz(3,n)
       ..
   end type
   ..
subroutine process_points( points )
   ..
   type(point_t(n)), .. :: points  ! or class(point_t(n)) if type-bound i.e., pending inextensible types in Fortran

Per the standard, the array (the derived-type component xyz), which is explicit-shape, will be simply contiguous, a fact the processor will know and can utilize suitably in compiler optimization(s).

And the library design can be both scientific as well as self-documenting to the authors and consumers.

3 Likes

I agree that PDT offer a nice solution here.

Let’s not forget that @hsnyder’s original question also considered a subroutine which could be used easily from C. I have never studied the problem of wrapping a PDT for consumption in C.

I presume a C wrapper of the PDT would be used as follows:

int const n = 1000;

// allocate Fortran object
void *ptr = f_point_pdt(n);  
// (optionally check ptr is not NULL)

float (*xyz)[] = f_point_pdt_getxyz(&ptr);

// fill xyz values
for (size_t i = 0; i < n; ++i) {
   float *xyz_col = &xyz[i];
   ..
}

f_process_points(&ptr);

// destroy Fortran object
f_point_pdt_finalize(&ptr);

I apologize for any mistakes in the C example.

While the PDT works nice in the Fortran-only setting, it adds a significant development cost when the procedures are supposed to be interoperable. To have the best of both worlds, I would probably prefer to have a “lower” layer of explicit-shape routines, that can be consumed or wrapped in C/C++, and also called inside the Fortran PDT methods.

1 Like

Good point. Is it something to consider for a 202Y proposal, perhaps? :thinking: Unfortunately, I have no clue on how technically this better interoperability can be achieved.

Perhaps it is time to summarize the pros and cons of explicit-vs-assumed shape arrays. To my mind, the only pros of explicit are a) F2003 interoperability with C (does not require C descriptors) and b) naming the dimensions can help the programmer focus.

Warning messages for rank mismatches for explicit-shape are unlikely to materialize, there would be too many false positives for a standard-conforming and widely used feature.

There can be no compile-time warnings for runtime objects anyway and although the rank of the explicit-shape array is visible at compile-time, the extents in the shape are not.

b) can be mimicked by a comment
real :: a(:,:) ! a(Nwidgets, Ndim)
Most compilers, it seems, only use the leading-dimensions of explicit-shape arrays and treat them as a comment otherwise with the exception of nagfor -C=calls -C=array and possibly others(?).

It’s probably best to separate discussion of best practice today from the desirability or feasibility of new language features, depite the fact that one feeds the other.

6 Likes