Is using explicit array slice ranges considered bad for Fortran?

Hello everyone,

Is being descriptive and using 1:3 in the code like this considered good or bad practice?

It helps me to write code like this, because 1:3 lets me know that this is an array with 3 elements, and we’re assigning from the 1st to 3rd index.

I also heard that passing array slices to functions is considered bad, because the compiler has to make a copy of the slice … however in this case, 1:3 indicates the whole array, and is technically not a partial slice of the array, so the compiler should know that we’re passing the whole array, and not just a slice.

So I don’t really know if it is a good or bad practice.

I want to keep using this if this is not bad for performance, because it helps me understand the code easily.

Thanks

x

My opinion is that it is bad practice in general - if nothing else it confuses the reader who may wonder what is happening to the rest of the array.

Copies aren’t passed unless the argument is discontiguous and the compiler thinks that the called routine is expecting a contiguous array. If the dummy argument is assumed-shape, then no copy even then (unless it has the CONTIGUOUS attribute.)

I wrote in Doctor, it hurts when I do this! - Doctor Fortran (stevelionel.com) that using (:) to say “this is an array” can bite you - I’d consider your use here to be similar.

4 Likes

I would also add that it can lead to problems whenever explicit bounds are used, as out-of-bound access is UB (as discussed here).

For example:

integer :: istack(0:2)
integer, allocatable :: iall(:)

allocate(iall(-1:1))
1 Like

I am in the (:) team :slight_smile: … That’s true that for a long time compilers struggled to fully optimize expressions with array sections, even though they were representing the whole array. But I hope this is no longer true nowadays, and that any decent compiler is able to handle a(:) as if it was a. If it was not the case, then compilers would not be able either to recognize that b(:,i) is contiguous section (given that b itself is contiguous), which would be very annoying…

However, I am more reserved when bounds are specified. At compile time, most of time the compiler can not determine if a(1:3) is the whole array or not.

1 Like

I enjoyed reading Dr. Fortran’s post Doctor, it hurts when I do this! - Doctor Fortran. It is amusing and educating.

I was extremely happy to read the following.

One of the big strengths of modern Fortran is its wealth of array-oriented features. Few languages offer the whole-array and array slice operations that Fortran does, and often you can do an array operation without a traditional DO loop. For example, you can add two arrays with:

A = B + C

You can even have functions that return arrays (explicit interface required!), …

I have been told many times that array expressions are not performant, and writing loops is the correct way of handling arrays. I was worried about this at the beginning, but have decided to ignore such advice completely and use array expressions whenever possible. This is what I did when coding PRIMA.

I believe that a language for scientific computing must be able to handle array expressions efficiently, and it should become more and more efficient in this aspect if things are not perfect for the moment. The reason is simple — matrix-vector calculations are the basis of scientific computing.

Let us make the best use of modern Fortran’s “big strength”, and make Fortran even stronger in this aspect.

3 Likes

There have been several other good comments on this question, but here are a few more for this discussion.

If you know that your code will always work on (1:3) arrays, say they are Cartesian space coordinates, then specifying that slice might be a good idea because it tells other programmers (or you yourself) that the variable is an array rather than a scalar within an expression. But if your current problem just happens to have bounds 1:3, and you might need to change this in the future to some other upper bound, then it might be better to either remove the slice notation entirely (and use a whole array expression instead) or change the slice to simply (:). This way, your code will adjust automatically if you ever change the upper bound.

When this occurs on the lhs of an expression, there are also complications related to the automatic reallocation. If you know that the lhs is allocated and is already the correct shape, then

a(:,:) = b
a  = b

should do the same thing, but perhaps with different amounts of effort. I’m using a 2D example here just to show that this is general, not specific to 1D arrays. In that first expression, the compiler might first check that the arrays (or the expression on the rhs in the general case) conform. In this context, it means that they have the same extents in all dimensions (ignoring the array=scalar special case here). It does not mean that the lower and upper bounds must match, just that the extents all match. That array shape check requires a little effort, but not much. Typically the subsequent memory to memory copy will require more effort than the check. If this expression occurs in a tight loop, then a programmer might even specify through optimization compiler options that the check can be removed. In this case, the programmer is assuming all responsibility for the shape matching in order to get the most efficient runtime performance.

But if the allocation status of the lhs is unknown to the programmer, then the second expression might be used. This expression requires that the compiler test for allocation of the array a(:,:), and allocate it appropriately if necessary before the assignment operation occurs. It will allocate the array of the correct shape and with the default lower bounds of 1 in all dimensions. If the programmer wants different bounds, then he must assume the responsibility for the allocation test, the allocation, and for setting the bounds.

The language standard is a little ambiguous about what happens in the second expression when the shapes match, it only specifies clearly what happens when they don’t match. Thus the compiler must check, typically at run time, if the shapes match. When the shapes match, then the programmer might reasonably expect the lhs will retain its original allocation and with its original bounds. However, a compiler is allowed to reallocate even in this case. A consequence of this is that upon execution of the statement the bounds of the lhs array might change, and pointers to the lhs array become dissociated from their original targets. Of course, it might also mean that the assignment statements takes more effort than necessary. Allocation, especially heap allocation that is typically associated with allocatable arrays, is an expensive operation. However, there are also cases where the reallocation might require less effort, e.g. where the assignment is achieved with a move_alloc() kind of operation. I think that is why the language standard has remained ambiguous on this particular point for the last two decades, it wants to allow compilers the freedom to implement this potentially very efficient optimization technique for whole array assignment.

This ambiguity then places more responsibility on the programmer who wants to write the most efficient code. If the programmer wants the move_alloc() kind of operation, then he can program it explicitly. If the programmer wants to avoid the shape tests and the allocation, then he can write the lhs with slice notation, (:,:). If the programmer don’t care about any of that, then he can write the assignment as a whole array expression.

2 Likes

On the other hand, I recently read (maybe/probably here) “If you have to think more than 10 second how to formulate your problem with full array syntax, then you should probably write a loop instead”, and I tend to agree (anyway, loops and array syntax are not exclusive, they are often combined).

1 Like

Thanks to everyone for sharing their valuable insights.

For my case, yes, this array will always have 3 elements. I like to use the explicit bounds of these small arrays because they help me understand the code better.

I mostly like to use them for small arrays.

The 1:2 or 1:3 is also a “promise” that the number of elements in the array will not change, because if it did, the code will become wrong.

In future maybe I may change my views if this method doesn’t conform to the best practices of fortran, but for now, I find it helpful.

code

Programmers who said they preferred not to use slices in such cases, how do you make it clear that a variable is an array? Do you put a hint in the name, or wrap it in a type, or what else do you do?

As I thought about this, I felt that the most elegant way would be to improve editors and IDEs, so that they can highlight arrays as distinct from variables. But that’s not trivial, and would require something like tree-shaking.

2 Likes

Nothing. If the expression is complex enough that there might be confusion, I prefer to loop. Often the name of the variable does provide a hint, but I don’t make a conscious effort to do that.

2 Likes
A = B + C     ! (:,:)
1 Like

I am strongly in the group that array sections can be descriptive, but should best be used to indicate use of vector instructions.
I always avoid array sections that imply non-contiguous memory usage, as this can imply use of temporary arrays.

In my following examples, they are both contiguous memory usage in the (implied) inner loop and the hope is the compiler will use simd instructions.

  subroutine Vec_add ( Y, X, a, n )
!
!   Performs the vector operation  [Y] = [Y] + a * [X]
!
   integer n 
   real*8  Y(n), X(n), a 
! 
   if ( n == 1 ) then
      Y(1) = Y(1) + a * X(1)

   else if ( n > 0 ) then
!     Y = Y + a * X    ! is legal Fortran, but is ambiguous when seen without declarations
      Y(:) = Y(:) + a * X(:)     ! clear indication of simd suitable

   end if

  end subroutine Vec_add

  subroutine do_yes (a,b,n, k,ti)
    integer n,i,j,k
    real*8 a(n,n), b(n,n), ti(3), f

    ti(3) = ti(3) + 1
    f = 2+k
   !$OMP PARALLEL DO PRIVATE(I,J) SHARED(A,B,N,f) SCHEDULE(static) 
    do i = 2, n
!       do j = 1, i
          b(1:i,i) = ( a(1:i,i) + a(1:i,i-1) ) / f
!       end do
    end do
   !$OMP END PARALLEL DO
  end subroutine do_yes

In the OP example: rawNodes(k1)%x(1:3) = rawCells(1:3,1,i)

This asks the question is this different from : rawNodes(k1)%x( : ) = rawCells( : ,1,i)
It demonstrates that the array (and type) declaration needs to be close to the sample code, to avoid confusion.

Then again, the example is “setup raw nodes” which implies a part of code where efficiency is not required, so clarity is far more important.

1 Like