Best Practices: Should pre-allocated work be passed to subroutines?

I have a question about fortran best practices. The code below has two subroutines: do_stuff and do_stuff_prealloc. The subroutines preform the exact same operation, but do_stuff_prealloc passes in all needed work arrays (wrk_arr(n)) for the calculation. In contrast, do_stuff doesn’t pass in the work arrays, and instead the work array is allocated whenever the subroutine is called.

In my current code I use a total of ~400,000 double precision numbers within work arrays. This is ~3.2 megabyes. I have tried both approaches: preallocation, and no preallocation. There is no noticeable speed difference. I suppose this means that the operations I’m performing take much more time than allocating this needed memory.

I’m confused by this (preallocation is not faster). Whenever I use Julia, I notice very large speed differences when a function is allocating a ton of memory vs. the same function that has no allocations.

But I have other concerns. I worry that at some point (really big work arrays), if I don’t pass pre-allocated work arrays, then there will be stack-overflow problems. Maybe I don’t need to worry about this because the compiler + OS is smart enough to avoid this?

What is the best approach?

module mymod
  implicit none
  integer, parameter :: real_kind = kind(1.d0) 
contains
  subroutine do_stuff(n,arr)
    integer, intent(in) :: n
    real(real_kind), intent(inout) :: arr(n)
    real(real_kind) :: wrk_arr(n)
    wrk_arr = arr
    arr = arr + 1.d0
    arr = arr/wrk_arr
  end subroutine
  
  subroutine do_stuff_prealloc(n,arr,wrk_arr)
    integer, intent(in) :: n
    real(real_kind), intent(inout) :: arr(n)
    real(real_kind), intent(inout) :: wrk_arr(n)
    wrk_arr = arr
    arr = arr + 1.d0
    arr = arr/wrk_arr
  end subroutine
end module

Welcome to the forum. Why not ALLOCATE an array and handle any errors, for example

  subroutine do_stuff_alloc(n,arr)
    integer, intent(in) :: n
    real(real_kind), intent(inout) :: arr(n)
    real(real_kind), allocatable :: wrk_arr(:)
    allocate (wrk_arr,source=arr,iostat=ierr)
    ! check ierr
    arr = arr + 1.d0
    arr = arr/wrk_arr
  end subroutine

Depending on the OS and compiler and compiler options used, one may be able to ALLOCATE larger arrays than possible with automatic arrays.

1 Like

Make it a class and put the work arrays in the class and they get allocated when you initialize the class. This subroutine is a class method. The user shouldn’t have to see the work arrays.

2 Likes

As far as I understand, neither if these functions will, in Fortran, allocate the arrays when the function is called. The arrays will be allocated when the program starts, only once.

To mimic that (bad) behavior in Fortran you would need to explicitly allocate the array inside the function and deallocate it at the end (that is not exactly when Julia will deallocate it, but in Fortran you control that manually).

That is the difference. If that function is called thousands of times, allocating and deallocating every time is very bad. But in Fortran that only happens of you do that explicitly.

I would preallocate it, and check the size inside the function, reallocating it with the adequate (new) size if needed.

I’m confused how the subroutine do_stuff will allocate wrk_arr only once at the beginning of the program? Suppose you use the subroutine like this

! ...
allocate(arr1(n))
call do_stuff(n,arr1)
allocate(arr2(n+1))
call do_stuff(n+1,arr2)
! ...

There is no way the program can know, at the start, the size of wrk_arr within do_stuff, so it must allocate that space on the fly.

1 Like

I see. From your original example I understood that n was a compile-time constant.

My bet is that you will get out-of-bounds errors in that case. I don’t know if Fortran handles that automatically for you, and in any case I would not trust on that (meaning, I would go absolutely for the preallocated version).

Ok, that is clear. But what happens at runtime with a code like this:

  subroutine do_stuff(n,arr)
    integer, intent(in) :: n
    real(real_kind), intent(inout) :: arr(n)
    real(real_kind) :: wrk_arr(n)
    wrk_arr = arr
    arr = arr + 1.d0
    arr = arr/wrk_arr
  end subroutine

If arr is allocated dynamically outside this function? More specifically, what happens with the wrk_arr = arr?

I can’t see how that can be safe, unless Fortran behaved like a a dynamical language and reallocated wrk_arr every time. I am somewhat out of shape with Fortran syntax now, but I have the impresion that if arr is allocatable outside that function a module like that won’t even compile. A MWE example would be nice.

Wrk_arr is allocated to size n, as the code says. Arr can be allocatable outside the function. For an MWE, the code

module m
implicit none
integer, parameter :: real_kind = kind(1.0d0)
contains
  subroutine do_stuff(n,arr)
    integer, intent(in) :: n
    real(real_kind), intent(inout) :: arr(n)
    real(real_kind) :: wrk_arr(n)
    wrk_arr = arr
    arr = arr + 1.d0
    arr = arr/wrk_arr
  end subroutine
end module m

program main
use m, only: real_kind,do_stuff
implicit none
integer :: i,j
real(kind=real_kind), allocatable :: x(:)
do i=1,3
   allocate (x(i))
   forall (j=1:i) x(j) = j**2
   write (*,"(/,i4,*(f4.1))") i,x
   call do_stuff(i,x)
   write (*,"(i4,*(f4.1))") i,x
   deallocate (x)
end do
end program main

gives output

   1 1.0
   1 2.0

   2 1.0 4.0
   2 2.0 1.2

   3 1.0 4.0 9.0
   3 2.0 1.2 1.1

with gfortran and Intel Fortran.

1 Like

Clearly I’m not aware of modern Fortran possibilities. In that case I’m also curious about the performance implications of doing that repeatedly. If the subsequent calls are all for identically shaped arrays, the compiler is able to optimize that out? What if the shape is input at runtime?

Yes, I was just thinking about that. If the array is small certainly the compiler will stack or cache that and no allocations (in the heap) should occur. Since the OP compares with Julia, that is what one would obtain using StaticArrays there. Probably if the array is large Fortran does a better job than Julia there, because it is better at stack allocating large arrays.

do_stuff uses an automatic explicit-shape array wrk_arr(n). Automatic arrays are not ALLOCATABLE.
Pros: easy to use, fast implementation possible (just increase the stack frame size).
Cons: may cause unrecoverable error exit on exhaustion of resources.

do_stuff_prealloc uses dummy explicit-shape array wrk_arr(n). The actual argument can be ALLOCATABLE or not.
Pros:fast implementation possible, can minimize resource use (user can share the same workspace for many different such routines).
Cons:longer call sequence, conceptual “leak” (forces user to manage resources).

do_stuff_alloc uses an ALLOCATABLE local array.
Pros:Easy to use, can resume computation if resources exhausted.
Cons:a trip to ALLOCATE could affect performance, needs additional mechanism for error detection.

kargl’s version workarray-less “arr = (arr+1.d0)/arr”
Pros:Easy to use, resource-light, no resource errors possible, performance possible, parallelism-ready.
Cons: does not generalise to all computations.

unwritten suite version designed for a suite of procedures with some workflow in mind. Manages any workspace arrays so user doesn’t have to. Minimizes trips to ALLOCATE, cleans up at end, provides error handling.
Pros: Easy to use,performance possible, resource-light, error recovery.
Cons: more source to write, more maintenance if requirements change.

2 Likes

That is my do_stuff_alloc above.

Yes, sorry I missed it, I will edit.

Maybe I miss something, but why don’t write something like

module mymod
  implicit none
  integer, parameter :: real_kind = kind(1.d0) 
contains
  subroutine do_stuff_simple(arr)
    real(real_kind), intent(inout), dimension(:) :: arr
    arr = (arr+1.0)/arr
  end subroutine
  
  subroutine do_stuff_complex(arr)
    real(real_kind), intent(inout), dimension(:) :: arr
    real(real_kind), dimension(size(arr)) :: wrk_arr
    
    call random_number(wrk_arr)
    arr = arr + wrk_arr
  end subroutine
end module

extension to multidim arr is a little bit cumbersome because you need to write real(real_kind), dimension(size(arr,1),size(arr,2)) :: wrk_arr but for moderate dimensions it’s doable.

Edit: I usually avoid having an extra argument with the dimension, it is against the principle of single source of truth. Use shape, size, or rank to inquire the properties of an array inside a function.

Yes simple example doesn’t need a work array. But there are scenarios (do_stuff_complex) where a work array is required. I’m wondering about the best practices for dealing with those cases.

do_stuff_complex uses an automatic array. It will cause abrupt termination of the entire program if resources are exhausted. This may matter to the nuclear plant operator.

The difference between undetectable failure to allocate automatic variables and the detectable failure of an ALLOCATE statement was one of the motivations to request block-structured exception handling in Fortran. It has been repeatedly requested, but not yet provided.

IF you suspect that your code will be called repeatedly with the same workspace requirements, but not necessarily so, make the workspace local and allocatable, and give it the SAVE attribute. If it’s not allocated, or the wrong size (or maybe just too small), create it anew. Of course, this isn’t thread safe.

With just a few calls this might be more expensive than automatic variables (see other discussions).

1 Like