I have a question about fortran best practices. The code below has two subroutines: do_stuff and do_stuff_prealloc. The subroutines preform the exact same operation, but do_stuff_prealloc passes in all needed work arrays (wrk_arr(n)) for the calculation. In contrast, do_stuff doesn’t pass in the work arrays, and instead the work array is allocated whenever the subroutine is called.
In my current code I use a total of ~400,000 double precision numbers within work arrays. This is ~3.2 megabyes. I have tried both approaches: preallocation, and no preallocation. There is no noticeable speed difference. I suppose this means that the operations I’m performing take much more time than allocating this needed memory.
I’m confused by this (preallocation is not faster). Whenever I use Julia, I notice very large speed differences when a function is allocating a ton of memory vs. the same function that has no allocations.
But I have other concerns. I worry that at some point (really big work arrays), if I don’t pass pre-allocated work arrays, then there will be stack-overflow problems. Maybe I don’t need to worry about this because the compiler + OS is smart enough to avoid this?
Make it a class and put the work arrays in the class and they get allocated when you initialize the class. This subroutine is a class method. The user shouldn’t have to see the work arrays.
As far as I understand, neither if these functions will, in Fortran, allocate the arrays when the function is called. The arrays will be allocated when the program starts, only once.
To mimic that (bad) behavior in Fortran you would need to explicitly allocate the array inside the function and deallocate it at the end (that is not exactly when Julia will deallocate it, but in Fortran you control that manually).
That is the difference. If that function is called thousands of times, allocating and deallocating every time is very bad. But in Fortran that only happens of you do that explicitly.
I would preallocate it, and check the size inside the function, reallocating it with the adequate (new) size if needed.
I see. From your original example I understood that n was a compile-time constant.
My bet is that you will get out-of-bounds errors in that case. I don’t know if Fortran handles that automatically for you, and in any case I would not trust on that (meaning, I would go absolutely for the preallocated version).
If arr is allocated dynamically outside this function? More specifically, what happens with the wrk_arr = arr?
I can’t see how that can be safe, unless Fortran behaved like a a dynamical language and reallocated wrk_arr every time. I am somewhat out of shape with Fortran syntax now, but I have the impresion that if arr is allocatable outside that function a module like that won’t even compile. A MWE example would be nice.
Clearly I’m not aware of modern Fortran possibilities. In that case I’m also curious about the performance implications of doing that repeatedly. If the subsequent calls are all for identically shaped arrays, the compiler is able to optimize that out? What if the shape is input at runtime?
Yes, I was just thinking about that. If the array is small certainly the compiler will stack or cache that and no allocations (in the heap) should occur. Since the OP compares with Julia, that is what one would obtain using StaticArrays there. Probably if the array is large Fortran does a better job than Julia there, because it is better at stack allocating large arrays.
do_stuff uses an automatic explicit-shape array wrk_arr(n). Automatic arrays are not ALLOCATABLE.
Pros: easy to use, fast implementation possible (just increase the stack frame size).
Cons: may cause unrecoverable error exit on exhaustion of resources.
do_stuff_prealloc uses dummy explicit-shape array wrk_arr(n). The actual argument can be ALLOCATABLE or not.
Pros:fast implementation possible, can minimize resource use (user can share the same workspace for many different such routines).
Cons:longer call sequence, conceptual “leak” (forces user to manage resources).
do_stuff_alloc uses an ALLOCATABLE local array.
Pros:Easy to use, can resume computation if resources exhausted.
Cons:a trip to ALLOCATE could affect performance, needs additional mechanism for error detection.
kargl’s version workarray-less “arr = (arr+1.d0)/arr”
Pros:Easy to use, resource-light, no resource errors possible, performance possible, parallelism-ready.
Cons: does not generalise to all computations.
unwritten suite version designed for a suite of procedures with some workflow in mind. Manages any workspace arrays so user doesn’t have to. Minimizes trips to ALLOCATE, cleans up at end, provides error handling.
Pros: Easy to use,performance possible, resource-light, error recovery.
Cons: more source to write, more maintenance if requirements change.
extension to multidim arr is a little bit cumbersome because you need to write real(real_kind), dimension(size(arr,1),size(arr,2)) :: wrk_arr but for moderate dimensions it’s doable.
Edit: I usually avoid having an extra argument with the dimension, it is against the principle of single source of truth. Use shape, size, or rank to inquire the properties of an array inside a function.
Yes simple example doesn’t need a work array. But there are scenarios (do_stuff_complex) where a work array is required. I’m wondering about the best practices for dealing with those cases.
do_stuff_complex uses an automatic array. It will cause abrupt termination of the entire program if resources are exhausted. This may matter to the nuclear plant operator.
The difference between undetectable failure to allocate automatic variables and the detectable failure of an ALLOCATE statement was one of the motivations to request block-structured exception handling in Fortran. It has been repeatedly requested, but not yet provided.
IF you suspect that your code will be called repeatedly with the same workspace requirements, but not necessarily so, make the workspace local and allocatable, and give it the SAVE attribute. If it’s not allocated, or the wrong size (or maybe just too small), create it anew. Of course, this isn’t thread safe.
With just a few calls this might be more expensive than automatic variables (see other discussions).