Syntax convention for mutating and non-mutating functions

Hej everyone,

I have a syntax convention question. Take the Julia syntax convention as an example.
Consider an arbitrary function. The convention in Julia is y = foo(x) does not do pre-allocate y, while foo!(y, x) pre-allocates the y array and modifies it in-place.

What I’m currently using in Fortran is:

y = foo(x) ! No preallocation version.

and

call foo_ip(y, x) ! Array y is pre-allocated.

The _ip stands for in-place. I was wondering though if there is such a syntax convention in Fortran. I kind of recall this was discussed at some point but I’ve been unable to find the corresponding thread.

1 Like

If Y is allocatable but you have already allocated it to the desired size and you are populating it via a function using

y(:)=f(x)

instead of

y=f(x)

might be adequate for most circumstances. If you are partially filling Y with an unknown number of replacements at the point of the call or want to do bounds checks and so on a subroutine interface
has many advantages in efficiency and flexibility in general; but function syntax is typically more compact and clear in my opinion but if the only reason you have to create two interfaces is to control whether Y is open to being reallocated and/or have its size changed that might be all you need.

I used a simple function as example, but what I really had in mind was a more complex subroutine where multiple returns arguments may or may not be modified in-place.

Everything below is opinion.

The syntax convention should be that procedures that mutate their arguments be written as subroutines, not functions. A strength of Fortran is that it distinguishes between functions and subroutines, rather than just having functions that may or may not return something, or may or may not mutate their arguments, as in other languages. You don’t need an _ip suffix in the name of a subroutine, because the reader should be assuming that any arguments can be modified, unless they are declared intent(in) within the subroutine. You could name arguments in a way that indicates whether they are inputs, outputs, or both and used named arguments instead of positional ones. I suggest that the order of subroutine arguments be required arguments before optional arguments, and within the required arguments that inputs be placed before outputs. If I have a call statement with many arguments, I sometimes document the outputs in the same line to clarify what is being changed, for example

call circle(radius, circumference, area) ! output: circumference, area

2 Likes

I use this convention and go a bit further by using two spaces between inputs and outputs in both the calls and in the subroutines. This makes the code easier to read.

Just my opinion:

  • if you can only invest limited effort, offer just the pre-allocated version; users can always wrap it the way they like
  • if you want to offer both, you can reuse the same name, but just import it from different modules. users can rename on import if they don’t like your naming convention

Example for the first bullet:

program demo
   implicit none
   integer, parameter :: sp = kind(1.0e0)

   real(sp) :: a(2,2), x(2), b(2)

   a = reshape(&
      [[2,1], &
       [3,1]], &
      shape=[2,2],order=[2,1]) 

   b = [5,6]
   x = solve(a,b)
   
   print '("Solution = ",*(2X,f0.5))', x
   print '("Residual = ",*(2X,f0.5))', b - matmul(a,x)

contains
   !> Linear-system solution (non-mutating)
   !>   wrapper of the LAPACK procedure with automatic memory
   !>   management. the focus is convenience.
   function solve(A,b) result(x)
      real(sp), intent(in), contiguous :: A(:,:), b(:)
      real(sp), allocatable :: tmpA(:,:), x(:)
      integer, allocatable :: ipiv(:)
      integer :: info, n
      external :: sgesv
   !----
      allocate(x,source=b)
      n = size(b)
      allocate(ipiv(n))
      allocate(tmpA,source=A(1:n,1:n))
      print *, "tmpA = ", tmpA
      call sgesv(n,1,tmpA,n,ipiv,x,n,info)
      if (info /= 0) error stop "solve failed"
   end function
end program

Since the discussion is about efficiency, what can be said about the difference in this case of an allocatable result and an automatic array result.

   real(sp) :: x(size(b))

Presumably, one type of allocation is from the heap and the other is from the stack, but in the case of a function result perhaps that is not such a hard and fast rule. How do various compilers handle this common situation?

AFAIK compilers handle it differently, and the Intel Fortran compiler will create the temporary on the stack.

Ideally, an optimizing compiler would elide the allocation entirely and just use the existing storage of the result array on the left-hand side of x = solve(a,b). I doubt that happens in practice but I’m happy to be proven wrong.

But I’m honestly not too concerned with the performance of such convenience functions. The whole AI and data-science world is using NumPy and SciPy which routinely do these kinds of surplus allocations (either for storage compatibility or out of convenience), tons of runtime checks, not to mention it is in an interpreted language: scipy/scipy/linalg/_basic.py at v1.14.1 · scipy/scipy · GitHub. Of course you don’t want to do this in the time-stepping loop of a HPC code (but I guess many people do…).

Another possible convention:

y = solved(x) ! No preallocation version; past participle form

call solve(y, x) ! Array y is pre-allocated; imperative form