Is creating nested subroutines/functions considered good practice in Fortran?

ivanpribec · September 26, 2023, 11:42pm

Over time I’ve come to believe that callbacks, procedure arguments and procedure pointers are under-utilized by Fortran practitioners. The majority of Fortran codes remain written in a procedural style.

If we think about which codes have lasted long – ODEPACK, MINPACK, COBYLA, SLSQP, ARPACK … – and are reused by hundreds of programmers including those from other programming languages, what many of these program have in common is they accept a procedure parameter or provide a reverse communication interface. Both of these are a form of dependency injection. The reasons these programs are long-lasting is because they help achieve separation of concerns. A numerical expert writes an algorithm; the users just need to provide their problem and receive immediate value from such programs.

Rob Pike, a famous C programmer, has captured this essence of function pointers (procedure arguments in Fortran) in his Notes on Programming in C (pg. 5),

Another result of the tyranny of Pascal is that beginners don’t use function pointers. (You can’t have
function-valued variables in Pascal.) Using function pointers to encode complexity has some interesting
properties.

Some of the complexity is passed to the routine pointed to. The routine must obey some standard
protocol — it’s one of a set of routines invoked identically — but beyond that, what it does is its business
alone. The complexity is distributed.

There is this idea of a protocol, in that all functions used similarly must behave similarly. This
makes for easy documentation, testing, growth and even making the program run distributed over a network — the protocol can be encoded as remote procedure calls.

I argue that clear use of function pointers is the heart of object-oriented programming. Given a set of
operations you want to perform on data, and a set of data types you want to respond to those operations, the
easiest way to put the program together is with a group of function pointers for each type. This, in a nutshell, defines class and method. The O-O languages give you more of course — prettier syntax, derived
types and so on — but conceptually they provide little extra.

Combining data-driven programs with function pointers leads to an astonishingly expressive way of
working, a way that, in my experience, has often led to pleasant surprises. Even without a special O-O
language, you can get 90% of the benefit for no extra work and be more in control of the result. I cannot
recommend an implementation style more highly. All the programs I have organized this way have survived comfortably after much development — far better than with less disciplined approaches. Maybe
that’s it: the discipline it forces pays off handsomely in the long run.

When I think of scientific computing, here are some of the callbacks which immediately come to my mind:

scalar functions (root solving, integration, fitting)
multivariate functions (systems of nonlinear equations, systems of ODEs, integration)
matrices (typically a Jacobian of some sort, or a sparsity pattern)
operators (a transformation returning a vector, e.g. a matrix-vector multiply)

Through composition, you can arrive at quite complex cases. For example you may need to solve a boundary value problem (BVP). For this you may use the shooting method where you combine an ODE solver and a root-finder. Afterward you’d like to fit the BVP to experimental data using a least squares procedure. In this case you will have a hierarchy of callbacks working in sync.

A quite advanced example of callback usage is the Unified Framework for Finite Element Assembly (also known as UFC), which is an interface between the problem-specific and general purpose components of a finite element program used in the FEniCS project. Although written in C and C++, it allows one to provide a set of callback functions to assemble the bilinear and linear forms originating from the weak form of the PDE. The general parts including assembly and solving the system of equations is handled in the C++ framework.

In other fields of computing callbacks are used for all sorts of things:

comparison functions for sorting (see qsort in C)
drawing and GUI callback functions
audio waveforms (see Open Sound Control - #2 by ivanpribec)
teardown functions (see atexit in C)
games (switching between different behaviors, e.g. spells or weapons)

certik · September 27, 2023, 2:25am

@FortranFan your example works with LFortran also:

$ lfortran x.f90 
In eval: f(x) =  1.00000000e+04

I must say I am pleasantly surprised that it actually worked! We are in alpha, meaning I expect things to break.

@ivanpribec yes, I use nested functions for callbacks often, I think it’s a clean and fast way to do it.

fedebenelli · September 27, 2023, 1:12pm

I’m developing an Equations of State library to calculate thermodynamic properties. One of the key principles is that everything can be resumed as a single function of residual Helmholtz energy

A_r(z, V, T)

and with that function and its derivatives, one can calculate most of the thermodynamic and phase equilibria properties. One of the main complexities of this is that there are many different expressions for this function (a common joke is that each research group has its own expression for Helmholtz energy). This is the only thing that is required to define a model, I’ve seen a lot of codes (mostly the F77 codes I’ve inherited from my supervisor) that use some flag and a selector to use the correct function based on the desired model. This is very error-prone since every time you want to add a model you have to edit the source code and be careful to find all the occurrences of this (also it is pretty dirty having to edit the main source just for this)

After learning about procedure pointers it clicked right away, it’s a pretty simple way to add extensibility without going straight into derived types. I’ve started developing (still a WIP) a new framework to ease the process and I’m very happy with the results so far. No more need to edit old sources and feel dirty, now every new model can be a single module that contains its parameters, its hidden procedures for any inner workings, and a setup procedure that sets the needed procedure pointer.

I love easy extensibility and couldn’t find any other library (in any language) that could have a process so simple as this one, even better with how simple it is to write vectorial math in Fortran. A little example of how this is done:

github.com

fedebenelli/yaeos/blob/main/test/test_PR76_twoimplementations.f90#L25-L108


      
          module dirtypengrobinson76
              !-| Peng Robinson 76 Equation of State.
              use yaeos_constants, only: pr, R
              use yaeos_autodiff
              use yaeos_ar_models, only: set_ar_function
              use yaeos_interfaces, only: dual_property
              use triangular_matrix
              
              implicit none
          
              private
              public :: setup_dirty_pr76
          
              real(pr), allocatable :: kij(:, :), lij(:, :), kij_vect(:)
              real(pr), allocatable :: ac(:), b(:), k(:)
              real(pr), allocatable :: tc(:), pc(:), w(:)
          
              real(pr), parameter :: del1 = 1._pr + sqrt(2._pr)
              real(pr), parameter :: del2 = 1._pr - sqrt(2._pr)
          contains

This file has been truncated. show original

here is where the procedure is held and it’s setter routine:

github.com

fedebenelli/yaeos/blob/main/src/models/residual_helmholtz/armodel.f90#L54-L66


      
             procedure(dual_property), pointer :: ar_fun
          
          contains
             ! ===========================================================================
             ! Constructor
             ! ---------------------------------------------------------------------------
             subroutine set_ar_function(ar)
                !-| Select which Ar subroutine to use.
                !   this subroutine sets up the pointer procedure to the desired
                !   Helmholtz subroutine
                procedure(dual_property) :: ar !| Ar subroutine to use
                ar_fun => ar
             end subroutine

I’ve ended up writing a lot more than I thought but I wanted to express how procedure pointers can be really helpful with so little effort to implement!

jcwright · September 27, 2023, 8:47pm

I like pure functions, modern replacement for statement functions to do this if they fit your needs . They should be in lined and have minimal impact on performance.

szakharin · September 29, 2023, 8:16pm

Hi all,

I just want to chime in on the performance point. Some implementations use trampolines to achieve calling Fortran internal subprograms that use variables of the host subprogram via host association. This extra “indirection” does have performance impact, but it is probably less than the consequences of poor optimizations resulting from the call via a procedure pointer itself (comparing to a direct call that might be inlined, enabling more optimizations like vectorization of the loop containing the call). So I think it is not the best approach to use procedure pointers in performance critical spots.

certik · September 30, 2023, 2:28am

@szakharin Yes. The trampolines are also problematic that they require executable stack (at least in the implementations I’ve seen), which is often not allowed. We implemented the nested functions (closures) by creating a module and sharing the needed variables via pointers. I think this approach is cleaner and should be faster, but we have not done any thorough benchmarking yet.

aledinola · September 30, 2023, 1:46pm

Quite interesting! As far as I understand, writing import, all is redundant, in the sense that the deafult bahavior is that the contained function/subroutine has access to all variables defined in the containing program. Is that correct?

FortranFan · September 30, 2023, 5:17pm

By default, the internal subprogram indeed “imports” all the objects from the containing scope.

What IMPORT, ALL provides is documentation of that situation - the readers of your program (who may be you yourself in a future avatar such as some period of time later where the state of memory is different!) may find that clearer to understand / beneficial .

szakharin · October 7, 2023, 2:56am

This is interesting. Is it working with recursion?

Example:

module other
  integer, parameter :: CONST_N = 10
contains
  subroutine foo(fptr)
    integer :: fptr
    print *, fptr()
  end subroutine foo
end module other

recursive subroutine host(local)
  use other
  integer, value :: local
  call foo(callee)
  return
contains
  function callee()
    integer :: callee
    callee = local
    if (local .le. CONST_N) then
       call host(local + 1)
    endif
  end function callee
end subroutine host

Pap · October 7, 2023, 7:47am

As others already pointed out, the contained procedure is fully aware of everything defined in the main program. This abuses the data hiding principle (often called encapsulation or information hiding.) I would strongly recommend to avoid abusing data hiding unless it’s just a trivial case and you know what you are doing. For that reason I only use contains in trivial cases, where I just want to quickly add an auxiliary procedure (which typically isn’t long and doesn’t need many variables anyway.)

Other than trivial cases, I think it’s best to avoid contained functions, unless your compiler supports import, none and import, only, which is the solution to the data hiding abuse. But since I mainly use gfortran and that feature is not yet supported, the best strategy is to either implement a module or write the auxiliary procedure in the same file, but after the end program statement, like this:

program main
implicit none
...
call foo(...)
...
end program
!-----------------------------------------------------------------------------
subroutine foo(...)
! This subroutine is completely independent and unaware of whatever is defined
! in the calling program.
...
end subroutine foo

Yes this is a bit old fashion, but it works: the auxiliary function does respect the data hiding principle, and it’s still right there, easy to see and maintain. The drawback is when the main program uses several other modules and you want the auxiliary function to use them as well. If you write an “independent” procedure in the same file as above, you would have to rewrite the use statements in that procedure.

Writing a module, declare it private and have the auxiliary procedure(s) there is, without any doubt, the way to go for larger programs. In simple cases, however, when you just want a quick auxiliary procedure, implementing a module just for that seems overkill. Although import, only and import, none will solve the issue when it is widely available in compilers, I would argue that this also has a potential drawback, especially for beginners. One could have loads of auxiliary procedures then, contained in the main program, and still independent (or partially independent, as appropriate.) This shouts “create a module instead”, but it will be possible without a module, which I consider bad programming style.

certik · October 7, 2023, 10:52am

Not in our current implementation, but in general yes: you have to maintain a stack in the module, and each recursion increases the stack position. I think the NAG compiler does exactly that. We’ll implement that later.

PierU · October 7, 2023, 12:24pm

And to answer the initial question, it is neither a good nor a bad practice in itself, all depends on the context.

Topic		Replies	Views
Question about CONTAINS Language enhancement	30	1990	February 26, 2023
"Anonymous" functions Help	4	192	November 29, 2024
Subroutines contained in a function Help	3	399	May 9, 2022
Avoiding redirection in nested type-bound procedures Help	6	407	August 18, 2021
Type-bound procedures inside a function/subroutine Help	11	1168	January 7, 2022

Is creating nested subroutines/functions considered good practice in Fortran?

Related topics