# Pure wrappers for impure procedures

This is a fairly general question, but I’ll try to motivate it with a concrete example. I want to write a nice wrapper around LAPACK’s `dgesv` for solving a general linear systems of equations (Ax=b). This subroutine is impure because the matrix A gets overwritten with the LU factorization and the vector b gets overwritten with the solution. I typically do not want this behavior, so in my wrapper, I copy A and b into temporary arrays. In code:

``````module lapack
implicit none
private
public :: solve

integer, parameter :: dp = kind(1d0)

interface
subroutine dgesv(n, nrhs, a, lda, ipiv, b, ldb, info)
integer, intent(in) :: n, nrhs, lda, ldb
double precision, intent(in out) :: a(lda,*), b(ldb,*)
integer, intent(out) :: ipiv(*), info
end subroutine dgesv
end interface

contains

subroutine solve(a, b, x)
real(dp), intent(in) :: a(:,:), b(:)
real(dp), intent(out) :: x(:)

integer :: n, info
real(dp), allocatable :: atmp(:,:), btmp(:,:)
integer, allocatable :: ipiv(:)

n = size(x)
if (any(shape(a) /= n)) error stop 'shape mismatch between A and x'
if (size(b) /= n) error stop 'shape mismatch between b and x'

atmp = a
btmp = reshape(b, [n, 1])
allocate (ipiv(n))
call dgesv(n, 1, atmp, n, ipiv, btmp, n, info)
if (info /= 0) error stop 'dgesv error'
x = btmp(:,1)
end subroutine solve

end module lapack
``````

Now, from the caller’s perspective `solve` is a pure subroutine. However, it cannot be declared as such because of the internal call to `dgesv`. As a result, any procedures calling `solve` can’t be pure either.

There was a bit of related discussion in the thread Attribute for “pure” procedures that do I/O, with @rwmsu noting that an “impure block” construct within a pure procedure would be a useful language enhancement for this case. Lacking that, does the current language give us a viable workaround?

A pure function must have intent(in) arguments, but a pure subroutine need not. Thus a pure subroutine can overwrite its arguments. The following code compiles, for example.

``````module m
implicit none
contains
pure subroutine sub(x)
real, intent(in out) :: x
x = 2*x
end subroutine sub
end module m
``````
1 Like

You can circumvent the purity requirements by manually specifying an interface to the impure procedure. I got an example on how to do this here.

Whether that’s a good idea is an interesting discussion for sure. As long as the public API has no side effects I’d say it’s OK, but it would be interesting to hear opinions from others as well.

1 Like

Does `DGESV` (and its callees) do `IO`? And is that why you are asking about purity? Because, IIRC, `LAPACK` is mostly a collection of F77+few extensions-based subroutines that are technically pure otherwise even though they won’t be marked as such given their vintage. By “technically pure”, I mean in 2 ways: a) `LAPACK` does not use global or static data, let alone write to them and b) their purity would be consistent with the Fortran standard view where `PURE SUBROUTINE`s are allowed `INTENT(*OUT)` attributes.

So then have you considered working with Fortran stdlib volunteers to author an stdlib_lapack which takes the `BSD` license based `LAPACK` on netlib (or wherever it’s now) and transforms it suitably into a modern MIT-license based stdlib component where each and every Fortran subroutine in stdlib_lapack is indeed PURE? One can start “small” with first `PURE` attribute as widely as possible and later do other modernization aspects.

Cheers, @Beliavsky, I had not realized that `intent(in out)` is allowed in a pure subroutine. I am biased by functional languages’ definition of “purity”, which I think would not ordinarily include `intent(in out)` dummy arguments of subroutines (although, the concept of a “subroutine” is already outside what true functional languages deal with, I think).

In private messages, I was pointed to section 15.7 of the 2018 standard, which has two helpful notes at the end (emphasis mine)

NOTE 15.49

The above constraints are designed to guarantee that a pure procedure is free from side effects (modifications of data visible outside the procedure), which means that it is safe to reference it in constructs such as DO CONCURRENT and FORALL, where there is no explicit order of evaluation.

The constraints on pure subprograms appear to be complicated, but it is not necessary for a programmer to be intimately familiar with them. From the programmer’s point of view, these constraints can be summarized as follows: a pure subprogram cannot contain any operation that could conceivably result in an assignment or pointer assignment to a common variable, a variable accessed by use or host association, or an INTENT (IN) dummy argument; nor can a pure subprogram contain any operation that could conceivably perform any external file input/output or execute an image control statement (including a STOP statement). Note the use of the word conceivably; it is not sufficient for a pure subprogram merely to be side-effect free in practice. For example, a function that contains an assignment to a global variable but in a block that is not executed in any invocation of the function is nevertheless not a pure function. The exclusion of functions of this nature is required if strict compile-time checking is to be used.

It is expected that most library procedures will conform to the constraints required of pure procedures, and so can be declared pure and referenced in DO CONCURRENT constructs, FORALL statements and constructs, and within user-defined pure procedures.

NOTE 15.50

Pure subroutines are included to allow subroutine calls from pure procedures in a safe way, and to allow forall-assignment-stmts to be defined assignments. The constraints for pure subroutines are based on the same principles as for pure functions, except that side effects to INTENT (OUT), INTENT (INOUT), and pointer dummy arguments are permitted.

No, because I generally just wrap the few procedures I happen to need in the code I am working on. Writing a comprehensive set of wrappers for the LAPACK driver routines is a big, boring task that I don’t have the time or interest to undertake.

@nshaffer , it will likely take you significantly less time to simply take the `LAPACK` routines from netlib (or the current online source) and simply place them in `MODULE`s that then gives you ready interfaces, add the `PURE` attributes, `fpm` the package, and run some tests than you would with trying to wrap them and wonder with posts like these where the wrappers with `PURE` are inconsistent with the implementations.

2 Likes

Although, I agree with the statement, personal experience of having implemented explicit interfaces for BLAS/LAPACK, actually resulted in a measurable +5-7% increase in runtime (ceteris paribus). I have no good explanation on the matter other than, maybe more temporary array copies are happening behind the scenes with an explicit BLAS/LAPACK interface.

P.S. I would like to share a MWE for others to have a look but unfortunately it’s our matrix-free solver for radiation transport, which is export controlled so it’s unlikely I will be able to provide anything other than the BLAS/LAPACK interface (which is nothing special)

1 Like

That’s really strange, I find this really hard to believe and it’s contrary to our experience on Windows OS with `IFORT`.

On Windows using `IFORT` and its O2/O3 optimization, the teams I’ve worked with have noticed considerable improvement on the order of 20% or more with refactoring of `FORTRAN 77` style code to just `MODULE`s hosting all the subroutines. With `LAPACK` style, I mean codebases that are `FORTRAN 77` style one-.f file-per-external subroutine approach with no use of global or static data, there can be further performance gains. Then moving such code to `MODULE`s with hosted subroutines and perhaps due to the interprocedural optimization facility (Linux: -ipo) with `IFORT, performance sees a good improvement.

And I just tried a simple 1000x1000 `Ax=B` DGESV case and “ceteris paribus” on Windows: using `IFORT` with /O3 and /Qipo optimization, it’s showing >40% performance improvement.

Click to see the driver code
``````module runs_m

use iso_fortran_env, only : I8 => int64
use ieee_arithmetic, only : ieee_selected_real_kind
use lapack_m, only : DGESV  !<-- This module hosts all LAPACK routines having `PURE` attribute

integer, parameter :: WP = ieee_selected_real_kind( p=15, radix=2 )

integer, parameter :: N = 1000
integer, parameter :: NRHS = 1000

real(WP), allocatable :: A_save(:,:)
real(WP), allocatable :: X(:,:)
real(WP), allocatable :: A(:,:)
real(WP), allocatable :: BX(:,:)
integer, allocatable :: ipvt(:)

contains

subroutine setup( astat )

integer, intent(out) :: astat

allocate( A_save(N,N), stat=astat )
if ( astat /= 0 ) return
allocate( A(N,N), stat=astat )
if ( astat /= 0 ) return
allocate( X(N,NRHS), stat=astat )
if ( astat /= 0 ) return
allocate( BX(N,NRHS), stat=astat )
if ( astat /= 0 ) return
allocate( ipvt(N), stat=astat )
if ( astat /= 0 ) return

call random_number( A_save )
call random_number( X )

end subroutine

impure elemental subroutine run( t )

! Argument list
real(WP), intent(inout) :: t

! Local variables
real(WP) :: t1, t2
integer :: info
integer :: lda, ldb

A = A_save
BX = matmul( A, X )
lda = N ; ldb = N

call my_cpu_time( t1 )
call DGESV( n , nrhs,  a , lda , ipvt, bx, ldb , info )
call my_cpu_time( t2 )

t = t2 - t1
! assert the result
if ( (abs(BX(1,1)-X(1,1))>0.01_wp).or.(abs(BX(N,NRHS)-X(N,NRHS))>0.01_wp) ) then
t = huge(t)
end if
return

end subroutine

impure elemental subroutine my_cpu_time( time )

!.. Argument list
real(WP), intent(inout) :: time

!.. Local variables
integer(I8) :: tick
integer(I8) :: rate

call system_clock (tick, rate)

time = real(tick, kind=kind(time) ) / real(rate, kind=kind(time) )

return

end subroutine my_cpu_time

end module
! calling executive
use runs_m
integer, parameter :: ntrial = 12
real(WP) :: times(ntrial)
integer :: astat
call setup( astat )
if ( astat /= 0 ) stop "Setup failed."
call run( times )
print *, "CPU time (sec): ", (sum(times) - maxval(times) - minval(times))/(ntrial-2)
end
! LAPACK module: only a snippet is shown below
module lapack_m
contains
..
PURE SUBROUTINE dgesv( N, NRHS, A, LDA, IPIV, B, LDB, INFO )
*
*  -- LAPACK driver routine --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*
*     Argument list
INTEGER, intent(in) :: N
INTEGER, intent(in) :: NRHS
INTEGER, intent(in) :: LDA
DOUBLE PRECISION, intent(inout) :: A( LDA, * )
INTEGER, intent(out) :: IPIV(*)
INTEGER, intent(in) :: LDB
DOUBLE PRECISION, intent(inout) :: B( LDB, * )
INTEGER, intent(out) :: INFO
...
``````

Say with above, the CPU time is set as 1 sec, then the time with the case where `DGESV` is marked `EXTERNAL` and each of LAPACK subroutines are external procedures the time is ~1.41 sec. For reference, on one Windows 10 personal laptop with `Intel(R) Core™ i7-6820HQ @ 2.70GHz’ CPU, the measured CPU time of above shown code was about 0.6 seconds whereas it was 0.851 sec with the case where DGESV and its callees are external procedures.

I would thus encourage those working on stdlib to attempt stdlib_lapack, the biggest hurdle here might be what to with BSD license with `LAPACK` compared to MIT license with stdlib.

2 Likes

It’s also contrary to my experience with gfortran, albeit on a different code. Hosting the subroutines in modules and adding intent, led to a 15-20 % improvement in one case.

Thanks @FortranFan and @ivanpribec for the MWE and comments. Out of curiosity how do you implement the LAPACK interface?

My approach is just declaring an `interface` in the module. Adding `INTENT` slightly improved performance but nothing like the stated 5-40% improvements.

``````module lapack_interfaces

!< @brief
!! Module containing explicit interfaces to LAPACK methods for the compiler
!! to run diagnostics

implicit none
interface
pure subroutine dgesv(n, nrhs, a, lda, ipiv, b, ldb, info)
!< @brief
!! DGESV computes the solution to a real system of linear equations
!! A * X = B,
!! where A is an N-by-N matrix and X and B are N-by-NRHS matrices.
!!
!! The LU decomposition with partial pivoting and row interchanges is
!! used to factor A as
!! A = P * L * U,
!! where P is a permutation matrix, L is unit lower triangular, and U is
!! upper triangular.  The factored form of A is then used to solve the
!! system of equations A * X = B.
INTEGER, intent(in) :: N
INTEGER, intent(in) :: NRHS
INTEGER, intent(in) :: LDA
DOUBLE PRECISION, intent(inout) :: A( LDA, * )
INTEGER, intent(out) :: IPIV(*)
INTEGER, intent(in) :: LDB
DOUBLE PRECISION, intent(inout) :: B( LDB, * )
INTEGER, intent(out) :: INFO
end subroutine dgesv
end interface
end module lapack_interfaces
``````

The other weird thing that we sometimes do is that we pass arrays, similar to C; start pointer + how much you need to read. So say we only want to operate in a section of `A`, we do `A(start1, start2)`. I am not sure if doing `A(start1:end1, start2:end2)` leads to array copies (that previously were not present), ultimately leading to performance loss.

At this point the performance difference is just academic curiosity from my part.

PS Compilation is made with the appropriate flags e.g. -O3 -march=native or -O3 -ipo

In my case, and from what I understand also @FortranFan’s, the benefits are noticeable if you put the full procedures in a module and not just the interface. This way the compiler might perform more aggressive inlining and dead code elimination than if the subroutines were in separate compilation units. I am guessing that @FortranFan’s results pertain primarily to the reference (or “Netlib”) LAPACK.

However, in case of LAPACK we should be cautious about making generalizations, since optimized versions like OpenBLAS, ReLAPACK, and Intel MKL LAPACK also make certain algorithmic changes compared to the reference Fortran LAPACK. If you take a look at the OpenBLAS repository, it has `lapack-netlib/` and `lapack/` subfolders. In the latter, a subset of the most commonly used LAPACK routines is actually re-implemented (in C) for better performance.

You can use the Intel Fortran Classic compiler flag `-check arg_temp_created` or the GNU Fortran `-fcheck=array-temps` to verify there are no temporary copies being made. I remember having similar issues once, but have forgotten the details.

Thanks for the suggestion. I recall back when I looked into this, 1.5y ago, I used `-fcheck=array-temps` to check for temps but nothing popped up.