Traits, Generics, and modern-day OO for Fortran

Hi all, I would have responded a little sooner, but I wanted to make sure I gave this the attention and consideration it deserves, so I read it carefully and had some discussions with others. I appreciate that a lot of effort went into it, and I am generally in favor of searching for better solutions. That said, I believe the proposed design has a few fundamental problems. Below are my notes, semi-ordered by level of importance/problem.

  1. Polymorphic objects with templated type-bound procedures effectively require run-time compilation. Even in the example from the paper (Listing 6.3), there are places where it is not known until runtime which procedure will be called, and since that procedure is templated, you don’t know until runtime that a particular version of it needs to be compiled. That means, at minimum, you’d need to keep some form of abstract semantic representation embedded in the executable at the call site, and have a fully featured Fortran interpreter available at run time to execute the templated code. The overhead would be huge, so performance would be terrible, and thus not very “Fortranic”.
  2. I don’t think it’s a good idea to mix OO type extension with traits. How would an extension of an abstract type know all of the traits that it must implement, given that the declaration that a type implements a specific trait may appear separately (even in a different module)? The following example attempts to demonstrate the problem, i.e. a case where the child type doesn’t know that it needs to implement some trait, yet the program expects that it did. This is specifically with respect to the statement at the end of Section 4.2.3:

We finally wish to remark that in case the “implementing” type is an abstract derived type, it must be allowed to provide an only partial implementation of the interfaces that it adopts. However, any non-abstract type that extends this abstract type through subclassing (i.e. implementation inheritance) must then provide a full implementation.

Example
module my_parent_m
    type, abstract :: my_parent_t
    end type
end module

module my_child_m
    use my_parent_m
    type, extends(my_parent_t) :: my_child_t
    end type
end module

module interfaces_m
    abstract interface somethingI
      subroutine something(self)
        deferred(self) :: self
      end subroutine
    end interface
end module

module somewhere
    use interfaces_m
    use my_parent_m

    implements somethingI :: my_parent_t
    end implements
end module

program example
    use somewhere
    use my_child_m

    class(my_parent_t), allocatable :: my_x

    allocate(my_child_t :: my_x)

    call use_something(my_x)
contains
    subroutine use_something{somethingI :: T}(x)
        class(T) :: x
        call something(x)
    end subroutine
end program
  3. Data structures of runtime polymorphic trait objects require a level of reflection/introspection likely to incur a large amount of overhead. The issue is that you don’t even know which v-table to use to look up the function until runtime. I.e., you have to determine the type of the object at run time, then determine which function in that v-table corresponds to the related function from the trait. I think it is still an open question how often you need runtime polymorphism without inheritance, and whether it’s feasible to implement in Fortran.
  4. It is not clear to me if/how the proposed traits allow the expression of relationships/operations between/involving multiple types. Consider the fully generic AXPY template (below) from recent examples/tutorials for templates. How would this be expressed/implemented with the traits proposal?
Generic AXPY
requirement bin_op(T, U, V, op)
  type, deferred :: T, U, V
  deferred interface
    simple elemental function op(x, y)
      type(T), intent(in) :: x
      type(U), intent(in) :: y
      type(V) :: op
    end function
  end interface
end requirement

simple function axpy &
    ( a_type, x_type, y_type, &
      times_result_type, plus_result_type, &
      plus, times) &
    (a, x, y)
  requires bin_op( &
      a_type, x_type, times_result_type, times)
  requires bin_op( &
      times_result_type, y_type, &
      plus_result_type, plus)
  type(a_type), intent(in) :: a
  type(x_type), contiguous, intent(in) :: x(:)
  type(y_type), intent(in) :: y(size(x))
  type(plus_result_type) :: axpy(size(x))

  axpy = plus(times(a, x), y)
end function

integer, parameter :: sp = kind(1.0)
integer, parameter :: dp = kind(1.d0)
real(sp) :: a
integer :: x(10)
real(dp) :: y(10)
instantiate axpy( &
        a_type = real(sp), &
        x_type = integer, &
        y_type = real(dp), &
        times_result_type = real(sp), &
        plus_result_type = real(dp), &
        plus = operator(+), &
        times = operator(*)), mixed_axpy => axpy

print *, mixed_axpy(a, x, y)
  5. Is it possible to address the situation where, in certain contexts, you want to use different implementations of certain traits for a given type? Consider the following example.
Differently Reduceable Integers
abstract interface :: reduceable
  function combine(a, b) result(combined)
    type(reduceable), intent(in) :: a, b
    type(reduceable) :: combined
  end function
  function empty()
    type(reduceable) :: empty
  end function
end interface

function reduce{reduceable :: T}(a) result(reduced)
  type(T) :: a(:)
  type(T) :: reduced

  integer :: i

  reduced = empty()
  do i = 1, size(a)
    reduced = combine(reduced, a(i))
  end do
end function

implements reduceable :: integer
  procedure :: combine => operator(+), empty => int_zero
end implements
implements reduceable :: integer
  procedure :: combine => operator(*), empty => int_one
end implements

integer :: sum_or_product
sum_or_product = reduce([1, 2, 3, 4]) !? 10 or 24?
  6. The type-sets feature prevents the use of third-party types. Also, this feature is effectively already handled by the Japanese generic procedures proposal.
  7. For templated types, I find the scoping rules, and the correspondence between what is a parameter of the procedure and what is a parameter of the type, hard to follow. Further, it’s not quite clear to me how type extension would work for templated types.
  8. It’s unclear to me exactly how the correspondence between procedure arguments and template arguments is established for implicit instantiation.

I do appreciate some of the ideas this proposal put forward, and how it attempts to address a few of the shortcomings of the current templates design. Key among them are the following:

  1. Listing out all the functions and types a template needs can get pretty verbose pretty quickly. The traits concept helps alleviate that problem. I’ve written up, but not yet submitted to J3, an extension to the requirement/requires idea that effectively provides this feature. You can read the initial draft at write a paper solving long template arg list problem by everythingfunctional · Pull Request #140 · j3-fortran/generics · GitHub
  2. Mixing templates and the existing run-time polymorphism features is currently a bit clunky. I have an idea for how to make it more convenient (to the extent the features actually can/should be mixed), as illustrated in the following, not thoroughly considered, example.
Dynamic Dispatch
! consider the following intrinsic, "magic" template that,
! given a type and the name of one of its type-bound procedures
! produces a standalone procedure with the same attributes and
! argument list, but still does dynamic dispatch on the type of
! the passed object dummy argument
subroutine dynamic_dispatch(T, tbp_name)(self, ...)
  type, deferred, abstract :: T
  class(T) :: self
  call self%tbp_name(...)
end subroutine

! thus enabling something like the following example
template tmpl(T, sub)
  type, deferred, extensible :: T
  deferred interface
    subroutine sub(self)
      class(T), intent(in) :: self
    end subroutine
  end interface
contains
  subroutine call_it(x)
    class(T), intent(in) :: x
    call sub(x)
  end subroutine
end template

type :: t1
contains
  procedure :: sub => t1_sub
end type

type, extends(t1) :: t2
contains
  procedure :: sub => t2_sub
end type

subroutine t1_sub(self)
  class(t1), intent(in) :: self

  print *, "Hello from t1_sub"
end subroutine

subroutine t2_sub(self)
  class(t2), intent(in) :: self

  print *, "Hello from t2_sub"
end subroutine

instantiate tmpl(t1, dynamic_dispatch^(t1, sub)), only: call_it

call call_it(t1()) ! prints "Hello from t1_sub"
call call_it(t2()) ! prints "Hello from t2_sub"
  3. People like the OO style/syntax, and the current templates don’t enable it for deferred types. I think there’s a path to enabling it by specifying that a deferred type should treat certain deferred arguments as type-bound procedures. I’ve only just started thinking about the idea, let alone syntax, but I think something like the following could be doable.
TBP on Deferred Type
template tmpl(T, sub)
  ! The following implies T is deferred and extensible.
  ! abstract could be declared explicitly
  prototype :: T
    procedure :: sub
  end type
  deferred interface
    subroutine sub(self)
      class(T), intent(in) :: self
    end subroutine
  end interface
contains
  subroutine call_it(x)
    class(T), intent(in) :: x
    call x%sub()
  end subroutine
end template

Those three ideas taken together allow you to write code that looks (if you squint) a little bit like OO traits. Item 3 gives you back OO syntax inside the template, item 2 lets you pass TBPs as instantiation arguments, and item 1 means you only need to spell out the long list once for the requirement, and once for each type that “implements” it. I can try to work out the sample problem from the traits proposal at some point using these ideas if enough folks want to see what it would look like, but using the above example it would look something like the following.

Traits With Templates
requirement my_trait(T, sub)
  prototype :: T
    procedure :: sub
  end type
  deferred interface
    subroutine sub(self)
      class(T), intent(in) :: self
    end subroutine
  end interface
end requirement

template tmpl(impl)
  satisfaction(my_trait) :: impl
contains
  subroutine call_it(x)
    class(impl%T), intent(in) :: x
    call x%sub()
  end subroutine
end template

type :: t1
contains
  procedure :: sub => t1_sub
end type

type, extends(t1) :: t2
contains
  procedure :: sub => t2_sub
end type

subroutine t1_sub(self)
  class(t1), intent(in) :: self

  print *, "Hello from t1_sub"
end subroutine

subroutine t2_sub(self)
  class(t2), intent(in) :: self

  print *, "Hello from t2_sub"
end subroutine

satisfaction :: impl = requires my_trait(t1, dynamic_dispatch^(t1, sub))
instantiate tmpl(impl), only: call_it

call call_it(t1()) ! prints "Hello from t1_sub"
call call_it(t2()) ! prints "Hello from t2_sub"

I wouldn’t count on those features making it into F202Y, but knowing that they are possible, and that we already have a start on them, my opinion is that the generics subgroup should continue along its current path so that templates enable all the functionality and can make it into F202Y, and then F203X can make it more “user friendly” for OO enthusiasts. My impression is that if we tried to switch designs now, F202Y wouldn’t end up with any generics features. (Well, maybe the Japanese proposal would still make it.)


@kkifonidis, @certik, et al., and all the practitioners of Fortran,

Being able to author and consume a generic procedure like the SUM intrinsic, in as close a manner to the intrinsic as possible, shall be a starting, minimum requirement for any Generics enhancement to the Fortran language.

This should require minimal “gymnastics” on the part of the program author. The Fortran language has achieved this in an entirely different context with coarrays, i.e., PGAS-based SPMD parallel computing. Syntactically, [ … ] yielded the magic for the array language that is Fortran; the rest is mostly semantics for the processor implementation.

Simplicity like that of coarrays is what the practitioners of Fortran deserve.

This is the ideal from which the language developers should seek inspiration and imagination in any work on Generics in Fortran. The current pursuit of templates by the J3 Generics subgroup fails in this respect, which is why a pause or a change in direction is being petitioned with WG5.

Traits with OO also comes across as way too much “gymnastics”, and it does not solve the basic use case for a practitioner of Fortran, i.e., being able to author a generic procedure similar to SUM, FINDLOC, etc.

Hence the strong recommendation to go back to the drawing board.

Thanks @everythingfunctional for the thorough response, I really appreciate it. I’ll let @kkifonidis address the points, as he will be able to give the best answer.

@FortranFan you are quoting the wrong example. Here is the code for your simple sum using the OO-based generics:

What don’t you like about it? It’s as simple as it can get!


@certik ,

You will notice the example only works with rank-1 data.

At this stage, rank genericity shall be a basic requirement, not a problem to be resolved later.


@FortranFan got it — you want the generics to handle any rank easily. Do you have an example of how one could write the sum function so that it is rank agnostic? The problem is the loop and the indexing.


The Fortran intrinsic function sum() is rank generic, so it was recognized already in f90 that this is a useful feature. It is, however, one of the features that programmers have heretofore not been able to provide in their own code. I hope a good solution is found for programmers to provide this feature.
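For reference, here is the rank genericity of the intrinsic in action (standard Fortran, compilable as-is); it is a user-written function with this behavior that cannot currently be expressed:

```fortran
program sum_ranks
    implicit none
    real :: v(4), m(2,3)
    v = [1.0, 2.0, 3.0, 4.0]
    m = reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3])
    print *, sum(v)         ! rank-1 argument -> scalar: 10.0
    print *, sum(m)         ! rank-2 argument -> scalar: 21.0
    print *, sum(m, dim=1)  ! with dim, result is rank-1: 3.0 7.0 11.0
end program
```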

No doubt this is a useful feature from the usage perspective. My question is about ideas for how to write such a generic function, especially with the dim argument, which requires a special loop structure. It’s not trivial. The committee has proposed some ideas and papers for “rank agnostic” indexing and loops, but it doesn’t look simple to me.

@certik,

Please note the other thread suggesting a design based on KART semantics. That’s what I would recommend. Using notional syntax, the solution would be as follows:

module m
   use, intrinsic :: iso_c_binding, only : c_loc, c_f_pointer
   generic, entity :: E0
      type => intrinsic_numeric
      rank => 0  !<-- notional syntax for rank-0
   end generic
   generic, entity :: E1
      type => intrinsic_numeric
      rank => 1  !<-- notional syntax for rank-1
   end generic
   generic, entity :: E
      type => intrinsic_numeric
      rank => *  !<-- notional syntax for any supported rank
   end generic
contains   
   function sum<E,E1,E2>( a ) result(r)
      <E>, intent(in), target :: a
      <E0> :: r
      ! Local variables
      <E1>, pointer :: x
      integer :: i
      call c_f_pointer( cptr=c_loc(a), fptr=x, shape=[ size(a) ] )
      r = NIL
      do i = 1, size( a )
         r = r + x(i)
      end do
   end function 
end module 

I would not consider this type of solution very elegant, nor does it add any expressiveness to the language. This type of thing can already be done with equivalence, or any other method of tricking the compiler into letting you access your N-dimensional variable as a 1-D variable with the same total number of elements. In fact, I think a simple procedure with dummy arguments x(*) and n would already work like this. At that point you only need genericity over type, so I would argue that this style of solution for “an algorithm that is rank agnostic” isn’t actually adding anything new. There is nothing about this code that actually works with the dimensionality of the input.

I think that the assumed-rank syntax would do. The point is to be able to use such a variable without a select rank construct. I understand that the tricky part is figuring out at compile time which ranks are actually needed.
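For context, assumed rank plus select rank is what F2018 already provides; the per-rank branches are exactly the boilerplate one would hope a rank-generic feature eliminates (a minimal sketch handling only ranks 1 and 2):

```fortran
module rank_sum_m
    implicit none
contains
    function sum_any(a) result(r)
        ! Assumed-rank dummy: accepts an actual argument of any rank,
        ! but each rank still needs its own branch in select rank.
        real, intent(in) :: a(..)
        real :: r
        select rank (a)
        rank (1)
            r = sum(a)
        rank (2)
            r = sum(a)
        rank default
            error stop "rank not handled"
        end select
    end function
end module
```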


Here is a challenge:

How would one write the Softmax function so that it is rank agnostic, yet still allows choosing the dimension along which to apply the maxval and sum operations, using any of these proposals? (This function shall act upon tensors of rank >= 1.)

For ranks 1 and 2, with current Fortran features one could write something like:

Example Softmax for rank 1 and 2
module test

interface softmax
    module procedure :: Softmax_r1
    module procedure :: Softmax_r2
end interface

contains

pure function Softmax_r1( x ) result( y )
    real, intent(in) :: x(:)
    real :: y(size(x))

    y(:) = exp(x(:) - maxval(x(:)) )
    y(:) = y(:) / sum(y(:))
end function

pure function Softmax_r2( x , dim ) result( y )
    real, intent(in) :: x(:,:)
    real :: y(size(x,dim=1),size(x,dim=2))
    integer :: shp(rank(x))
    real, allocatable :: temp(:)
    integer, intent(in), optional :: dim
    integer :: dim_, i, j, k

    dim_ = 1; if(present(dim)) dim_ = dim

    shp(:) = shape(x)
    shp(dim_) = 1
    allocate(temp(product(shp)) )

    temp = maxval(x , dim = dim_ ) 
    do j = 1, size(x,dim=2)
        do i = 1, size(x,dim=1)
            k = merge(i,j,dim_==2)
            y(i,j) = exp( x(i,j) - temp(k) )
        end do
    end do

    temp = sum(y , dim = dim_ )
    do j = 1, size(x,dim=2)
        do i = 1, size(x,dim=1)
            k = merge(i,j,dim_==2)
            y(i,j) = y(i,j) / temp(k)
        end do
    end do
end function

end module

program main
use test

real :: x(3,2), y(3,2)

x(:,1) = [-1.0547, -0.5360,  0.5134]
x(:,2) = [ 0.2985, -0.7548,  1.3781]

print *, 'Softmax on dim = 1'
y = Softmax(x,dim=1)
print *, y(:,1) ! 0.133736923      0.224656940      0.641606092  
print *, y(:,2) ! 0.232976034       8.12585130E-02  0.685765445  

print *, ''
print *, 'Softmax on dim = 2'
y = Softmax(x,dim=2)
print *, y(:,1) ! 0.232976034       8.12585130E-02  0.685765445  
print *, y(:,2) ! 0.794652283      0.445517182      0.703641713 

end program

EDIT

Example Softmax for rank 1 to 4
module test
implicit none
interface softmax
    module procedure :: Softmax_r1
    module procedure :: Softmax_r2
    module procedure :: Softmax_r3
    module procedure :: Softmax_r4
end interface

contains

pure function Softmax_r1( x ) result( y )
    real, intent(in) :: x(:)
    real :: y(size(x))

    y = exp(x - maxval(x))
    y = y / sum(y)
end function

pure function Softmax_r2( x , dim ) result( y )
    real, intent(in) :: x(:,:)
    real :: y(size(x,dim=1),size(x,dim=2))

    integer, intent(in), optional :: dim
    integer :: dim_, j

    dim_ = 1; if(present(dim)) dim_ = dim

    if(dim_==1)then
        do j = 1, size(x,dim=2)
            y(:,j) = Softmax_r1( x(:,j) )
        end do
    else
        do j = 1, size(x,dim=1)
            y(j,:) = Softmax_r1( x(j,:) )
        end do
    end if
end function

pure function Softmax_r3( x , dim ) result( y )
    real, intent(in) :: x(:,:,:)
    real :: y(size(x,dim=1),size(x,dim=2),size(x,dim=3))

    integer, intent(in), optional :: dim
    integer :: dim_, j

    dim_ = 1; if(present(dim)) dim_ = dim

    if(dim_<=2)then
        do j = 1, size(x,dim=3)
            y(:,:,j) = Softmax_r2( x(:,:,j) , dim = dim_ )
        end do
    else
        do j = 1, size(x,dim=1)
            y(j,:,:) = Softmax_r2( x(j,:,:) , dim = 2 )
        end do
    end if
end function

pure function Softmax_r4( x , dim ) result( y )
    real, intent(in) :: x(:,:,:,:)
    real :: y(size(x,dim=1),size(x,dim=2),size(x,dim=3),size(x,dim=4))

    integer, intent(in), optional :: dim
    integer :: dim_, j

    dim_ = 1; if(present(dim)) dim_ = dim

    if(dim_<=3)then
        do j = 1, size(x,dim=4)
            y(:,:,:,j) = Softmax_r3( x(:,:,:,j) , dim = dim_ )
        end do
    else
        do j = 1, size(x,dim=1)
            y(j,:,:,:) = Softmax_r3( x(j,:,:,:) , dim = 3 )
        end do
    end if
end function

end module

program main
use test

real :: x(3,3,3), y(3,3,3)

x(:,1,1) = [0.82192878, 0.76998032, 0.98611263]
x(:,2,1) = [0.8621334 , 0.65358045, 0.26387113]
x(:,3,1) = [0.12743663, 0.35237132, 0.23801647]

x(:,1,2) = [0.69773567, 0.40568874, 0.44789482]
x(:,2,2) = [0.42930753, 0.49579193, 0.53139985]
x(:,3,2) = [0.03035799, 0.65293157, 0.47613957]

x(:,1,3) = [0.21088634, 0.9356926 , 0.0991312]
x(:,2,3) = [0.46070181, 0.02943479, 0.17557538]
x(:,3,3) = [0.10541313, 0.33946349, 0.34804323]

print *, 'Softmax dim=1'
y = Softmax(x,dim=1)

print *, y(:,1,1) ! 0.319712639      0.303528070      0.376759291    
print *, y(:,2,1) ! 0.423455358      0.343743294      0.232801422    
print *, y(:,3,1) ! 0.296809316      0.371676773      0.331513911    

print *, y(:,1,2) ! 0.395936400      0.295658976      0.308404684    
print *, y(:,2,2) ! 0.314838648      0.336482018      0.348679334    
print *, y(:,3,2) ! 0.225966826      0.421138495      0.352894694    

print *, y(:,1,3) ! 0.252614945      0.521480858      0.225904226    
print *, y(:,2,3) ! 0.416388273      0.270521373      0.313090324    
print *, y(:,3,3) ! 0.282621205      0.357150704      0.360228121    

end program

Using pointer rank remapping is a clever idea. Note that you don’t need the C interop features to accomplish it, as the call to c_f_pointer could be replaced with x(1:size(a)) => a. There are some other points about your example that need additional consideration.
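For anyone unfamiliar with the feature, here is the remapping in currently standard Fortran (F2008; note that the bounds must be spelled lower:upper, and the target must be contiguous):

```fortran
program remap_demo
    implicit none
    integer :: i
    real, target :: a(2,3)
    real, pointer :: x(:)
    a = reshape([(real(i), i = 1, 6)], [2, 3])
    ! Rank-1 view of the whole rank-2 array via pointer bounds remapping.
    x(1:size(a)) => a
    print *, sum(x)  ! same result as sum(a): 21.0
end program
```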

  1. Your use of intrinsic_numeric I presume means “any kind of integer, real, or complex”, but sum is already intrinsic for those types, and as written your generic version isn’t usable for any other types, so I don’t see the point in writing it this way.
  2. I presume sum<E,E1,E2> was intended to be sum<E,E0,E1>, but the main point here is that I don’t think you need or want E0 or E1, because r and x must be the same type as a, and are not generic in rank. I.e., they should be declared as typeof(a) :: r and typeof(a), pointer :: x(:).
  3. You use some special new value NIL, but for the intrinsic numeric types, assignment of 0 works just fine. If this were a properly generic sum, what would determine the value of NIL for the actual type?

IMO, the Japanese generic procedures proposal accomplishes this restricted case much better. I.e.

generic function sum(a) result(r)
  type(integer(*), real(*), complex(*)), rank(1:), target :: a
  typeof(a) :: r
  typeof(a), pointer :: x(:)
  integer :: i
  x(1:size(a)) => a
  r = 0
  do i = 1, size(a)
    r = r + x(i)
  end do
end function

If you want it to be properly generic, you need some way for users to specify what the addition operation and zero value is for their type. I.e.

function sum(T, add, zero, N)(a) result(r)
  type, deferred :: T
  deferred interface
    type(T) function add(lhs, rhs)
      type(T), intent(in) :: lhs, rhs
    end function
    type(T) function zero()
    end function
  end interface
  integer, deferred, constant :: N
  type(T), rank(N), intent(in), target :: a
  type(T) :: r
  type(T), pointer :: x(:)
  integer :: i
  x(1:size(a)) => a
  r = zero()
  do i = 1, size(a)
    r = add(r, x(i))
  end do
end function

You can’t infer those things, because not everyone will have spelled the operation operator(+), or have defined assignment from a default integer for their type. A little bonus of having written it this way is that you can then do

product = sum^(my_type, times_my_type, my_type_one, 1)(a)

On a related side note, how about allowing intrinsic functions that require complex arguments to also accept real or integer arguments? This is mathematically legitimate but currently not permitted by the Standard.

Thus:

program test
implicit none
real x

x=42.0

! should return x
print *,conjg(x)

! should return 0
print *,aimag(x)

end program

… should be allowed by the Standard. Then, for example, if a user wrote a generic routine for complex Hermitian matrices, it would automatically work for real symmetric matrices as well.
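Until/unless the Standard changes, explicitly promoting with cmplx is the workaround, e.g.:

```fortran
program conjg_real
    implicit none
    real :: x
    x = 42.0
    ! Promote explicitly to complex; conjg and aimag then apply.
    print *, conjg(cmplx(x))  ! real part preserved, imaginary part zero
    print *, aimag(cmplx(x))  ! zero
end program
```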

For user subprograms, this only works for contiguous dummy arguments, which in turn requires copy-in/copy-out argument association when the actual argument is not contiguous. For some operations, such as sum(), where the array elements are referenced only once, that overhead would be prohibitive. I don’t think the intrinsic sum() works that way in most compilers; I think it works in a rank-generic way, without the extra overhead, even when the actual argument is not contiguous. Allowing features like optional dim arguments is a further complication for noncontiguous actual arguments. I hope this feature can be designed so that programmers can write rank-generic code in a simple way that compilers can implement efficiently, e.g. without unnecessary copy-in/copy-out argument association.
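The contiguity point is easy to see with is_contiguous; passing the strided section below to a contiguous dummy would force a copy:

```fortran
program contig_demo
    implicit none
    real :: a(4,4)
    a = 1.0
    print *, is_contiguous(a(:,1))  ! T: a whole column is unit-stride
    print *, is_contiguous(a(1,:))  ! F: a row section is strided
end program
```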


I think it would look like this with the templates proposal.

rank generic softmax
module test
contains
pure function softmax(T, R, dim, maxval, sum, exp, sub, div)(x) result(y)
    type, deferred :: T
    integer, deferred, constant :: R, dim
    deferred interface
        pure function maxval(x, dim) result(m)
            type(T), rank(R), intent(in) :: x
            integer, intent(in) :: dim
            type(T) :: m([(size(x,dim=i), integer :: i = 1, dim-1), (size(x,dim=i), integer :: i = dim+1, R)])
        end function
        pure function sum(x, dim) result(m)
            type(T), rank(R), intent(in) :: x
            integer, intent(in) :: dim
            type(T) :: m([(size(x,dim=i), integer :: i = 1, dim-1), (size(x,dim=i), integer :: i = dim+1, R)])
        end function
        pure function exp(x)
            type(T), intent(in) :: x
            type(T) :: exp
        end function
        pure function sub(lhs, rhs)
            type(T), intent(in) :: lhs, rhs
            type(T) :: sub
        end function
        pure function div(lhs, rhs)
            type(T), intent(in) :: lhs, rhs
            type(T) :: div
        end function
    end interface
    ! The following lines aren't part of the current proposal,
    ! nor would they really be necessary, but would give better
    ! compile time error messages, and have been suggested as an extension
    ! requires (R >= 1)
    ! requires (dim >= 1 .and. dim <= R)

    type(T), rank(R), intent(in) :: x
    type(T) :: y(shape(x))

    type(T) :: temp([(size(x,dim=i), integer :: i = 1, dim-1), (size(x,dim=i), integer :: i = dim+1, R)])
    integer :: iter(R), k(R-1)

    temp = maxval(x, dim=dim)
    iter = 1
    do while (all(iter <= shape(x)))
        k = iter(@[(i,integer::i=1,dim-1), (i,integer::i=dim+1,R)])
        y(@iter) = exp(sub(x(@iter), temp(@k)))
        call increment_iter(iter)
    end do
    temp = sum(y, dim=dim)
    iter = 1
    do while (all(iter <= shape(x)))
        k = iter(@[(i,integer::i=1,dim-1), (i,integer::i=dim+1,R)])
        y(@iter) = div(y(@iter), temp(@k))
        call increment_iter(iter)
    end do
contains
    pure subroutine increment_iter(iter)
        integer, intent(inout) :: iter(R)
        integer :: i
        do i = 1, R
            iter(i) = iter(i) + 1
            if (iter(i) > size(x, dim=i)) then
                iter(i) = 1
            else
                exit
            end if
        end do
    end subroutine
end function
end module
program main
    use test

    real :: x(3,2), y(3,2)

    x(:,1) = [-1.0547, -0.5360,  0.5134]
    x(:,2) = [ 0.2985, -0.7548,  1.3781]

    print *, 'Softmax on dim = 1'
    y = softmax^(real, rank(x), 1, maxval, sum, exp, operator(-), operator(/))(x)
    print *, y(:,1) ! 0.133736923      0.224656940      0.641606092  
    print *, y(:,2) ! 0.232976034       8.12585130E-02  0.685765445  

    print *, ''
    print *, 'Softmax on dim = 2'
    y = softmax^(real, rank(x), 2, maxval, sum, exp, operator(-), operator(/))(x)
    print *, y(:,1) ! 0.232976034       8.12585130E-02  0.685765445  
    print *, y(:,2) ! 0.794652283      0.445517182      0.703641713 

end program

All these solutions to rank genericity assume that the dummy argument is contiguous. For the pointer rank remapping or c_f_pointer solutions, this has to be enforced with the contiguous attribute if the dummy argument is assumed-shape or assumed-rank. In all cases this will result in copy-in for a non-contiguous actual argument, with the associated performance penalty and possible memory issues if the actual argument is a very large array.

Edit: @RonShepard wrote the same while I was writing this…

Forgot about that part. You’re right.

This is true, but there is also a performance penalty for non-contiguous memory access, so it is unclear which is more costly. Also, how often are non-contiguous large arrays used in practice? My guess is that it’s very rare, but I could be wrong. In any case, you can do the rank-agnostic stuff without pointer rank remapping, as I show in the rank generic softmax example; it’s just a lot more complicated.

This mostly depends on the “arithmetic intensity” (AI) of the computations in the routine. A straight sum algorithm has a low AI, and in this case the overhead of the copy is probably significant. If the algorithm is more complex, with a higher AI, then the cost of the copy can become negligible. Still, there is the memory issue, which is a particular problem with compilers that tend to allocate temporary arrays on the stack (which is of limited size).

I agree. Nonetheless the ability to pass arbitrary slices without hidden copies is a significant feature in Fortran.

Good points. I guess that means authors should consider this aspect when deciding how to implement rank agnostic code.