Assumed-size arrays in generic procedure interfaces

I’m building methods that cast complex derived types to arrays of doubles (to easily read/write them, DTIO-style, or send them via MPI, but let’s not open those discussions here).

My methods use an assumed-size array argument (dimension(*)) and are collected in a generic interface. Because the total size that will be read is not yet known at the point of the call, I provide a pointer to the first location:

call my_routine(...,array(first_location))

where the specific subroutine has this form:

! Expand object from a real array
pure subroutine my_routine(x,real_array)
class(packageable), intent(out) :: x
real(real64), intent(in) :: real_array(*)

and it is listed in a generic interface:

interface expand
   module procedure expand_int32_1d
   module procedure expand_int32_2d
   !etc...
end interface

If I call the specific routine directly, everything works, but if I call it through the generic interface I get an error that there is no specific subroutine for the generic (using gfortran).

What does the standard say about this case, and what do other compilers do?
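Here is a minimal, self-contained sketch of the situation (the names and the plain real payload are simplified stand-ins for my actual derived types):

module expand_mod
   use iso_fortran_env, only: real64
   implicit none
   interface expand
      module procedure expand_1d
   end interface
contains
   ! stand-in for the real routine: unpacks size(x) values starting at real_array(1)
   subroutine expand_1d(x, real_array)
      real(real64), intent(out) :: x(:)
      real(real64), intent(in)  :: real_array(*)
      x = real_array(1:size(x))
   end subroutine
end module

program repro
   use expand_mod
   use iso_fortran_env, only: real64
   implicit none
   real(real64) :: buffer(10), x(3)
   buffer = 0.0_real64
   call expand_1d(x, buffer(4))   ! compiles: sequence association works with the specific name
!  call expand(x, buffer(4))      ! rejected by gfortran: no specific subroutine for the generic 'expand'
   print *, x
end program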


I guess the problem might be that in your call, array(first_location) is a scalar, so the call cannot be matched to any procedure having an array as the corresponding dummy arg.

You could probably try to pass the full array and the position of the first location as an additional argument.
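Something along these lines (a sketch with made-up names, assuming the data is real(real64)):

module expand_mod_alt
   use iso_fortran_env, only: real64
   implicit none
   interface expand
      module procedure expand_1d
   end interface
contains
   subroutine expand_1d(x, real_array, first)
      real(real64), intent(out) :: x(:)
      real(real64), intent(in)  :: real_array(:)   ! whole array: always rank 1, so the generic resolves
      integer, intent(in)       :: first           ! the starting position travels as a plain integer
      x = real_array(first : first + size(x) - 1)
   end subroutine
end module

The call then becomes call expand(x, buffer, first_location).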


Thanks, that’s a possible solution.

I think the issue is what you’re pointing out (the generic interface expects an array (*), but I’m calling it with a scalar, array(first_location), so the compiler can’t resolve the call to any specific routine in the interface). I can’t find references to the standard about this online, so this forum seems like the right place to clear that up.

At the same time, I’d like to avoid using assumed-shape (:) in the first place, because I think I would incur significant slowdowns when many such calls are made (derived types with several variables, nested types), but maybe not?

I have noticed this feature in the past, but I have not yet tracked it down in the standard to see exactly how the generic resolution is supposed to work in these cases.

A workaround is to change the call to use an array slice actual argument, a(first:last). Then the generic resolution correctly associates the actual argument as an array and the dummy argument as assumed size (*) array. Of course, in some cases the compiler will be required to use copy-in/copy-out to make the argument match, but that should only occur in cases where the original code would not have worked anyway, so it is backwards compatible.
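Using the simplified expand_mod sketch from earlier in the thread, the call site would look like this:

program slice_workaround
   use expand_mod                    ! simplified module from the sketch above
   use iso_fortran_env, only: real64
   implicit none
   integer :: i
   real(real64) :: buffer(10), x(3)
   buffer = [(real(i, real64), i = 1, 10)]
   call expand(x, buffer(4:6))       ! rank-1 section: the generic now resolves
   call expand(x, buffer(4:))        ! also works; the (*) dummy ignores the upper bound anyway
   print *, x                        ! 4.0 5.0 6.0
end program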

I’m not sure exactly what you are saying here. The advantage of assumed shape (:) dummy arguments is that they avoid copy-in/copy-out in some cases where explicit shape (n) or assumed size (*) would require it. So in general, assumed shape should be more efficient, not less.

However, assumed shape also requires an explicit interface, so if you are working with f77 external routines, it may not be straightforward to simply change the dummy argument declaration. This can be done in two steps. First, add an interface block for the external routine that has the assumed size declaration. Then you can add that interface block everywhere that the external routine is referenced. The code will continue to work as these changes are being made, with some references using the interface block and some not, so tests can be performed during this period. Then when all references use the interface block, you can change the declaration to assumed shape, recompile, and everything should work. From that point on, you have all the benefits of assumed shape, in particular, you can call the routine with noncontiguous array slices without suffering copy-in/copy-out overhead.
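For instance, a sketch with made-up names (old_sub standing in for the legacy F77 routine), with the interface block placed in one module so that every caller picks it up through use association:

module old_sub_interface
   implicit none
   interface
      ! step 1: an explicit interface that still matches the legacy routine exactly
      subroutine old_sub(n, x)
         integer, intent(in) :: n
         real, intent(in)    :: x(*)
         ! step 2 (later): change this declaration, and old_sub itself, to x(:)
      end subroutine
   end interface
end module

The eventual switch from x(*) to x(:) then happens in one place, plus in the routine itself.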


The flexibility and safety guarantees of assumed shape arrays outweigh any minor performance improvements of explicit shape arrays or assumed size arrays, in my experience. Here is an example runtime benchmark for assumed shape vs. explicit shape. However, what I find annoyingly missing and desperately needed in Fortran programs are the deferred-rank allocatable arrays and function output.


It is odd that in the first graph you see only small advantages for assumed shape declarations, while in the second graph you see some explicit shape advantages. Maybe it is a matter of scale?

I suspect those timings were for contiguous arrays. If noncontiguous arrays (e.g. noncontiguous array slices) were included, then I expect the assumed shape argument would be consistently more efficient.

The first is not so informative because of the scale. The second graph is more useful as it compares the runtime ratio of the assumed-shape interface to the explicit-shape interface. The explicit shape appears slightly (<5%) faster on average, but note that the assumed-shape interface also passes an extra argument, failed, to keep track of runtime failures (whereas the explicit shape does not). The observed performance gain also depends significantly on the computational complexity within the procedure. For me, the beauty, generality, and safety offered by assumed-shape arguments outweigh the tiny performance penalties, if any. I have encountered a noticeable number of bugs that would have been nearly impossible to detect without assumed-shape arrays.

I agree on this, but you keep mentioning how explicit shape has a computational advantage, and I don’t think it does in practice. I doubt that an extra argument in the calling sequence would ever result in a measurable difference.

Upon looking at the code, I see that the dummy argument is declared assumed shape, but it has the contiguous attribute. That attribute will cause copy-in/copy-out to occur for noncontiguous actual arguments, the same as an explicit shape or an assumed size dummy argument would. Thus it will not show the advantage that assumed shape arguments would otherwise provide of avoiding those two copy steps. I presume that there are other reasons for using the contiguous attribute, such as memory locality, that have been determined to be more important for this code.
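A quick way to see the effect (a sketch; the result for the plain assumed-shape case is what gfortran and ifort typically show, since a processor is in principle free to make a copy anyway):

module contig_demo
   implicit none
contains
   subroutine plain_shape(x)
      real, intent(in) :: x(:)
      print *, 'assumed shape:              is_contiguous =', is_contiguous(x)
   end subroutine
   subroutine contig_shape(x)
      real, intent(in), contiguous :: x(:)
      print *, 'assumed shape + contiguous: is_contiguous =', is_contiguous(x)
   end subroutine
end module

program demo
   use contig_demo
   implicit none
   real :: a(10)
   a = 0.0
   call plain_shape(a(1:10:2))    ! descriptor of the strided section, typically no copy -> F
   call contig_shape(a(1:10:2))   ! contiguity is required, so copy-in/copy-out happens -> T
end program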

This is the same general problem that LAPACK now has. It uses mostly assumed size and explicit shape dummy arguments, which triggers the copy-in/copy-out to occur for noncontiguous actual arguments. LAPACK already blocks matrices to take advantage of level-3 BLAS operations, so it is likely that the whole library would benefit from using assumed shape arrays with small local (i.e. within cache) work arrays. I somehow doubt that we will ever see a version of LAPACK designed like that.

In the first graph in your reference, you can see a notable upward curve. On a log-log plot like this, you should see something much closer to a straight line. That upward curve is a much larger effect than any difference between assumed shape and explicit shape array arguments. It is almost certainly due to cache effects: for larger matrices, the operations span regions larger than the local cache. That behavior can be avoided by sub-blocking the matrix operations, the way LAPACK does, with local work arrays that stay within the cache size. Designing algorithms that work that way, and that are portable and adapt to the CPU characteristics, is very difficult. Within standard Fortran, the programmer cannot query how much stack memory is available, how many levels of cache there are, or the sizes of the caches, all of which is essential information for the task.
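As a trivial illustration of the blocking idea (a sketch; the block size is only a guess, precisely because standard Fortran gives no way to query the cache):

module blocked
   implicit none
   integer, parameter :: nb = 64   ! assumed tile size, to be tuned by experiment
contains
   subroutine transpose_blocked(a, b)
      real, intent(in)  :: a(:,:)
      real, intent(out) :: b(:,:)
      integer :: i, j, ib, jb
      do jb = 1, size(a, 2), nb
         do ib = 1, size(a, 1), nb
            ! work on one nb-by-nb tile so the touched parts of a and b stay cache resident
            do j = jb, min(jb + nb - 1, size(a, 2))
               do i = ib, min(ib + nb - 1, size(a, 1))
                  b(j, i) = a(i, j)
               end do
            end do
         end do
      end do
   end subroutine
end module

program demo
   use blocked
   implicit none
   real, allocatable :: a(:,:), b(:,:)
   allocate(a(1000, 700), b(700, 1000))
   call random_number(a)
   call transpose_blocked(a, b)
   print *, maxval(abs(b - transpose(a)))   ! sanity check: should print 0.0
end program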


As far as I have explored LAPACK routines, they avoid the copy-in/copy-out by explicitly passing the whole array along with leading dimensions/strides where needed. What I do not know is whether LAPACK’s approach is any faster than assumed-shape arrays, or whether it is merely a historical limitation of its time.
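Roughly like this (a sketch of the calling convention, not actual LAPACK code):

module lapack_style
   implicit none
contains
   ! the sub-block is addressed by its first element plus the leading dimension,
   ! so no copy of the actual argument is needed
   subroutine print_block(m, n, a, lda)
      integer, intent(in) :: m, n, lda
      real, intent(in)    :: a(lda, *)
      integer :: j
      do j = 1, n
         print *, a(1:m, j)
      end do
   end subroutine
end module

program demo
   use lapack_style
   implicit none
   integer :: i
   real :: a(5, 5)
   a = reshape([(real(i), i = 1, 25)], [5, 5])
   ! print the 3x3 block starting at a(2,2), passing the full array's leading dimension
   call print_block(3, 3, a(2, 2), lda=size(a, 1))
end program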

@FedericoPerini ,

As pointed out to you by @shahmoradi , your use of assumed-size arrays is really buying you very little. You can view an assumed-size array as effectively a 1-D target that you reshape to pointers of different ranks, with the onus of managing the shape still on you as the programmer; in effect you end up forcing the generic resolution via shape management. Instead you can have a single procedure with an assumed-shape array argument that has the TARGET attribute and achieve the same purpose via local pointers. Note that an assumed-size dummy argument of rank greater than 1 is explicit-shape in every dimension except the last, so why bother …
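Something along these lines (a minimal sketch; names are illustrative, and INTENT(IN) is omitted on x because a nonpointer INTENT(IN) dummy is not permitted as the target of a pointer assignment):

module remap_sketch
   implicit none
contains
   subroutine show_as_matrix(nrow, ncol, x)
      integer, intent(in) :: nrow, ncol
      real, target        :: x(:)      ! one assumed-shape entry point with TARGET
      real, pointer       :: xm(:,:)
      xm(1:nrow, 1:ncol) => x          ! rank-remapping: a rank-1 target may even be noncontiguous
      print *, xm
   end subroutine
end module

program demo
   use remap_sketch
   implicit none
   integer :: i
   real :: a(12)
   a = [(real(i), i = 1, 12)]
   call show_as_matrix(3, 4, a)   ! view a as a 3x4 matrix, no copy, no extra specifics
end program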

Separately, you can do as advised by @RonShepard and use array slices (or vector subscripts) as a clearer way to “consume” your generic procedure if you need to persist with that option.

module m
   private
   generic, public :: sub => sub1, sub2
contains
   subroutine sub1( n, x )
      integer, intent(in) :: n
      real, intent(in)    :: x(*)   !<-- assumed-size only in the last dimension
      print *, x(1:n)
   end subroutine
   subroutine sub2( n, m, x )
      integer, intent(in) :: n
      integer, intent(in) :: m
      real, intent(in)    :: x(m,*) !<-- assumed-size only in the last dimension; explicit-shape otherwise
      print *, x(1:m,1:n/m)
   end subroutine 
end module
   use m
   real, allocatable :: a(:)
   real, allocatable :: b(:,:)
   a = [ 0, 0, 1, 2, 0 ] 
   b = reshape( [( real(i), i = 1, 12 )], shape=[4,3] )
   call sub( n=2, x=a(3:4) ) !<-- say data of interest in only locations 3 and 4
   call sub( n=3*2, m=3, x=b(1:3,1:2) ) !<-- say data of interest are in 1:3, 1:2
end
C:\temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.7.0 Build 20220726_000000
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.33.31630.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
 1.000000 2.000000
 1.000000 2.000000 3.000000 5.000000 6.000000
 7.000000

Ref: https://j3-fortran.org/doc/year/18/18-007r1.pdf

See section C.10.6 Rules ensuring unambiguous generics (15.4.3.4.5), page 572, paragraph 5:
“Rank is a fixed property, but some forms of array dummy arguments allow rank mismatches when a procedure is referenced by its specific name. In order to allow rank to always be usable in distinguishing generics, such rank mismatches are disallowed for those arguments when the procedure is referenced as part of a generic.” (emphasis mine)

This is what comes into play when you attempt to reference the procedure via the generic and you “provide a pointer to the first location” - this is not allowed by the standard as stated above.


@shahmoradi ,@RonShepard, @FortranFan, amazing support you all, thank you! This is a great forum.

This has cleared all my doubts.

  • @FortranFan, the standard says (which, in hindsight, makes a lot of sense) that no ambiguous cases can ever be allowed in generic interfaces. Referencing an assumed-size array (*) with a “scalar” pointer to the first location is one of them, so it was clearly a bad API design decision regardless of any performance concerns.

  • @shahmoradi, nice to see there is no performance loss at low sizes (n<10); I would bet that is the area where passing one more argument would weigh the most (btw: congrats on an awesome piece of software!)

  • @RonShepard, the use case is to pack “everything-to-an-array” for easy MPI comms and binary read/write, imagine types containing other nested types, and variable-size integer/real/logical data, so my initial idea was to “do whatever I could” to keep things simple and efficient → “use a pointer”. But I agree it wasn’t a great choice. At the end of the day, MPI or I/O latency is going to be orders of magnitude larger than whatever performance may be lost passing more complete data structures.

So my new API now looks like this (looking forward to publishing this library on GitHub when ready):

! Expand object from a real array
pure subroutine my_routine(x,real_array)
class(packageable), intent(out) :: x
real(real64), intent(in) :: real_array(:)

Instead of passing a pointer to the first element, I slice the array through to the end:

call my_routine(...,array(first_location:))

Still no need to know its total size a priori, and bounds checks are now available.
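As a compact, self-contained illustration of the new pattern (with a toy type standing in for my packageable hierarchy):

module package_sketch
   use iso_fortran_env, only: real64
   implicit none
   type :: point3d
      real(real64) :: xyz(3)
   end type
contains
   ! expand a point3d from a real array; assumed shape, so bounds checks apply
   pure subroutine expand_point(p, real_array)
      type(point3d), intent(out) :: p
      real(real64), intent(in)   :: real_array(:)
      p%xyz = real_array(1:3)   ! reads only what it needs; the total size stays the caller's business
   end subroutine
end module

program demo
   use package_sketch
   use iso_fortran_env, only: real64
   implicit none
   integer :: i
   real(real64) :: buffer(10)
   type(point3d) :: p
   buffer = [(real(i, real64), i = 1, 10)]
   call expand_point(p, buffer(5:))   ! slice through to the end
   print *, p%xyz                     ! 5.0 6.0 7.0
end program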
