What's the purpose of array size inside subroutine arguments?

Personally, I think this rule is too strong, and both assumed-size and assumed-shape arrays have valid use-cases.

When it comes to explicit-size array arguments (including assumed-size):

  • the known length and guaranteed contiguity can offer better optimization opportunities
  • they are very commonly used for interoperability with C APIs (a minimal sketch follows this list)
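
For the C interoperability point, here is a minimal sketch of a bind(c) interface; the C routine c_scale and its signature are hypothetical. An assumed-size dummy is passed as a bare address plus whatever length argument you provide, with no Fortran descriptor involved:

interface
   ! hypothetical C function: void c_scale(float *x, int n, float alpha);
   subroutine c_scale(x, n, alpha) bind(c, name="c_scale")
      use, intrinsic :: iso_c_binding, only: c_int, c_float
      real(c_float), intent(inout) :: x(*)       ! assumed-size: just a base address
      integer(c_int), intent(in), value :: n
      real(c_float), intent(in), value :: alpha
   end subroutine
end interface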

Just to give you a simple example:

subroutine sqrsum(a,b,c)
! assumed-shape dummies: the lengths and strides travel in descriptors
real, intent(in) :: a(:), b(:)
real, intent(out) :: c(:)
c = a**2 + b**2
end subroutine

subroutine sqrsumn(n,a,b,c)
! explicit-size dummies: contiguous, with the common length passed explicitly
integer, intent(in) :: n
real, intent(in) :: a(n), b(n)
real, intent(out) :: c(n)
c = a**2 + b**2
end subroutine

When compiled with gfortran -O2 -march=skylake, the following instructions are produced (assumed-shape version on the left, explicit-size on the right):

  1	sqrsum_:                                  | sqrsumn_:
  2	        push    rbx                       |         movsx   rdi, DWORD PTR [rdi]
  3	        mov     ebx, 1                    |         test    edi, edi
  4	        mov     r8, QWORD PTR [rdx+40]    |         jle     .L5
  5	        mov     r10, QWORD PTR [rdi+40]   |         sal     rdi, 2
  6	        mov     r9, QWORD PTR [rsi+40]    |         xor     eax, eax
  7	        mov     r11, QWORD PTR [rdi+56]   | .L3:
  8	        test    r8, r8                    |         vmovss  xmm1, DWORD PTR [rdx+rax]
  9	        mov     rax, QWORD PTR [rdx]      |         vmovss  xmm0, DWORD PTR [rsi+rax]
 10	        mov     rcx, QWORD PTR [rdi]      |         vmulss  xmm1, xmm1, xmm1
 11	        cmove   r8, rbx                   |         vfmadd132ss     xmm0, xmm1, xmm0
 12	        test    r10, r10                  |         vmovss  DWORD PTR [rcx+rax], xmm0
 13	        mov     rdx, QWORD PTR [rsi]      |         add     rax, 4
 14	        cmove   r10, rbx                  |         cmp     rdi, rax
 15	        test    r9, r9                    |         jne     .L3
 16	        cmove   r9, rbx                   | .L5:
 17	        sub     r11, QWORD PTR [rdi+48]   |         ret
 18	        js      .L11
 19	        sal     r10, 2
 20	        xor     esi, esi
 21	        sal     r9, 2
 22	        sal     r8, 2
 23	.L6:
 24	        vmovss  xmm1, DWORD PTR [rdx]
 25	        vmovss  xmm0, DWORD PTR [rcx]
 26	        mov     rdi, rsi
 27	        add     rcx, r10
 28	        inc     rsi
 29	        add     rdx, r9
 30	        vmulss  xmm1, xmm1, xmm1
 31	        vfmadd132ss     xmm0, xmm1, xmm0
 32	        vmovss  DWORD PTR [rax], xmm0
 33	        add     rax, r8
 34	        cmp     rdi, r11
 35	        jne     .L6
 36	.L11:
 37	        pop     rbx
 38	        ret

See how the assumed-shape version does extra register shuffling at the beginning (loading strides and bounds from the three array descriptors), whereas the contiguous version proceeds almost immediately into the computational loop. If you have lots of assumed-shape arguments, this contributes to higher register pressure, and some arguments may need to be spilled to the stack. In contrast, the known-size version uses only four registers:

  • rdi for the address at which the common size n is stored,
  • rsi for the base address of array a,
  • rdx for the base address of array b,
  • rcx for the base address of array c.

In the computational loop .L6, the assumed-shape version executes 12 instructions per iteration, whereas the contiguous version's loop .L3 has 8. The additional instructions advance the three array pointers (rdx, rcx, and rax) by their individual strides and maintain a separate iteration counter, because assumed-shape dummies may each have a different stride.
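
To see where the descriptor-based version pays off, consider a hypothetical call site that passes a non-contiguous array section (the sizes here are made up for illustration):

real :: a(200), b(100), c(100)

call sqrsum(a(1::2), b, c)        ! stride 2 is recorded in the descriptor, no copy
call sqrsumn(100, a(1::2), b, c)  ! the compiler must pack a(1::2) into a contiguous temporary

Passing the strided section to the explicit-size dummy forces copy-in/copy-out of a temporary, whereas the assumed-shape version handles the stride directly in the loop.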

When compiled with -O3 -march=skylake -fopt-info, the message

/app/example.f90:4:15: optimized: versioned this loop for when certain strides are 1

says the compiler generated two code-paths: one vectorized path for the case when all the strides are 1 (that is, the arguments are contiguous), and a fallback sequential loop for when one or more of the strides is larger than one.
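
If a procedure will only ever receive contiguous arguments, the Fortran 2008 contiguous attribute is one way to keep the assumed-shape interface while giving the compiler the same unit-stride guarantee, so that only the vectorized path needs to be generated. A minimal sketch:

subroutine sqrsum_c(a,b,c)
! contiguous promises unit stride; if a caller passes a non-contiguous
! section, a contiguous temporary is made at the call site instead
real, intent(in), contiguous :: a(:), b(:)
real, intent(out), contiguous :: c(:)
c = a**2 + b**2
end subroutine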

NVIDIA also recommends avoiding assumed-shape arrays when it comes to GPU kernels (that is, CUDA kernels, or functions called inside OpenACC regions). For more details see the talk by David Appelhans, Best Practices for Programming GPUs using Fortran, OpenACC, and CUDA | NVIDIA On-Demand.

There are, however, very valid use-cases for assumed-shape arrays too. One of my favourites is for situations where both SoA (structure-of-arrays) and AoS (array-of-structures) data-structures are commonly used. Say you have some type of spatial search tree in 3-d:

subroutine search(x,y,z, ...)
real, intent(in) :: x(:), y(:), z(:)
! ...
end subroutine

By using assumed-shape arrays, users can do the following,

type :: point
   real :: x, y, z
end type

type :: point_collection(n)
   integer, len :: n
   real :: x(n), y(n), z(n)
end type

type(point), allocatable :: pa(:)
type(point_collection(n=:)), allocatable :: pc

! ... initialize pa and pc

call search(pa%x,pa%y,pa%z, ...)  ! AoS

call search(pc%x,pc%y,pc%z, ...)  ! SoA

The code-path generated for the AoS layout won’t be optimal, but at least it doesn’t make a temporary copy, and the locality of the data accesses remains good.
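
If you want to check the stride difference at run time, the Fortran 2008 intrinsic is_contiguous can be queried inside search; a sketch (argument list shortened to the three coordinate arrays):

subroutine search(x,y,z)
real, intent(in) :: x(:), y(:), z(:)
print *, is_contiguous(x)  ! .false. for pa%x (AoS, stride 3), .true. for pc%x (SoA)
end subroutine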

Edit: for anyone seeking to combine two files for a side-by-side comparison, I used this script:

#!/usr/bin/env bash
# Combines two files side-by-side and numbers the lines
# Usage: combine.sh <file1> <file2>
nl -w3 <(paste "$1" <(awk '{print "| "$0}' "$2") | column -s $'\t' -t)