# Accessing the stride multiplier of an assumed-shape array

Consider the following group of `?axpy` subroutines from BLAS (vector-scalar product and add):

``````call saxpy(n, a, x, incx, y, incy)   ! F77-style
call daxpy(n, a, x, incx, y, incy)   ! F77-style

call axpy( x, y [, a])               ! BLAS F95 convention
``````

According to the Intel MKL documentation,

The Fortran 95 interface to BLAS and Sparse BLAS Level 1 routines is implemented through wrappers that call respective FORTRAN 77 routines.

I am wondering how is this possible without making array copies (and without using Fortran array descriptors from F2018)?

I’ve found one answer on Stack Overflow, Obtaining the stride of an Fortran Array, that suggests the non-standard extensions `loc` and `sizeof` can be used to recover the stride multiplier. Here’s a sketch of the solution using this language hack:

``````   subroutine blas95_saxpy( x, y, a )
real(sp), intent(in) :: x(:)
real(sp), intent(inout) :: y(:)
real(sp), intent(in), optional :: a

integer :: n, incx, incy
real(sp) :: a_

external :: saxpy ! F77 MKL routine

a_ = 1.0_sp
if (present(a)) a_ = a

incx = 1
incy = 1

n = size(y)

if (n > 1) then
incx = (loc(x(2)) - loc(x(1))) / sizeof(x(1))
incy = (loc(y(2)) - loc(y(1))) / sizeof(y(1))
end if

call saxpy(n, a_, x(1), incx, y(1), incy)

end subroutine
``````

Are there any other possibilities?

Here’s my attempt at a solution using C-Fortran array descriptors:

``````// f18_saxpy.c

#include <ISO_Fortran_binding.h>
#include "mkl.h" // Intel MKL

void f18_saxpy(CFI_cdesc_t *x, CFI_cdesc_t *y, float *a) {

int n, incx, incy;
float a_;

a_ = 1.0;
if (a)
a_ = *a;

n = y->dim[0].extent;

incx = 1;
incy = 1;

// Ideally, cast to CFI_index_t instead of int,
// but the standard BLAS interface uses int's for indexing
if (n > 1) {
incx = (int) ( x->dim[0].sm / x->elem_len );
incy = (int) ( y->dim[0].sm / y->elem_len );
}

}
``````

And here’s a driver program to test the routines:

``````! test_axpy.F90

module blas95

implicit none
private

public :: axpy

integer, parameter :: sp = kind(1.0e0)

interface axpy
#ifdef CFI
subroutine f18_saxpy( x, y, a ) bind(c,name='f18_saxpy')
import sp
real(sp), intent(in) :: x(:)
real(sp), intent(inout) :: y(:)
real(sp), intent(in), optional :: a
end subroutine
#else
module procedure blas95_saxpy
#endif
end interface

contains

subroutine blas95_saxpy( x, y, alpha )
real(sp), intent(in) :: x(:)
real(sp), intent(inout) :: y(:)
real(sp), intent(in), optional :: alpha

integer :: n, incx, incy
real(sp) :: alpha_

external :: saxpy

alpha_ = 1.0
if (present(alpha)) alpha_ = alpha

incx = 1
incy = 1

n = size(y)

if (n > 1) then
incx = (loc(x(2)) - loc(x(1))) / sizeof(x(1))
incy = (loc(y(2)) - loc(y(1))) / sizeof(y(1))
end if

call saxpy(n, alpha_, x(1), incx, y(1), incy)

end subroutine

end module

program test_axpy

use blas95, only: axpy
implicit none (type, external)

integer, parameter :: sp = kind(1.0e0)

real(sp) :: x(6), y(6), alpha
integer :: i

y = 0.0
x = [1.0,2.0,3.0,4.0,5.0,6.0]

call axpy( x, y )
write(*,'(6(F11.6,4X))') y

y = 0
alpha = 3.0
call axpy( x(1::2), y(::2), alpha )
write(*,'(6(F11.6,4X))') y

y = 0
alpha = 2.0
call axpy( x(2::2), y(2::2), alpha )
write(*,'(6(F11.6,4X))') y

y = 0
call axpy( x(2::3), y(2::3), 1.0 )
write(*,'(6(F11.6,4X))') y

y = 0
call axpy_second_element( x, y, 4.0 )
write(*,'(6(F11.6,4X))') y

y = 0
call axpy_second_element( x(::2), y(::2), 4.0 )
write(*,'(6(F11.6,4X))') y

contains

! Subroutine used to test sub-slicing
subroutine axpy_second_element( x, y, alpha )
real(sp), intent(in) :: x(:)
real(sp), intent(inout) :: y(:)
real(sp), intent(in), optional :: alpha

call axpy( x(::2), y(::2), alpha )

end subroutine

end program
``````

I’ve only tested this with the Intel Fortran compiler (due to the convenient `-qmkl` flag), and following command:

``````\$ icc -qmkl -c -Wall f18_saxpy.c
\$ ifort -DCFI test_axpy.F90 f18_saxpy.o -qmkl -o test_axpy
``````

Omit the `-DCFI` flag to test the Fortran implementation.

Here are instructions for the GNU compilers:

1. Download `cblas.tgz` from the BLAS page at Netlib
2. Extract the files to a folder, and locate `cblas.h` and `cblas_f77.h` from the `include/` folder, and `cblas_saxpy.c` in the `src/` folder. Copy the files into the folder containing the `test_axpy.F90` driver.
3. Replace `mkl.h` with `cblas.h` in the include section of `f18_saxpy.c`
4. Add interoperable type declarations in the Fortran interface to silence `gfortran` warnings:
``````   subroutine f18_saxpy( x, y, a ) bind(c,name='f18_saxpy')
use, intrinsic :: iso_c_binding, only: c_float
real(c_float), intent(in) :: x(:)
real(c_float), intent(inout) :: y(:)
real(c_float), intent(in), optional :: a
end subroutine
``````
5. Compile the CBLAS wrapper of the F77 routine, the `f18_saxpy.c` wrapper of the CBLAS wrapper, and finally the F95+ wrapper of the wrapper wrapper. Don’t forget to link with the system BLAS library.
``````\$ gcc -c -Wall cblas_saxpy.c
\$ gcc -c -Wall f18_saxpy.c
\$ gfortran -DCFI -Wall test_axpy.F90 f18_saxpy.o -o test_axpy -lblas
``````

Amusingly, we have gone through 3 layers of indirection to achieve something which can be done easily in Fortran:

``````   subroutine mf_saxpy( x, y, a )
real(sp), intent(in) :: x(:)
real(sp), intent(inout) :: y(:)
real(sp), intent(in), optional :: a

real(sp) :: a_

a_ = 1.0
if (present(a)) a_ = a

y = y + a_ * x
end subroutine
``````

@ivanpribec , what you show with your `f18_saxpy` wrapper is the cleanest Fortran standard-based option to work with the commonly recommended `assumed shape` dummy argument approach.

“any other possibilities” will be non-standard, you’ve already shown one though the same approach is presumably viable with `c_loc` and `c_sizeof`.

It is unfortunate the abstraction at this level is inadequate to some e.g., it doesn’t guarantee AVX instructions but the user wants to enforce that in certain situations.

Thank you @FortranFan for the confirmation. One concern I have is in the statement

``````incx = (int) ( x->dim[0].sm / x->elem_len );
``````

The `elem_len` is the storage size in bytes of an array element, while `sm` is the spacing between elements (the difference in bytes between successive elements). I presume that for intrinsic types (integer, real, complex), the available processors don’t have any padding between array elements. However I have been warned that for derived types this might not be that case. Is this interpretation correct?

It feels a bit odd, the standard would allow you to do something in C, but have no way to do it in Fortran. It feels almost if a bar was set, so that only users with sufficient knowledge of C (and hence a good understanding of pointers) would be able to tinker with strided access. The average Fortran programmer on the other hand, has to rely on the compiler making a contiguouos copy to get the job done.

Good question, one would have to grok the compiler “manuals” on storage semantics and conformance vis-a-vis the standard, etc. to be absolutely sure but you’re generally right - padding shouldn’t be a factor with intrinsic types.

Moreover with interoperable derived types also where each type component is by itself interoperable with the companion C processor, what you show with the relationship between the memory stride and the storage size in bytes of each element of the object should also hold i.e., any padding should get factored appropriately into the storage size provided by the C descriptor.