Exploring pointers in Fortran: When and how to use them

Hello fellow Fortran enthusiasts,

I’ve been diving deep into Fortran programming lately, and one topic that has caught my attention is the use of pointers in Fortran. While Fortran is renowned for its robust array abstractions and efficiency in numerical computing, I’m curious about scenarios where pointers might come into play.

From what I understand, in typical mathematical computing tasks, direct manipulation of memory addresses via pointers might not be necessary, thanks to Fortran’s powerful array operations.

For instance, consider a matrix-matrix multiplication kernel implemented in C:

void mxm(double *C, double *A, double *B, int N) {
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      double *A_ptr = &A[i * N + 0];
      double *B_ptr = &B[0 * N + j];
      double *C_ptr = &C[i * N + j];
      for (int k = 0; k < N; ++k) {
        *C_ptr += *A_ptr * *B_ptr;
        ++A_ptr;
        B_ptr += N;
      }
    }
  }
}

While this approach may seem unconventional in Fortran, it mirrors a common pattern found in languages like C. Utilizing pointers and pointer arithmetic can potentially enable compilers to optimize computations more effectively by exposing specific compute patterns and enhancing memory locality.

I’m intrigued by the possibility of using pointers in scenarios such as systems programming, scientific computing, or when dealing with data structures like linked lists.

I’d love to hear from the community about their experiences with pointers in Fortran. Have you encountered situations where using pointers proved beneficial, even in the realm of scientific computing? Are there best practices or guidelines to follow when incorporating pointers into Fortran code?

Your insights and experiences would be greatly appreciated.

Best regards and nice to meet you all,
Iñaki :slight_smile:

The converse is also true for pointers. For example, in your code above, the executable statement within the nested loop must be executed serially, ensuring that the result memory location is updated before the next access to the input arrays, just in case the C[] array has any memory overlaps with either A[] or B[].

In fortran, the compiler would be free to optimize the loops because the array C(:,:) is not allowed to alias the other dummy arguments A(:,:) and B(:,:). This restriction is the responsibility of the programmer to enforce. That means, the compiler is free to make use of registers or GPU memory or any other optimization approach and the memory accesses and stores can be reordered and/or delayed as much as the compiler would like. This is not a recent change to the language, this alias restriction has been in fortran since before f77. More recent versions of C and other C-like languages have introduced pragmas and other workarounds to try to mimic the fortran convention and to allow similar optimizations, but that demonstrates the disadvantage of using wild pointers in languages.

Pointers in fortran can only point to objects with the target attribute. Thus, even when using pointers in fortran, the compiler already knows more about the optimization possibilities than in a wild-pointer language.

Fortran also has allocatable arrays that can be used in many situations that would require pointers in other languages, including data structures such as trees and linked lists. This also avoids the wild-pointer alias problems when using these data structures in fortran.

1 Like

Here is the equivalent Fortran code:

subroutine mxm(C, A, B, N) {
  integer :: N
  double precision :: A(N,N), B(N,N), C(N,N)
  integer :: i, j, k
  do i = 1, N
    do j = 1, N
      do k = 1, N
        C(j,i) = C(j,i) + A(k,i)*B(j,k)
      end do
    end do
  end do 
end subroutine 

It’s now pretty obvious that this is a matrix-matrix multiplication. In comparison the C code is obfuscated by the pointer tricks. And I bet that the Fortran version is at least as fast.

Actually both codes are not optimal, as the memory pattern access on the matrix is non-contiguous on the matrix B, resulting in more cache misses. The right way is:

subroutine mxm(C, A, B, N) {
  integer :: N
  double precision :: A(N,N), B(N,N), C(N,N)
  integer :: i, j, k
  do i = 1, N
    do k = 1, N   ! swapped j and k loops
      do j = 1, N
        C(j,i) = C(j,i) + A(k,i)*B(j,k)
      end do
    end do
  end do 
end subroutine 

Or better:

subroutine mxm(C, A, B, N) {
  integer :: N
  double precision :: A(N,N), B(N,N), C(N,N)
  integer :: i, k
  do i = 1, N
    do k = 1, N 
      C(:,i) = C(:,i) + A(k,i)*B(:,k)
    end do
  end do 
end subroutine 

Compilers are very good at optimizing such loops, no need of pointers.

In Fortran the explicit usage of pointers is generally reserved to cases where it’s difficult or impossible to do without them. And the motivation for their use is never the performances (F77 had no pointer at all, and was the fastest man in the west :slight_smile: ).

4 Likes

This also could have been an associate so I don’t know if it’s the best example, but I found this faster than having separate arrays q, k, v that get calculated separately.

real(kind=wp), target :: qkv(emb_dim+2*kv_head_size)
real(kind=wp), pointer :: q(:), k(:), v(:)
...
! store w%wqkv in one block and process in one loop
! can be parallelized  
do ix = 1,size(qkv)
qkv(ix) = dot_product(xb,(w%wqkv(:,ix,l)))
end do
 
! index with pointers                       
q => qkv(1:emb_dim)
k => qkv((emb_dim+1):(emb_dim+kv_head_size))
v => qkv((emb_dim+kv_head_size+1):(emb_dim+2*kv_head_size))
1 Like

if you must use pointers, make sure if you are using pointer remapping to point to a contiguous block of memory (say the ni*nj planes of a 3D array a(ni,nj,nk)) that you give the pointer the contiguous attribute

real, target :: a(ni, nj, nk)
real, contiguous, pointer :: b(:)
real :: d(ni*nj)
Do k= 1,nk
    b(1:ni*nj) => a(:,:,k) ! use b as a temp array to unroll i,j dims get longer vectors
    d = 0.5*b
  ! use d for something
end do

1 Like

Test comment

This linked thread discusses one way pointers are often used in unsteady partial differential equation solvers. However, seems like the best approach to the problem discussed uses move_alloc instead of pointers.

New to Modern Fortran myself, I sometimes find myself reaching for a pointer when associate or allocatable would do the job. My newbie naive approach at this phase of my learning is to try to not use pointers reflexively.

The associating entity (i.e. the name) you get from ASSOCIATE is essentially a pointer, but one that cannot change its association once it is established (kind of “constant pointer”), and has a lifetime visible to the compiler.

3 Likes

Hello again,

First, I wanted to take a moment to express my sincere gratitude to each of you for contributing to this discussion :-).

I agree, indeed. The explicit management of memory addresses through pointers in languages like C necessitates careful consideration to avoid potential memory conflicts. Plus, this fact may leave little room for the compiler to do its optimizing magic. In contrast, Fortran’s array semantics offer a powerful alternative that balances performance and safety.

The fact that Fortran compilers can optimize loops more aggressively due to the absence of aliasing between array dummy arguments is a testament to the language’s design with performance in mind.

I appreciate your perspective on this matter and fully agree that Fortran’s adherence to safer defaults from the beginning sets it apart as a language optimized for HPC and scientific computing.

It’s quite amusing how a seemingly straightforward operation like matrix-matrix multiplication can appear convoluted in some languages due to intricate pointer manipulations :grin:.

True!! I am glad you pointed this out.

:grin:

Your suggestion to mark pointers as contiguous when pointing to a contiguous block of memory raises an interesting point. Could you please elaborate on why it’s important to mark the pointer as contiguous in such scenarios?

I can see 2 reasons :

The compiler can apply more aggressive optimizations when dealing with contiguous data. for instance:

do i = 1, n
   y(i) = y(i) + a * x(i)
end do

If x and y are contiguous, the compiler can generate SIMD instructions. I think that there exists now instructions to load/store non contiguous data into/from the SIMD registers, but which are much less efficient.

And in the case you are calling a routine with assumed size or explicit shape dummy array arguments, the compiler has to generate an temporary copy of the actual array arguments if they are not contiguous.

Overall, specifying that an array pointer is contiguous when it’s the case makes the task easier for the compiler.

1 Like

Echoing @PierU’s response the contiguous attribute is just a way to tell the compiler that you expect the data pointed to will be contiguous in memory. Note CONTIGUOUS can also be applied to non-pointer procedure arguments as a way of telling the compiler to expect the data an argument is associated with to be contiguous in memory. Again, the biggest advantages is it helps the compiler decide not to make a temporary array and enables vectorization in cases where the aliasing rules for pointers etc might otherwise prevent it.

1 Like

Today I learned :smile:

And again, thank you very much for your time.