`target` attribute seemingly affecting performance

Not necessarily; it all depends on what you are doing with that pointer afterwards. Note that an alternative is an ASSOCIATE construct:

! p does not have to be declared beforehand
associate( p => var%a%b(2)%c%d )
   ...
end associate

It looks similar but it’s definitely not, as p is not a pointer here.
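
To make the contrast concrete, here is a minimal self-contained sketch (the derived-type chain is invented for illustration): the pointer version must be declared and needs the TARGET attribute somewhere, while the associate name is simply another name for the component:

program pointer_vs_associate
implicit none
type :: leaf
   real :: d = 1.0
end type
type :: mid
   type(leaf) :: c
end type
type :: node
   type(mid) :: b(2)
end type
type :: root
   type(node) :: a
end type
type(root), target :: var   ! TARGET is required for the pointer version
real, pointer :: p

! pointer: p must be declared, and var needs TARGET (or POINTER)
p => var%a%b(2)%c%d
p = 2.0
print *, var%a%b(2)%c%d   ! prints 2.0; p writes through to the component

! associate: no declaration, no TARGET needed, and q is not a pointer
associate( q => var%a%b(2)%c%d )
   q = 3.0
end associate
print *, var%a%b(2)%c%d   ! prints 3.0
end program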

Thanks :slightly_smiling_face:

It is mostly a social problem. The person first introducing a POINTER p for a scalar intrinsic type may have checked that performance did not suffer with the available compilers. But what would a subsequent maintainer do? Would they think that there is a deep reason why POINTER was used instead of a temporary? Would they be tempted to use more POINTERs? Would they check performance effects? Soon, you might end up with code riddled with POINTERs that nobody could account for.

The matmul() intrinsic is overloaded so that it works with both 2D and 1D arrays. This allows matrix-matrix, matrix-vector, vector-matrix, and vector-vector (outer product) operations to be done through the same generic matmul() function reference. But if one of the operands in the matmul() call is really a (1:1,1:N) 2D array, then the specific function with the matrix argument is used instead of the specific function with the vector argument. Consequently, there could be a significant difference in timings with a pointer argument like pt2=>array(1:1,1:N) rather than pt1=>array(1,1:N), even though the computed results are numerically the same. Also, the matrix version of matmul() might detect that its argument is really a vector and branch at runtime to the appropriate vector-specific function. The same considerations apply to an actual argument written as array(1,1:N) versus array(1:1,1:N), or to the analogous situation within an ASSOCIATE block. So there are a lot of uncertainties in the timings and the performance with all of these possibilities.
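
To illustrate the rank resolution (a minimal sketch; the array and pointer names here are mine, chosen to match the pt1/pt2 notation above):

program matmul_rank_demo
implicit none
real, target :: array(1,5)
real :: b(5,5)
real, pointer :: pt1(:)    ! rank-1 view of the row
real, pointer :: pt2(:,:)  ! rank-2 (1 x 5) view of the same row
call random_number(array)
call random_number(b)
pt1 => array(1,1:5)
pt2 => array(1:1,1:5)
! vector * matrix: the vector-specific matmul is selected, result is rank 1
print *, shape(matmul(pt1, b))   ! prints: 5
! (1 x 5) matrix * matrix: the matrix-specific matmul is selected, result is rank 2
print *, shape(matmul(pt2, b))   ! prints: 1 5
end program

The two calls compute the same numbers; only the selected specific function and the rank of the result differ.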


I noticed this recently for the Level 2 BLAS SGER/DGER operation, which computes the rank-1 update:

A = A + \alpha x y^T

The matrix A is m \times n, the vector x is m \times 1, and the vector y is n \times 1. In Fortran there are multiple ways to perform this (assuming you don’t just use BLAS):

program rank1_update
implicit none
real :: A(3,4)
real :: x(3), y(4)
real :: alpha
integer :: i

alpha = 1.0
x = [1,2,3]
y = [1,2,3,4]

A = 0
call sger_dc(alpha,A,x,y)
print '(4(F8.3))', (A(i,:), i=1,3)
print '()'

A = 0
call sger_arr(alpha,A,x,y)
print '(4(F8.3))', (A(i,:), i=1,3)
print '()'

A = 0
call sger_mm(alpha,A,x,y)
print '(4(F8.3))', (A(i,:), i=1,3)
print '()'

A = 0
call sger_reshape(alpha,A,x,y)
print '(4(F8.3))', (A(i,:), i=1,3)

contains

    ! loops
    subroutine sger_dc(alpha,a,x,y)
    real, intent(in) :: alpha, x(:), y(:)
    real, intent(inout) :: a(:,:)
    integer :: i, j
    do concurrent(i=1:size(a,1),j=1:size(a,2))
        a(i,j) = a(i,j) + alpha * x(i) * y(j)
    end do
    end subroutine

    ! array syntax
    subroutine sger_arr(alpha,a,x,y)
    real, intent(in) :: alpha, x(:), y(:)
    real, intent(inout) :: a(:,:)
    a = a + alpha * spread(x,2,size(a,2)) &
                  * spread(y,1,size(a,1))
    end subroutine

    ! matmul
    subroutine sger_mm(alpha,a,x,y)
    real, intent(in) :: alpha
    real, intent(in), target :: x(:), y(:)
    real, intent(inout) :: a(:,:)
    real, pointer :: x2(:,:), y2T(:,:)
    x2(1:size(x),1:1) => x
    y2T(1:1,1:size(y)) => y
    a = a + alpha * matmul(x2,y2T)
    end subroutine

    ! reshape
    subroutine sger_reshape(alpha,a,x,y)
    real, intent(in) :: alpha
    real, intent(in) :: x(:), y(:)
    real, intent(inout) :: a(:,:)

    associate(x2 => reshape(x,shape=[size(x),1]), &
              y2 => reshape(y,shape=[size(y),1]))
        a = a + alpha * matmul(x2,transpose(y2))
    end associate

   end subroutine
end program

Guess which one is the most efficient in practice? The loop one; for the other variants the compilers create temporary arrays.

Edit: I was curious how the generated procedures differ in size (or how bloated they are):

$ gfortran -Wall -O3 -march=skylake -fno-inline -c rank1_update.f90 
$ nm --print-size --size-sort --radix=d rank1_update.o | grep "sger" | awk '{print $2, $4}'
0000000000000193 sger_dc.3.constprop.0.isra.0
0000000000000197 sger_mm.1.constprop.0.isra.0
0000000000000698 sger_arr.2.constprop.0.isra.0
0000000000003307 sger_reshape.0.constprop.0.isra.0

(note that gfortran performed constant propagation, hence the .constprop suffixes)

$ ifort -warn all -O3 -xSKYLAKE -inline-level=0 -c rank1_update.f90 
$ nm --print-size --size-sort --radix=d rank1_update.o | grep "IP_sger" | awk '{print $2, $4}'
0000000000000256 rank1_update_IP_sger_dc_
0000000000000272 rank1_update_IP_sger_arr_
0000000000000368 rank1_update_IP_sger_mm_
0000000000001232 rank1_update_IP_sger_reshape_

I guessed the loop one, since it’s the most straightforward for the compiler to implement. However, with good optimizations I don’t see why the “array syntax” version couldn’t be rewritten by the compiler to be equivalent: the (i,j) element of spread(x,2,n) * spread(y,1,m) is just x(i) * y(j), so the whole expression can be evaluated element-wise without any temporaries. The matmul version could also in principle be rewritten. This is something that as a user I have always wanted the compiler to do, and I want to attempt these kinds of optimizations with LFortran after we reach beta.


This rank-1 operation occurs in LU factorization (trailing panel update):

  real :: A(m,n)
  integer :: i, j, k

  ! ...
  ! k = pivot

  do j = k+1,n
      do i = k+1,m
          A(i,j) = A(i,j) - A(i,k)*A(k,j)
      end do
  end do

This loop is quite tricky for a compiler and not straightforward to vectorize. But it is possible, in principle, by using do concurrent or an array expression (see the sketch after the BLAS call below). Alternatively, the trick is to use the procedure boundary as a means of stating “no aliasing here”:

  lda = m
  call sger(m-k,n-k,-one,a(k+1,k),1,a(k,k+1),lda,a(k+1,k+1),lda)
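
For comparison, here is a sketch of the do concurrent and array-expression forms mentioned above (using the same A, m, n, and k as in the loop version; the array-expression form relies on the rank-2 k:k sections discussed earlier in the thread):

  ! do concurrent asserts that the iterations are independent
  do concurrent (j = k+1:n, i = k+1:m)
      A(i,j) = A(i,j) - A(i,k)*A(k,j)
  end do

  ! array expression: the k:k sections are rank 2, so the generic
  ! matmul resolves to the matrix-matrix specific function
  A(k+1:m,k+1:n) = A(k+1:m,k+1:n) - matmul(A(k+1:m,k:k), A(k:k,k+1:n))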