Speed test: equivalence vs. transfer vs. reshape

Hello everyone,

For my first post, I thought I’d post the results of a small experiment I ran today and ask for some help interpreting the results.

Due to the nature of the project I’m working on, I’ve been spending a fair bit of time reshaping arrays. I’m doing it enough that I thought I’d figure out if I could spend less time reshaping arrays so I thought I’d compare three ways of changing the data access topology: the reshape() intrinsic, the transfer() intrinsic, and the equivalence statement. The source is included at the end of this post.

I compiled with gfortran using only the -O3 flag (compiler optimization settings did not seem to affect the rankings here). In each case, I ran the program using the “time” command to help to give me assurance that I wasn’t hiding computing work from my cpu_time() calls.

Here are my results:

$ time ./a.out #equivalence
 Elapsed time:    3.0000000000030003E-006  sec

real    0m0.972s
user    0m0.096s
sys     0m0.869s

$ time ./a.out #transfer
 Elapsed time:   0.28676800000000002       sec

real    0m1.290s
user    0m0.159s
sys     0m1.082s

$ time ./a.out #reshape
 Elapsed time:   0.13298499999999999       sec

real    0m1.099s
user    0m0.115s
sys     0m0.966s

I expected equivalence to beat the pants off the other two since it’s not a function call but a memory addressing instruction, and indeed it did. What surprised me was that even reshape was faster than transfer for this application.

This leads me to the question: WHY is transfer() preferred in modern Fortran over equivalence when equivalence is so much faster, especially on large datasets and avoids making a copy of the underlying data, which transfer() seems to do?

What’s so bad about equivalence that it is being treated as archaic, when it seems to perform a useful function that would otherwise not be available in Fortran and can avoid copying large datasets if one doesn’t want to? Shouldn’t there be room for both transfer() and equivalence in the toolbox?

EDIT: A minor update: I found some stackoverflow posts about linear indexing of multidimensional arrays using pointers, which is probably better than using equivalence for this purpose. I’d expect the pointers to work as quickly as equivalence.

Source:

program transfer_test
    implicit none
    integer :: square_array(10**4,10**4), lin_array(10**8)
    real(8) :: t1,t2
    !! UNCOMMENT THE APPROPRIATE LINE
    ! equivalence (lin_array, square_array)
    square_array = 1
    call cpu_time(t1)
    ! lin_array = transfer(square_array,square_array)
    ! lin_array = reshape(square_array,[10**8])
    call cpu_time(t2)
    print *, "Elapsed time: ", t2-t1, " sec"
    ! write to make code observable
    open(50,file='.dummyfile',status='replace',access='stream')
    write(50)lin_array
    write(50)square_array
    close(50)
end program transfer_test

Thanks,
David

3 Likes

@DavidC , welcome to the forum.

Please note no, contrary to whatever information you may have received, the truth is absolutely that TRANSFER is different from than EQUIVALENCE. TRANSFER should never ever be seen as equivalent (yes, poor choice of pun!) to EQUIVALENCE:

  • the semantics of EQUIVALENCE can make it seem almost like a compile-time alias whereas TRANSFER is a run-time data copy. TRANSFER thus carries with it all the cost of the run-time data transfer from one memory location to another.

Given your original post and what you state is of interest to you in terms of data access, I suggest you seriously consider rank-1 arrays with TARGET attribute for your data storage and employ multidimensional (rank N; N > 1) or even other rank-1 objects with the POINTER attribute for your data access as needed.

   integer, parameter :: N = 3
   integer, target :: dat(N*N)
   integer, pointer :: mat(:,:)
   integer, pointer :: diag(:)
   dat = [( i, i=1,size(dat) )]
   mat(1:N,1:N) => dat
   diag => dat(1::N+1)
   print *, "Matrix:"
   do i = 1, size(mat,dim=1)
      print "(*(g0,1x))", mat(:,i)
   end do
   diag => dat(1::N+1)
   print *, "Diagonal elements: ", diag
end

C:\temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.5.0 Build 20211109_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.27.29112.0
Copyright (C) Microsoft Corporation. All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
Matrix:
1 2 3
4 5 6
7 8 9
Diagonal elements: 1 5 9

3 Likes

The short explanations for each are:

  • equivalence is an aliasing mechanism. I.e. these two things point to the same memory, no copy occurs period
  • reshape is an interpretation mechanism. I.e. consider this thing to have that shape, so no copy occurs on the right hand side, but the data is copied to the left hand side
  • transfer is a copy mechanism. I.e. copy the data in this thing, but consider the result to have the same type as this other thing. Thus this causes the data to be copied twice, once on the right hand side, and then again to copy to the left hand side

The pointer rank remapping example demonstrated by @FortranFan is the more modern equivalent to equivalence and does not involve the actual copying of data.

The pointer rank remapping example demonstrated by @FortranFan is the more modern equivalent to equivalence
This is an interesting statement to me. . . Can pointers do the “casting” style behavior that equivalence does? can I have an integer(8) pointer to a real(8) variable?

The only way I’m aware of is to c_loc to a c_ptr then c_f_pointer back.

1 Like

I don’t think so. Here’s an example which does not compile:

! pointer_remapping.f90
!
use, intrinsic :: iso_fortran_env, only: int64, real64
integer(int64), target :: a(10)
real(real64), pointer :: b(:) => null()
b(1:10) => a  
    ! Error: Different types in pointer assignment at (1)
end

I’m curious to learn for what application would you like to use a cast? One usage I can think of is to transmit a stream of bits of a real array through a socket. Another use would be data serialization like the “pickle” module in Python.

I guess that one use case for “type punning” via pointers may be using part of a large chunk of memory allocated at the start of a program in a very legacy-style code (which I struggled with many years ago…) The latter code used a common block as a memory pool (whose actual memory is allocated via C), passes array elements like X(i1) and X(i2) to other subroutines, and then use those memory as work arrays. But if array pointers + c_f_pointer() are okay to use, the intent of the code may become much clearer (though using allocate should be much more readable, of course…)

1 Like

EDIT: As @kargl pointed out, this is incorrect. real(foo,16) upconverts with maximum precision in binary. This sometimes leaves nonzero digits in the decimal representation.

Well, I’ve needed to compare some data that was computed in double precision against a reference data set that is in quad (real(30)). In order to do that, I needed to append zeros to the end of the decimal representation of the double precision data… if you just use the real() intrinsic it attaches random bits which is unwanted. This happens in both gfortran and ifort afaik.

easiest way I found was to use equivalence to treat the double data as an integer and use bit twiddling stuff to upconvert the double. Transfer() works too but copies the data so it’s a little slower.

Why not simply compare whether the computed data agrees with the reference higher precision set within EPSILON of the floating-point type employed toward the computed data?

That’s a good thought! That would work like numpy’s allclose() function. However, it would measure something different from what I want, which is the difference between the two datasets. I compare against a threshold in other places in the project.

No, but that was always a very dangerous feature to begin with.

It looks like you’re absolutely right. I should have checked this further before writing–I can’t reproduce the behavior I claimed above. I think I got confused between the decimal and binary representations and reinvented the wheel. Thanks for catching that–it will probably speed up my code!

It seems like a general purpose language ought to have features for, e.g., manipulating memory at the bit-level, and I don’t know of another way to do it. Why is it dangerous for an authorized user to be able to access memory in this way?

See, e.g. memory pool.

EDIT: oops, didn’t notice @septc has answered this.Speed test: equivalence vs. transfer vs. reshape - #7 by septc

I guess if you want to do your own memory pool without equivalence or C pointer tricks, you need to allocate separate containers/arrays for different types (reals, ints, characters, …). The Wikipedia page states that memory pools are a common practice in real-time systems. I’d be happy to learn that Fortran is still used for any of these.

Another solution if this is truly needed is to have the main program in C allocate a big chunk of memory and aftwerward initialize Fortran array descriptors using the machinery in "ISO_Fortran_binding.h". All the actual work can then be implemented in Fortran subroutines called from the main program.

Fortran is not a “general-purpose” (i.e. systems programming) language. It is a language to describe computation. It has no notion of memory addresses, indeed no notion of “memory”. There are names that may or may not have defined values associated with them. Most of the security holes in software in the last decades is from allowing systems programmers to access memory addresses, and assuming they know how to do that safely.

3 Likes

Real-time application is just one example. Using pool or arena in memory management in scientific software is fairly common, and any such a tool with a reasonably friendly API would provide type-agnostic memory access. See for example Umpire, in particular its allocator doc. Two common applications:

Allowing equivalence between variables of two different types was used (most of the times I’ve seen it) to save space. I.e. I don’t need these two variables at the same time, so I can use the same memory for them both. But as a code evolves these kinds of assumptions end up being violated, and then you have a weird and hard to trace down bugs. There is not (usually) a meaningful correspondence between the bit representation of a real and that of an integer and so making two variables of different type equivalent doesn’t really make sense, or is (almost certainly) not portable. There are bit level intrinsic procedures in the standard, but not any “raw bits” type. Perhaps that should be proposed? In fact I believe it may have been at some point.

@DavidC,

If you want to measure “the difference between the two datasets” where one set is in floating-point precision of, say, X and the other dataset of precision Y, then the only measurement of “difference” that would make sense is one where both the sets are based on the same lower precision e.g.,

..
! Say you have
   real(R8) :: x
   real(R16) :: x_ref
   real(R8) :: delta_x
   ..
   delta_x = x - real( x_ref, kind=kind(delta_x) )
   ..
! Now employ delta_x toward the measurement
1 Like