Is there an intrinsic way to transpose N-dim arrays?

I’d like to tranpose arrays to get some data from C-style to Fortran-style ordering. For a rank-2 array I can use the intrinsic transpose. But how best to do it for higher ranks? The intrinsic reshape offers the order optional argument, so I tried, for example for rank-4:

x = reshape(x, shape=reverse(shape(x)), order=[4, 3, 2, 1])

and reverse is a function that reverses the order of elements in a rank-1 array.

However, the value of shape must be a constant expression, and I don’t know the shape of x at compile time, so I can’t use this approach.

Currently, I have:

pure function reverse_dim_order(x) result(res)
  real, intent(in) :: x(:,:,:,:)
  real, allocatable :: res(:,:,:,:)
  integer :: dims(4)
  integer :: i, j, k, l
  dims = shape(x)
  allocate(res(dims(4), dims(3), dims(2), dims(1)))
  do concurrent(i = 1:dims(1), j = 1:dims(2), k = 1:dims(3), l = 1:dims(4))
    res(l,k,j,i) = x(i,j,k,l)
  end do
end function reverse_dim_order

which works, but I’m wondering is there a more intrinsic approach to this that I could take? Performance is not critical, so I’d like to take an approach that’s as concise, intrinsic, and Fortrannic as possible.

3 Likes

I don’t think that is correct. Metcalf, Reid, and Cohen (2018) say only that

The size of shape must be constant, and its elements must not be negative.

and the following code compiles and runs:

program main
implicit none
real, allocatable :: x(:,:)
allocate (x(2,3))
call random_number(x)
print*,"shape, sum=",shape(x),sum(x)
x = reshape(x,shape=[size(x,2),size(x,1)])
print*,"shape, sum=",shape(x),sum(x)
end program main

So I think you could write separate transpose functions using reshape with an order argument that work for 3D arrays, 4D arrays, etc.

3 Likes

Indeed, you’re right, I wasn’t reading the rules carefully.

I have settled with this solution. Not as elegant as I’d like it but I think it’s as intrinsic as we can get.

Are there any optimizations that can be done for these kinds of transpose operations? For example, if instead of working on entire rows and columns (and their generalization for higher dimension arrays), does it make sense to operate on small subblocks of the arrays? The goal would be to minimize cache misses, or page faults, and such. This could be done by writing explicit loops and then strip-mining the assignment statements.

Well there are always optimisations to be made. The real question is if one should bother. Unless these multi-D transpose operations take a substantial amount of time or consume an unsustainable amount of memory there really is no point.

I think this common phrase is applicable in this case “If it ain’t broke don’t fix it.”

Is transpose what is required to get some data from C-style to Fortran-style ordering ?
Does the DO CONCURRENT transpose x or merely reshape x ?
For xt to be the transpose of x, xt(j,i) = x(i,j).
Does reshape change the memory order of the elements of x

To go from “C style” to “Fortran style”, isn’t all that is required is to change the subscript order when referencing an element of the array or when dimensioning in Fortran or C code ?
Just reference the contiguous elements of the array correctly for the language.

I think in both the cases with TRANSPOSE() and RESHAPE(), they are normal functions that return results in the normal way, namely a temporary array is generated and the results are placed in that temporary array. However, if you look at optimized code, or if you use C_LOC() to look at memory addresses of the array elements, you wil see that the compiler often times can avoid the memory allocation and storage step by doing something fancy with the array metadata. You can think of this as a “shallow transpose” or a “shallow reshape” kind of operation. However, at present, the programmer has no direct access to that functionality, it is just up to the compiler to recognize it and to do it when it thinks it is appropriate.

I think it would be a good idea if such operations were exposed and made available to the programmer. For example, an operation like B=>SHALLOW_TRANSPOSE(A) would return a pointer to the memory occupied by the original A, but with the array metadata adjusted so that B(i,j) and A(j,i) reference the same memory location.

I think this would be useful for array transpose, reshape, and for complex arrays the conjugate and conjugarte transpose operations. The key thing is that the operation would take O(1) time, not O(M*N) time, for an MxN array. Most optimizing compilers already do this in special cases, it would be just a matter of exposing this capability directly to the programmer.

1 Like

I think it would be possible for compilers to create an array pointer having “opposite” index order, in principle. But if those arrays are passed to assumed-shape dummy arrays in an existing routine, I am afraid the programmer needs to take additional care of possibly different index orders for contiguous access (e.g., by calling “is_fortran_order()” etc and determining which of “Fortran” or “C” order for a particular array), and then prepare different do-loops that best match the contiguous order for a given descriptor. I guess this may be cumbersome possibly… (though the idea of transposed pointer seems useful).

As a separate optimization step, the subroutine could well check for array strides and try to optimize based on that. As I said before, the compiler is already doing this in some cases, so maybe that part is already happening and we don’t notice it.

It should be possible to do the transpose in three lines using reshape, automatic left-hand-side reallocation and the Fortran 2018 rank intrinsic.

associate(shape => shape(array))
  array = reshape(array, shape=shape(rank(array):1:-1), order=[(i, i=rank(array), 1, -1)])
end associate

If we could index results of a function evaluation it would be just a single line

array = reshape(array, shape=shape(array)(rank(array):1:-1), order=[(i, i=rank(array), 1, -1)])
4 Likes
associate(s => shape(array), n => rank(array))
  array = reshape(array, shape=s(n:1:-1), order=[(i, i=n,1,-1)])
end associate

It is interesting that if I try to use a short-hand symbol like n above, the compilation fails again for the same error (constant size)…

   18 |   array = reshape(array, shape=s(n:1:-1), order=[(i, i=n,1,-1)])
      |                               1
Error: 'shape' argument of 'reshape' intrinsic at (1) must be an array of constant size

Right, the example by @awvwgk doesn’t work for the same reason that my first snippet in the top post doesn’t work, and as @Beliavsky pointed out in his response. If this had worked, the rank of the result of reshape couldn’t be determined at compile-time, and as I understand it, the constant expression requirement for the shape argument to reshape is there to ensure that.

Both GFortran and Intel Fortran seem to compile it just fine.

❯ cat transpose.f90
implicit none
integer :: i
real, allocatable :: array(:, :, :, :)

allocate(array(5, 6, 2, 4))
call random_number(array)
print *, shape(array)

associate(shape => shape(array))
  array = reshape(array, shape=shape(rank(array):1:-1), order=[(i, i=rank(array), 1, -1)])
end associate
print *, shape(array)
end
❯ gfortran transpose.f90 && ./a.out
           5           6           2           4
           4           2           6           5
❯ ifort transpose.f90 && ./a.out
           5           6           2           4
           4           2           6           5

Just be careful which variables you use, size(shape) instead of rank(array) doesn’t work for example.

Edit: At least NAG declares it as extension

❯ nagfor transpose.f90 && ./a.out 
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7111
Extension(F2018): transpose.f90, line 13: RANK intrinsic procedure
[NAG Fortran Compiler normal termination, 1 warning]
 5 6 2 4
 4 2 6 5
3 Likes

Indeed, that’s what threw me off. I was mistakingly assigning rank(array) to a variable or associate name, which doesn’t work because they’re not constant. But rank(array) can be assigned to a parameter, which makes the shape argument a constant expression. Nice “trick”!

Thinking a bit more about rank() and whether its result is a constant expression. Curiously, ifort 2021.4 doesn’t compile this:

$ cat rank.f90 
integer, allocatable :: array(:,:,:)
integer, parameter :: n = rank(array)
end
$ ifort rank.f90 
rank.f90(2): error #7747: Invalid parameter expression.   [RANK]
integer, parameter :: n = rank(array)
--------------------------^
compilation aborted for rank.f90 (code 1)

but GFortran does. Who is correct?

From my interpretation of 10.1.12 in 18-007r1, array in this case is not a constant expression, so neither is rank(array). If my interpretation is correct, then the example by @awvwgk isn’t conforming either? But I don’t see why it shouldn’t be, as rank(array) can be determined at compile-time.

For whatever it’s worth, I think the code conforms and IFORT has a bug here.

The expression that has IFORT stumble is a specification inquiry using a standard intrinsic inquiry function where the argument is a variable whose properties are not assumed, deferred, or defined by an expression that is not a constant expression. The standard permits this in a constant expression.

Fyi I’ve turned in a support request with Intel OSC on this.

1 Like

Performancewise, I think the combination of all those intrinsics may reduce chances for compiler optimization. Array halding is exactly about these pointer calculations, and I bet all fortran compilers are excellent at that. So I’d rather leave this thing more explicitly on the loop side, to leave the compiler unleash its optimization strategies. See this version:

program test_reshape4
    implicit none

    integer, allocatable, dimension(:,:,:,:) :: f,c
    integer :: i,j,k,l

    f = reshape([(j,j=1,2*3*4*5)],[2,3,4,5])

    allocate(c(5,4,3,2))

    forall(i=1:2,j=1:3,k=1:4,l=1:5) c(l,k,j,i) = f(i,j,k,l)

    print *, 'Fortran=',f
    print *, 'C      =',c

end program test_reshape4

and look at the amazing way gfortran optimizes this stuff by creating 4-sized chunks of pointers

1 Like

This is a good example of using forall to “transpose” the array. This leaves any optimisation order up to the compiler, which we are told we should do.
I think this answers the OP question “Is there an intrinsic way to transpose N-dim arrays?”

However, my point is why bother.

If you are going to use array f or c, you have to use a subscript order that is compatible with the array declaration AND the order that data is stored in memory, independent of it having been created in a Fortran or C routine.
With cache management being so important for performance, we must also consider how to store the data in memory to minimise memory page usage. This applies equally to Fortran or C. This is the key answer to Why bother.

1 Like

Because one component of the system may emit the data in one order, and another component downstream expects the data in the other order.

Don’t get too hung up on C vs. Fortran order, I merely used that to illustrate what I’m trying to do, as people here are likely to understand the C/Fortran (row/column major) ordering concept. Different software frameworks or libraries emit or consume data assuming their own conventions. A software layer that connects them must follow these conventions for the system to work.

1 Like

Thanks, I like this as well, but it’s essentially the same as the do concurrent approach in the top post, plus an obsolescent feature. As performance is not critical for me, @awvwgk’s approach still remains my favorite because it doesn’t require hardcoding the shape; except for I’m not yet 100% sure it’s standard-conforming.

1 Like