Pointers to describe (staggered grid) topologies

Disclaimer: In the following example I describe simple cartesian differential operators, so all this construction may seem overkill. However, when using curvilinear coordinates on non-planar geometries, like this tripolar grid on the sphere, things become less trivial.

I am trying to come up with a set of routines to make finite difference grids a little easier to use. The target application is building an Arakawa C grid for use in a simple ocean model demonstrator.

My idea so far is to have a base module of differential operators made of plain routines, where by plain I mean routines that follow established good practice so that they are as efficient as possible: no derived data types, everything passed through the routine interface with clear intents, and so on. Ideally there would be a module/submodule separation to work on each routine more efficiently, but the important part is that these routines live in a separate module, independent of the rest, that can follow its own development/optimisation path.
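
To fix ideas, here is a minimal sketch of what such a plain routine could look like (the actual divergence routine is only in the full code linked at the end of the post; the uniform cartesian grid and the simple backward-difference stencil below are assumptions of mine, so take this purely as an illustration of the interface style):

module differentials
  use stdlib_kinds, only: dp
  implicit none
contains
  pure subroutine divergence(u, v, e1, e2, diver)
    ! Plain routine: assumed-shape arrays, explicit intents, no derived types.
    real(dp), intent(in   ), dimension(:,:) :: u, v    ! velocity components
    real(dp), intent(in   ), dimension(:,:) :: e1, e2  ! metric terms (dx, dy)
    real(dp), intent(  out), dimension(:,:) :: diver   ! divergence of (u,v)
    integer :: i, j

    diver = 0._dp
    do j = 2, size(u, 2)
      do i = 2, size(u, 1)
        ! simple backward differences (uniform cartesian grid assumed)
        diver(i,j) = (u(i,j) - u(i-1,j)) / e1(i,j) &
                   + (v(i,j) - v(i,j-1)) / e2(i,j)
      end do
    end do
  end subroutine divergence
end module differentials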

Now, if you consider this picture, you can see that the points u, v and w sit at different locations on the cube. This means that, when computing a gradient of a velocity component (say \partial_{x}v), you end up at a point different from the original one (in this case, the central point, or T-point). In particular, you can build a “topology” such that, given a certain point and a certain operator, you know at which point you will end up.
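
For instance, with the usual Arakawa C-grid convention of u on the east faces, v on the north faces and T at the cell centres (I spell out the convention here only as an example), the mapping reads schematically

\partial_x : T \to u, \quad \partial_x : u \to T, \quad \partial_y : T \to v, \quad \partial_y : v \to T,

so the divergence \partial_{x}u + \partial_{y}v of a C-grid velocity field naturally lands back on the T points.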

So I tried putting this idea into code, and I have come up with the following prototype (for a 2D case):

module grid2d
  use stdlib_kinds, only: dp
  implicit none
  type :: grid
    ! Metric terms for computations (basically dx and dy)
    real(dp), pointer, dimension(:,:) :: e1 => null()
    real(dp), pointer, dimension(:,:) :: e2 => null()
    ! Neighbouring grids
    type(grid), pointer :: upper => null()
    type(grid), pointer :: lower => null()
    type(grid), pointer :: right => null()
    type(grid), pointer :: left => null()
    !
  contains
    procedure :: div2  ! procedure declaration
  end type
  !
contains
  !
  subroutine div2(grid_in,u,v,diver)
    use differentials, only: divergence
    class(grid), intent(in   )                 :: grid_in
    real(dp),    intent(in   ), dimension(:,:) :: u,v
    real(dp),    intent(  out), dimension(:,:) :: diver

    if (.not.associated(grid_in%right)) error stop "grid_in%right not associated"
    if (.not.associated(grid_in%upper)) error stop "grid_in%upper not associated"

    if (.not.associated(grid_in%right%e1)) error stop "grid_in%right%e1 not associated"
    if (.not.associated(grid_in%upper%e2)) error stop "grid_in%upper%e2 not associated"

    call divergence(u,v,grid_in%right%e1,grid_in%upper%e2,diver)
    
  end subroutine
  !
end module grid2d

You can ignore e1 and e2, those are metric terms (i.e. dx and dy).
There are two things here: the first is that I have a derived data type that contains both the metrics of the grid and the topology; the second is that this is done with pointers (a topic that I do not master at all).

What I was hoping to achieve was a lightweight DDT containing only topological and positional information, i.e. how the different DDTs are related and where the corresponding data is in memory, which is what I need in order to call a specific divergence routine. I have opted for pointers to avoid heavy data structures containing a lot of data, which from my understanding is a pitfall of the DDT approach.

Now, my question is simple: I want to assess what I lose by going this way rather than making the explicit call with all the needed data passed as arguments. How can I do that? For now, I’ve done some tests in this way:

module dims
  integer, parameter :: nx = 1000
  integer, parameter :: ny = 1000
end module dims

program main
  use dims
  use grid2d
  use differentials
  implicit none

  ! Here I define metric terms
  real(dp), target, dimension(nx,ny) :: dx
  real(dp), target, dimension(nx,ny) :: dy
  ! Here I define the velocity field
  real(dp), dimension(nx,ny) :: u
  real(dp), dimension(nx,ny) :: v
  ! Here I define the divergence
  real(dp), dimension(nx,ny) :: divU

  real(dp) :: tic, toc, meantime
  real(dp) :: rand
  integer :: i,j,t
  integer :: Ntimes

  ! Here I define the topology
  type(ArakawaC) :: topo   ! ArakawaC holds the staggered grids; it is defined in the full code (not shown here)
  type(grid), target :: Ugrid
  type(grid), target :: Vgrid
  type(grid), target :: Tgrid
  character(len=:), allocatable :: type_operation

  meantime=0._dp

  dx = 1._dp
  dy = 1._dp
  ! Association of the pointer
  topo%Ugrid%e1 => dx
  topo%Vgrid%e2 => dy
  Ugrid%e1 => dx
  Vgrid%e2 => dy
  Tgrid%upper => Vgrid
  Tgrid%right => Ugrid
  !
  Ntimes = 100
  ! Choice of the way to perform the operation
  !type_operation="classical"  
  type_operation="derived"

do t=1,Ntimes
  call random_number(rand)
  do i = 1, nx
    v(i,:) = [(rand*t*j, j=1,ny)]
  end do
  do j = 1, ny
    u(:,j) = [(rand*t*i, i=1,nx)]
  end do  
  select case ( type_operation )
    case("classical")
      call cpu_time(tic)
      call divergence(u,v,dx,dy,divU)
      call cpu_time(toc)
      !print *, "Passing array:", toc-tic
      meantime = meantime + toc-tic
    case("derived")
      call cpu_time(tic)
      call divergence(u,v,Ugrid%e1,Vgrid%e2,divU)
      !!call Tgrid%div2(u,v,divU)
      call cpu_time(toc)
      !print *, "Derived:", toc-tic
      meantime = meantime + toc-tic
  end select
end do

  select case ( type_operation )
    case("classical")
      print *, "Passing array average time:", meantime/Ntimes
    case("staggered")
      print *, "Derived average time:", meantime/Ntimes
  end select

  stop
end program main

What I see is that there is not much of a penalty in using the DDT-pointer method, which is the desired outcome. However, I have only tested it on godbolt with gfortran.
Which leads me to the second question: what compilation flags should I use to optimise this as much as possible?
Last but not least, can someone give me some references to understand why pointers are so despised and how to possibly mitigate their pitfalls?

You can see the whole code here: Compiler Explorer

Every note and comment is welcome, as usual.

There are a number of reasons why pointers are despised, yet sometimes necessary.

One reason is the potential for aliasing. That is, multiple pointers can point to the same data (targets, in Fortran) in memory. So compiler optimization must be conservative and assume they will be pointing to the same locations, even if they don’t. This can defeat a number of common optimizations.
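
As an illustration, here is a hypothetical kernel (not taken from your code) showing the kind of pattern that forces this conservatism:

module alias_example
  implicit none
contains
  subroutine accumulate(p, q, n)
    ! p and q are pointers inside the kernel: the compiler has to assume that
    ! their targets may overlap, so q(1) must be re-loaded at every iteration,
    ! because the store to p(i) could have modified it.
    real, pointer :: p(:), q(:)
    integer, intent(in) :: n
    integer :: i
    do i = 1, n
      p(i) = p(i) + q(1)
    end do
  end subroutine accumulate
end module alias_example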

In your case aliasing is avoided because you are passing the targets through separate arguments to procedure divergence. By the Fortran Standard, when compiling divergence, the compiler is allowed to assume the arguments are not aliased. So the fact you are not seeing a performance difference is not surprising.

Note that a problem that sometimes shows up in code is when two actual arguments to a procedure are aliased. It is actually illegal Fortran, but often hard to diagnose. Use of intents on the procedure side can help a lot (e.g., if both arguments are intent(in), there is no problem; otherwise, there is a potential problem).
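
A small sketch of such (non-conforming) argument aliasing, with hypothetical names:

module saxpy_mod
  implicit none
contains
  subroutine scale_and_add(a, b)
    ! Because a is modified, the standard forbids b from being associated with
    ! (an overlapping part of) the same actual argument.
    real, intent(inout) :: a(:)
    real, intent(in   ) :: b(:)
    a = 2.0*a + b
  end subroutine scale_and_add
end module saxpy_mod

program aliased_arguments
  use saxpy_mod
  implicit none
  real :: x(4)
  x = 1.0
  call scale_and_add(x(1:3), x(2:4))   ! overlapping sections: non-conforming, rarely diagnosed
  print *, x
end program aliased_arguments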

This is actually at the crux of why Fortran code generally outperforms C code. C forces the use of pointers at every turn.


You might want to take a look at the following dissertation and a modified version of the code referenced in it.

Stefano Toninel, “Development of a New Parallel Code for Computational Continuum Mechanics Using Object-Oriented Techniques”, University of Bologna, April, 2006.

Toninel outlines an object-based approach to defining grid topologies and PDE operators that I used as inspiration for a code I started writing several years ago. I didn’t find using composition etc. to define a global grid class that contains a combination of REAL arrays and DDTs for the connectivity data (i.e. the “heavy data structures” you reference) to be a big issue in terms of performance.

A modified version of Toninel’s NEMO code called MORFEUS is available at the following link. A PDF of Toninel’s dissertation is available in the distribution’s developer-doc/media sub-directory.

Morfeus link is:


Could you provide a minimal working example for this behaviour? I’m struggling to visualise it.

For those having a hard time understanding this topic (like me), this video was really easy to follow and to understand.

So, what I get from this comment is that if I use pointers only to read data instead of also writing over some pointer, the compiler will be able to optimise the code because there is no possibility of aliasing.


In your example you’re working on 1.0M elements: you’re not going to have any performance bottleneck from using pointers to store/reshape mesh data (it’s actually an excellent way to handle that).

Just ensure that:

  • the workload is performed in procedures that know nothing about the pointer status → do not use the pointer or target attributes for the dummy arguments, just use them:
subroutine heavy_load(a,b,c)
   real, intent(inout) :: a(:), b(:), c(:) ! do not declare pointer/target here
   ...
end subroutine heavy_load

! the actual arguments can be pointers:
call heavy_load(Ugrid%e1, ...)

Fortran will just assume there can never be aliasing among the arguments of the same subroutine (Fortran’s well-known “no-aliasing” rule is perhaps its biggest difference w.r.t. C).
Of course, you must be sure that Ugrid%e1 and your other pointers do not overlap, to prevent unwanted behaviour.

  • use pure, elemental procedures wherever possible: it means the compiler can work with pure, non-aliased addresses, hence it can turn on all possible vectorizations and optimizations.
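
For instance, a minimal elemental kernel could look like this (hypothetical names, just to illustrate the point):

module diff_kernels
  implicit none
contains
  elemental function ddx(fp, fm, e1) result(d)
    ! Elemental (hence pure): applied element-wise to conforming arrays, with
    ! no possible aliasing between the arguments and the function result.
    real, intent(in) :: fp, fm, e1
    real :: d
    d = (fp - fm) / e1
  end function ddx
end module diff_kernels

! usage on array sections (the actual arguments may well be pointer components):
! dudx(1:nx-1,:) = ddx(u(2:nx,:), u(1:nx-1,:), e1(1:nx-1,:))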

No. There can be unlimited aliasing. If you only read, then no matter how much that memory is aliased, the compiler is free to apply any optimization it wants: it knows that nobody ever modifies that memory, so it does not need to emit extra code to check whether the data was modified before the next time anyone reads from it again (i.e. through any of the aliasing variables).
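
A tiny sketch of that read-only situation (hypothetical names):

module readonly_demo
  implicit none
contains
  pure function weighted_sum(a, w) result(s)
    ! Both dummies are only read: even if a and w refer to the same memory,
    ! no store can invalidate a value that has already been loaded.
    real, intent(in) :: a(:), w(:)
    real :: s
    s = sum(a * w)
  end function weighted_sum
end module readonly_demo

! p1 => x ; p2 => x           ! two pointers associated with the same target
! s = weighted_sum(p1, p2)    ! aliased, but read-only: still fully optimizable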

Is the programmer guaranteed to have these same no-alias conditions if he uses BLOCK or ASSOCIATE to do the floating-point operations rather than a procedure? If so, do current Fortran compilers take advantage of that in order to produce optimal no-alias code within blocks?

I don’t know about BLOCKs, but what you are suggesting should definitely apply to ASSOCIATE statements, where the association/allocation status of pointer/allocatable variables must be well defined. However, I’m not sure I would rely on that in production… pure and, where applicable, elemental procedures definitely make a lot more sense to me.

I can’t see how a BLOCK construct can “dealias” an object…

Regarding the ASSOCIATE construct, in 11.1.3.3 one can read:

Within an ASSOCIATE […] construct, […]. The associating entity does not have the ALLOCATABLE or POINTER attributes; it has the TARGET attribute if and only if the selector is a variable and has either the TARGET or POINTER attribute.

In other words:

real, pointer :: x(:)
...
ASSOCIATE( y => x )
   ! y implicitly has the TARGET attribute here, and is therefore
   ! considered as potentially aliased by the compiler
END ASSOCIATE