Comments in ResearchGate about Fortran

Hi Ashok,

I have seen such performance tests before in the course “Programming with Fortran” offered at the Leibniz-Rechenzentrum. Specifically, you can check the sections “Further performance aspects and use of Parameterized derived types” in the course slides.

The problem typically boils down to the choice between an array of structures (AoS) and a structure of arrays (SoA). Imagine you are doing a particle simulation. Each particle can be represented as an instance of the derived type:

type :: body
  character(len=4) :: units
  real :: mass
  real :: pos(3), vel(3)
end type body

In your main code you will then allocate an array of bodies:

type(body), allocatable :: traj(:)
allocate(traj(ntraj))

Alternatively, you can fold the array properties into the derived type:

type :: body_p(k, ntraj)
  integer, kind :: k = kind(1.0)
  integer, len  :: ntraj = 1
  character(len=4) :: units
  real(kind=k) :: mass(ntraj)
  real(kind=k) :: pos(ntraj,3), vel(ntraj,3)
end type body_p

In the main code you would use this structure of arrays as follows:

type(body_p(ntraj=:)), allocatable :: dyn_traj
allocate(body_p(ntraj=20) :: dyn_traj)
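
To make the snippet above concrete, here is a minimal sketch of a complete program built around the same parameterized derived type. The program name, the variable n, and the values assigned to the components are placeholders of mine, not taken from the course slides. Support for parameterized derived types varies between compilers, so treat this as an illustration rather than something guaranteed to build everywhere.

```fortran
program soa_demo
  implicit none

  ! Structure of arrays: the particle count is a length type parameter
  type :: body_p(k, ntraj)
    integer, kind :: k = kind(1.0)
    integer, len  :: ntraj = 1
    character(len=4) :: units
    real(kind=k) :: mass(ntraj)
    real(kind=k) :: pos(ntraj,3), vel(ntraj,3)
  end type body_p

  ! Deferred length parameter; the kind parameter k takes its default
  type(body_p(ntraj=:)), allocatable :: dyn_traj
  integer :: n

  n = 20   ! hypothetical particle count
  allocate(body_p(ntraj=n) :: dyn_traj)

  ! Placeholder initial values, assigned to whole component arrays
  dyn_traj%units = 'SI'
  dyn_traj%mass  = 1.0
  dyn_traj%pos   = 0.0
  dyn_traj%vel   = 0.0

  ! A type parameter can be queried with the same syntax as a component
  print *, 'number of particles:', dyn_traj%ntraj
  print *, 'shape of pos:       ', shape(dyn_traj%pos)
end program soa_demo
```

Note that a single scalar object now holds the data for all particles, so whole-array operations such as dyn_traj%mass = 1.0 act on every particle at once.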

What is not immediately obvious is that the two objects differ in their memory layout, as illustrated by the image below:

Depending on what you are doing with the particles, the size of your array, the compiler, etc., the contiguous memory layout of the SoA format can lead to improved vectorization, resulting in faster executables.
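
As a rough sketch of what this means in code, the program below applies the same position update to both layouts and prints the distance in memory between the data of two consecutive particles. The particle count n, the time step dt, and the initial values are assumptions chosen only for illustration, and the sketch makes no claim about which version is faster on a particular compiler or machine.

```fortran
program layout_demo
  implicit none
  integer, parameter :: n  = 1000   ! hypothetical particle count
  real,    parameter :: dt = 0.01   ! hypothetical time step

  ! Array of structures: one derived-type instance per particle
  type :: body
    character(len=4) :: units
    real :: mass
    real :: pos(3), vel(3)
  end type body

  ! Structure of arrays: one instance holding all particles
  type :: body_p(k, ntraj)
    integer, kind :: k = kind(1.0)
    integer, len  :: ntraj = 1
    character(len=4) :: units
    real(kind=k) :: mass(ntraj)
    real(kind=k) :: pos(ntraj,3), vel(ntraj,3)
  end type body_p

  type(body), allocatable :: traj(:)
  type(body_p(ntraj=:)), allocatable :: dyn_traj
  integer :: i, d

  allocate(traj(n))
  allocate(body_p(ntraj=n) :: dyn_traj)

  ! Placeholder initial state, identical for both layouts
  do i = 1, n
    traj(i)%units = 'SI'
    traj(i)%mass  = 1.0
    traj(i)%pos   = 0.0
    traj(i)%vel   = 1.0
  end do
  dyn_traj%units = 'SI'
  dyn_traj%mass  = 1.0
  dyn_traj%pos   = 0.0
  dyn_traj%vel   = 1.0

  ! AoS: stepping from particle i to i+1 jumps over a whole body,
  ! including the units and mass components that this loop never uses
  do i = 1, n
    traj(i)%pos = traj(i)%pos + dt*traj(i)%vel
  end do

  ! SoA: for a fixed coordinate d, the inner loop walks through
  ! contiguous memory with unit stride
  do d = 1, 3
    do i = 1, n
      dyn_traj%pos(i,d) = dyn_traj%pos(i,d) + dt*dyn_traj%vel(i,d)
    end do
  end do

  ! storage_size returns the element size in bits
  print *, 'AoS stride between particles (bytes):', storage_size(traj)/8
  print *, 'SoA stride between particles (bytes):', storage_size(dyn_traj%pos)/8

  ! Both layouts should hold the same updated position
  print *, 'x of last particle (AoS, SoA):', traj(n)%pos(1), dyn_traj%pos(n,1)
end program layout_demo
```

Whether the compiler actually vectorizes either loop is best checked with its optimization report rather than assumed.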

In some cases, the poor performance of derived types could simply be due to an immature compiler implementation that doesn’t manage to exploit all the vectorization opportunities.
