I have a data structure like this:
type data_t
   real, allocatable :: a2d(:,:)
end type
I am typically allocating an array of size 10000 or more of this type:
type(data_t), allocatable :: d(:)
allocate( d(10000) )
The shape of each component a2d(:,:)
is typically 400x5000, varying from element to element of d (otherwise I would have used a single global 3D array). The whole structure can end up at 100 GB or more, and I may need several instances of it in a conjugate gradient scheme.
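Just to make the orders of magnitude explicit, a back-of-envelope sketch assuming 4-byte default reals and the typical sizes quoted above:

```fortran
program footprint
   implicit none
   integer(8) :: nelem, nrow, ncol, bytes

   nelem = 10000      ! size of d(:)
   nrow  = 400        ! typical rows of a2d
   ncol  = 5000       ! typical columns of a2d
   bytes = nelem * nrow * ncol * 4_8   ! 4 bytes per default real

   print *, bytes / 2_8**30, 'GiB'     ! about 74 GiB at these typical sizes
end program
```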
What I am storing in each column of a2d(:,:)
is however not always the same size, so I thought about defining a jagged array type in order to allocate only what is needed:
type jagged_t
   real, allocatable :: a1d(:)
end type

type data_t
   type(jagged_t), allocatable :: aj(:)
end type
In the end, I can allocate each a1d(:)
array to the exact size I need, without wasting any space…
Really?
Well, not exactly. With the ifort compiler, I can see that the storage size of a type(jagged_t)
is 576 bits, i.e. 72 bytes. That is the equivalent of 18 default real
elements. So, for each column I have a memory overhead of 18 samples. This is not too bad: with this method I can save, say, 100 samples per column on average, so it is still worthwhile, and I am just paying about 20% of what I save as a tax to the compiler.
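This overhead can be checked directly with the storage_size() intrinsic, which returns a size in bits; a small sketch (the 576-bit figure is what I observe with ifort, other compilers may differ):

```fortran
program overhead
   implicit none

   type jagged_t
      real, allocatable :: a1d(:)
   end type

   type(jagged_t) :: jag

   ! storage_size() reports the size of one object of this type, i.e. the
   ! array descriptor only, not the data it may point to once allocated
   print *, storage_size(jag), 'bits'                              ! e.g. 576 with ifort
   print *, storage_size(jag) / storage_size(1.0), 'equivalent reals'  ! e.g. 18
end program
```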
Nonetheless, I wanted to mitigate this 20% overhead by defining 2D subgroups instead of individual 1D arrays:
integer, parameter :: NCOL = 10

type subgroup_t
   real, allocatable :: asub2d(:,:) ! always allocated with NCOL=10 columns
end type

type data_t
   type(subgroup_t), allocatable :: as(:)
end type
If I originally had an a2d(*,5000)
array, I now have an as(500)
array, each element containing an asub2d(*,10)
array. The number of rows of each asub2d
is still potentially smaller than the number of rows of a2d
(its number of rows being the maximum of what is needed over its 10 columns), and the 576-bit overhead is now paid for 10 columns instead of a single column.
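Building this subgroup layout could look like the sketch below; the row counts per column are made up for illustration:

```fortran
program subgroups
   implicit none

   integer, parameter :: NCOL = 10

   type subgroup_t
      real, allocatable :: asub2d(:,:)
   end type

   type data_t
      type(subgroup_t), allocatable :: as(:)
   end type

   type(data_t) :: x
   integer :: ncoltot, k, j1, j2
   integer, allocatable :: needed_rows(:)

   ! toy input: 25 original columns with made-up row counts
   ncoltot = 25
   needed_rows = [( 100 + mod(17*k, 50), k = 1, ncoltot )]

   allocate( x%as( (ncoltot + NCOL - 1) / NCOL ) )   ! 3 subgroups here
   do k = 1, size(x%as)
      j1 = (k-1)*NCOL + 1
      j2 = min(k*NCOL, ncoltot)
      ! each subgroup is allocated with the max row count over its columns,
      ! so the padding is per group of NCOL columns, not per column
      allocate( x%as(k)%asub2d( maxval(needed_rows(j1:j2)), NCOL ) )
   end do
end program
```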
So, are we good now?
We are not… To my surprise, when I monitored the memory usage of the process while running, it was 15-20% above what I expected! And I found the explanation: the as(i)%asub2d(:,:)
components are not allocated contiguously in memory, there are significant gaps in between. By checking a few addresses, I could see that the gaps were about 1500 bytes on average, for an average asub2d(:,:)
size of 3000 elements, hence 12 000 bytes, which is about 12% wasted memory. I don't get why this happens…
All of this is disappointing… It would be easier with C pointers, which carry only a 64-bit overhead, but even then there could be gaps between the individual columns. So I am now thinking about going back to good old flat 1D arrays, with an index array to access the original columns:
type data_t
   real, allocatable :: a1d(:)
   integer, allocatable :: idx(:) ! idx(j) = start index in a1d of the original j-th column
end type
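With this layout, column j of the original 2D array lives in a contiguous slice of a1d. A sketch of how it could be used, assuming idx(:) is sized ncol+1 and stores the start of each column (so idx(j+1)-idx(j) is the column length; the lengths below are a toy example):

```fortran
program flat_columns
   implicit none

   type data_t
      real, allocatable :: a1d(:)
      integer, allocatable :: idx(:)  ! idx(j) = start of column j in a1d; size ncol+1
   end type

   type(data_t) :: x
   integer :: j, ncol
   integer, allocatable :: lengths(:)

   ! toy example: 3 columns of lengths 4, 2, 3
   ncol = 3
   lengths = [4, 2, 3]

   allocate( x%idx(ncol+1) )
   x%idx(1) = 1
   do j = 1, ncol
      x%idx(j+1) = x%idx(j) + lengths(j)
   end do
   allocate( x%a1d( x%idx(ncol+1) - 1 ) )   ! 9 elements total, no padding

   ! accessing the original column j as a contiguous slice:
   do j = 1, ncol
      x%a1d( x%idx(j) : x%idx(j+1)-1 ) = real(j)
   end do
   print *, x%a1d   ! columns stored back-to-back: four 1s, two 2s, three 3s
end program
```

The single descriptor per a1d means the per-column overhead is just one integer in idx, at the price of slightly less convenient indexing.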