Read data and append it to array, best practice?

Greetings,

The task I want to perform is reading new datapoints and appending it to an existing array of old datapoints. I came up with a solution that uses two tmp variables and I was wondering if there was a cleaner way:

allocate( tmp_new( N ) )
*N new datapoints are stored into tmp1*

! old datapoints are copied into tmp_old
tmp_old = array

! array is deallocated and reallocated in order to store both
! old and new datapoints
deallocate(array)
allocate( array( size(tmp_old) + size(tmp_new) ))
array(1:size(tmp_old)) = tmp_old
array( size(tmp_old)+1:size(tmp_old)+size(tmp_new) ) = tmp_new

deallocate(tmp_old)
deallocate(tmp_new)
1 Like

Cleaner but (probably) not the fastest

! ...
real, allocatable :: array(:), n(:)
! ...
allocate( array(0) )
! get into `n` the new datapoints
array = [array, n]
! ...
6 Likes

You can simplify your code by using the allocation upon assignment feature, also known as auto-allocate, to just write

array = [array, tmp_new]

3 Likes

Thank you for your answers, I did not know this existed. I used what you guys showed me succesfully for 1D data.

  • Is it really allocation upon assignment feature that is at play ? This works even if array is already allocated.

  • How would one go about doing this for 2D data (say a 2 column array) ?

I tried

allocate(array(0, 2))
array = transpose([transpose(array), transpose(tmp_new)])

and

allocate(array(0, 2))
array(:, 1) = [array(:, 1), tmp_new(:, 1)]
array(:, 2) = [array(:, 2), tmp_new(:, 2)]

but these did not work.

Thank you again,
hohoho

Yes, I wish we could do this.

IIRC, Fortran is column-major. How about (untested):

allocate( array(2,0) )
associate( col1 => array(1,:), col2 => array(2,:), &
            col1_new => tmp_new(1,:),  col2_new => tmp_new(2,:) )
  col1 = [col1, col1_new]
  col2 = [col2, col2_new]
end associate

Or you can try this approach

using reshape instead of transpose, but IMO (even if it works) it would be far less readable.

2 Likes

You can write a function to combine matrices by row, as shown in the program:

module m
implicit none
contains
function rbind(a,b) result(ab)
! combine matrices a and b by row
integer, intent(in)  :: a(:,:),b(:,:)
integer, allocatable :: ab(:,:)
integer              :: n1a,n1,n2
n2 = size(a,2)
if (size(b,2) /= n2) stop "need size(a,2) == size(b,2)"
n1a = size(a,1)
n1 = n1a + size(b,1)
allocate (ab(n1,n2))
ab(:n1a,:)   = a
ab(n1a+1:,:) = b
end function rbind
end module m

program main
use m, only: rbind
implicit none
integer, allocatable :: i1(:,:),i2(:,:)
integer, parameter   :: n1_1 = 2, n1_2 = 3, n2 = 2
i1 = reshape([10,11,20,21],[n1_1,n2])
write (*,*) "i1"
write (*,"(2i5)") transpose(i1)
i2 = reshape([12,13,14,22,23,24],[n1_2,n2])
i1 = rbind(i1,i2)
write (*,*) "i2"
write (*,"(2i5)") transpose(i2)
write (*,*) "combined"
write (*,"(2i5)") transpose(i1)
end program main

The function rbind is named after the one in R. The program gives output

 i1
   10   20
   11   21
 i2
   12   22
   13   23
   14   24
 combined
   10   20
   11   21
   12   22
   13   23
   14   24
2 Likes

Fortran 2003 also introduced the intrinsic move_alloc() routine that can be used to enlarge arrays.

6 Likes

To generalize the OP’s question, it would be nice if Fortran had a general capability to combine two arrays along a specified axis, similar to numpy.concatenate . I requested this for stdlib. Is this something that would be part of a Fortran library for the Mathematics of Arrays?

1 Like

Indeed, Mathematics of Arrays defines the catenate operation to combine two arrays into a single object, provided that the array dimensions other than the primary axis match. We’re in the process of adding it into the library. The implementation does not copy the two arrays into a new larger array. Instead, it uses pointers to the original arrays to achieve the same functionality.

1 Like

@hohoho , please see this thread from a while ago over at the Intel Fortran forum, it may catch your interest. Note the “canonical” sequence implied by the current state of the language:

! say 'array' is currently allocated with size N 
allocate( tmp(M) )                   ! Generate a temporary of size M; some such as C++ std::vector tend to double the size each increasing allocation 
tmp(1:N) = array                     ! copy existing values from the array; it may be instead be in a DO loop
tmp(..) = ..                         ! add new elements 
call move_alloc( from=tmp, to=array) ! transfer allocation back to original array
3 Likes

Catenation with the creation of a new array would be much simpler, but that is not what MoA is all about.

The simplest way would be:

(fill in the type), allocatable, dimension(:) :: array
array = [ array1, array2 ]

and trust automatic reallocation to do the job of creating an array that is large enough and contains the joined result.

4 Likes

I had not tried MOVE_ALLOC() before, but it seems interesting.
Not quite sure how best to use it yet, but it worked with three compilers.
Looks like maybe if I used the original values as a SOURCE= on the allocate it might be better; still playing but I think MOVE_ALLOC() looks appealing …

 grid1 after
 > [1.000000000,4.000000000,10.00000000,40.00000000,70.00000000 ]
 > [2.000000000,5.000000000,20.00000000,50.00000000,80.00000000 ]
 > [3.000000000,6.000000000,30.00000000,60.00000000,90.00000000 ]
 T F           3          10
 grid1 after
 > [1.000000000,4.000000000,10.00000000,40.00000000,70.00000000,-1.000000000,-4.000000000,-10.00000000,-40.00000000,-70.00000000 ]
 > [2.000000000,5.000000000,20.00000000,50.00000000,80.00000000,-2.000000000,-5.000000000,-20.00000000,-50.00000000,-80.00000000 ]
 > [3.000000000,6.000000000,30.00000000,60.00000000,90.00000000,-3.000000000,-6.000000000,-30.00000000,-60.00000000,-90.00000000 ]
program demo_move_alloc ! Example to allocate a bigger GRID
implicit none
real, allocatable :: grid1(:,:)
integer :: i
   ! initialize small grid
   allocate(grid1(3,2))
   ! fill grids with some values
   grid1=reshape([ (real (i), i=1,size(grid1) ) ],shape(grid1))
   call showgrid('grid1 before',grid1)
   ! add a 3x3 array to the right of grid1
   call mergegrid(grid1,reshape([10.0,20.0,30.0,40.0,50.0,60.0,70.0,80.0,90.0],[3,3]))
   call showgrid('grid1 after',grid1)
   ! merge it with a negative copy of itself
   call mergegrid(grid1,-grid1)
   call showgrid('grid1 after',grid1)
contains

subroutine mergegrid(grida,gridb)
! assuming same number of rows in both
real, allocatable :: grida(:,:), tempgrid(:,:)
real              :: gridb(:,:)
integer :: c1, c2, r1, r2
   ! initialize TEMPGRID which will be used to replace GRIDA
   associate( r1=>size(grida,dim=1), r2=>size(gridb,dim=1), &
            & c1=>size(grida,dim=2), c2=>size(gridb,dim=2) )
   allocate (tempgrid(r1,c1+c2))    ! Allocate bigger grid
   tempgrid(:,1:c1)=grida
   tempgrid(:,c1+1:c1+c2)=gridb
   end associate
   ! move TEMPGRID to GRIDA
   call MOVE_ALLOC (from=tempgrid, to=grida)
   ! TEMPGRID should no longer be allocated
   ! and GRID should be the size TEMPGRID was
   print *, allocated(grida), allocated(tempgrid),shape(grida)
end subroutine mergegrid

subroutine showgrid(title,arr)
implicit none
character(len=*),parameter::ident= "@(#)showgrid(3f) - print small 2d integer arrays in row-column format"
character(len=*),intent(in)  :: title
real,intent(in)              :: arr(:,:)
integer                      :: i
   write(*,*)trim(title)                                                 ! print title
   do i=1,size(arr,dim=1)                                                ! print one row of array at a time
      write(*,fmt='(" > [",*(g0.10:,","))',advance='no')arr(i,:)
      write(*,'(" ]")')
   enddo
end subroutine showgrid

end program demo_move_alloc

PS:
It does work to put the data in as a source and shortens the code a little, but probably creates another temporary so not so sure about a change such as

! initialize TEMPGRID which will be used to replace GRIDA
   associate( r1=>size(grida,dim=1), r2=>size(gridb,dim=1), &
            & c1=>size(grida,dim=2), c2=>size(gridb,dim=2) )
   allocate (tempgrid(r1,c1+c2),source=reshape([grida,gridb],[r1,c1+c2]))    ! Allocate bigger grid
   end associate
1 Like

Yep, agreed. In fact,

I try to get not too obsessed with performance unless it’s really needed.

1 Like

With arrays in the gigabytes making little additions can get really expensive, and big additions can require a huge memory bump for the temporary arrays too. I was thinking part of the reason to have MOVE_ALLOC would be efficiency, as this would work too

subroutine mergegrid(grida,gridb) ! assuming same number of rows in both
real, allocatable :: grida(:,:)
real              :: gridb(:,:)
integer           :: c1, c2, r1, r2
   associate( r1=>size(grida,dim=1), r2=>size(gridb,dim=1), &
            & c1=>size(grida,dim=2), c2=>size(gridb,dim=2) )
   grida=reshape([grida,gridb],[r1,c1+c2])    ! Allocate bigger grid
   end associate
end subroutine mergegrid

I know of several programs where the user is supposed to guess an appropriate array size for his problem but if he guesses wrong the program will automatically keep doubling the array size every time it gets filled, and if the array is in the GB, making the new one and still having the old one gets pretty expensive.

I know of others where it just adds a row at a time, and that can really thrash things; although sometimes a smart compiler can make that a relatively cheap operation; but if it does so by creeating a non-contigious array that can have other draw-backs.

So since I am “discovering” MOVE_ALLOC what would be the purpose for it when I could just replace it with something like the above, I wonder?

3 Likes

Reading large amounts of (coordinate) data from disk into an array was one reason I wrote a list structure. Of course at some point you have a memory footprint of 2n but it was substantially faster than calling move_alloc many times. Syntactically it was simple because my list structure has a to_array method (which in turn uses the forward iterator).

I think you might be relying too much on the compiler’s ability to optimize that statement. Any discarded temporaries created would be a waste of cycles.
Also, some times, you might want to refine a grid instead of extending it (or some other complication) and then, I think, you would be hard pressed to write an array constructor expression. MOVE_ALLOC decouples the allocating from the constructing and allows you to write a number of statements until you decide to move the allocation from name to name.

1 Like