Read data and append it to array, best practice?

hohoho · September 22, 2021, 2:39pm

Greetings,

The task I want to perform is reading new datapoints and appending it to an existing array of old datapoints. I came up with a solution that uses two tmp variables and I was wondering if there was a cleaner way:

allocate( tmp_new( N ) )
*N new datapoints are stored into tmp1*

! old datapoints are copied into tmp_old
tmp_old = array

! array is deallocated and reallocated in order to store both
! old and new datapoints
deallocate(array)
allocate( array( size(tmp_old) + size(tmp_new) ))
array(1:size(tmp_old)) = tmp_old
array( size(tmp_old)+1:size(tmp_old)+size(tmp_new) ) = tmp_new

deallocate(tmp_old)
deallocate(tmp_new)

epagone · September 22, 2021, 2:45pm

Cleaner but (probably) not the fastest

! ...
real, allocatable :: array(:), n(:)
! ...
allocate( array(0) )
! get into `n` the new datapoints
array = [array, n]
! ...

Beliavsky · September 22, 2021, 2:58pm

You can simplify your code by using the allocation upon assignment feature, also known as auto-allocate, to just write

array = [array, tmp_new]

hohoho · September 22, 2021, 3:52pm

Thank you for your answers, I did not know this existed. I used what you guys showed me succesfully for 1D data.

Is it really allocation upon assignment feature that is at play ? This works even if array is already allocated.
How would one go about doing this for 2D data (say a 2 column array) ?

I tried

allocate(array(0, 2))
array = transpose([transpose(array), transpose(tmp_new)])

and

allocate(array(0, 2))
array(:, 1) = [array(:, 1), tmp_new(:, 1)]
array(:, 2) = [array(:, 2), tmp_new(:, 2)]

but these did not work.

Thank you again,
hohoho

epagone · September 22, 2021, 4:14pm

Yes, I wish we could do this.

IIRC, Fortran is column-major. How about (untested):

allocate( array(2,0) )
associate( col1 => array(1,:), col2 => array(2,:), &
            col1_new => tmp_new(1,:),  col2_new => tmp_new(2,:) )
  col1 = [col1, col1_new]
  col2 = [col2, col2_new]
end associate

Or you can try this approach

using reshape instead of transpose, but IMO (even if it works) it would be far less readable.

Beliavsky · September 22, 2021, 4:54pm

You can write a function to combine matrices by row, as shown in the program:

module m
implicit none
contains
function rbind(a,b) result(ab)
! combine matrices a and b by row
integer, intent(in)  :: a(:,:),b(:,:)
integer, allocatable :: ab(:,:)
integer              :: n1a,n1,n2
n2 = size(a,2)
if (size(b,2) /= n2) stop "need size(a,2) == size(b,2)"
n1a = size(a,1)
n1 = n1a + size(b,1)
allocate (ab(n1,n2))
ab(:n1a,:)   = a
ab(n1a+1:,:) = b
end function rbind
end module m

program main
use m, only: rbind
implicit none
integer, allocatable :: i1(:,:),i2(:,:)
integer, parameter   :: n1_1 = 2, n1_2 = 3, n2 = 2
i1 = reshape([10,11,20,21],[n1_1,n2])
write (*,*) "i1"
write (*,"(2i5)") transpose(i1)
i2 = reshape([12,13,14,22,23,24],[n1_2,n2])
i1 = rbind(i1,i2)
write (*,*) "i2"
write (*,"(2i5)") transpose(i2)
write (*,*) "combined"
write (*,"(2i5)") transpose(i1)
end program main

The function rbind is named after the one in R. The program gives output

interkosmos · September 22, 2021, 5:39pm

Fortran 2003 also introduced the intrinsic move_alloc() routine that can be used to enlarge arrays.

Beliavsky · September 22, 2021, 8:01pm

To generalize the OP’s question, it would be nice if Fortran had a general capability to combine two arrays along a specified axis, similar to numpy.concatenate . I requested this for stdlib. Is this something that would be part of a Fortran library for the Mathematics of Arrays?

wyphan · September 23, 2021, 12:06am

Indeed, Mathematics of Arrays defines the catenate operation to combine two arrays into a single object, provided that the array dimensions other than the primary axis match. We’re in the process of adding it into the library. The implementation does not copy the two arrays into a new larger array. Instead, it uses pointers to the original arrays to achieve the same functionality.

FortranFan · September 23, 2021, 2:11am

@hohoho , please see this thread from a while ago over at the Intel Fortran forum, it may catch your interest. Note the “canonical” sequence implied by the current state of the language:

! say 'array' is currently allocated with size N 
allocate( tmp(M) )                   ! Generate a temporary of size M; some such as C++ std::vector tend to double the size each increasing allocation 
tmp(1:N) = array                     ! copy existing values from the array; it may be instead be in a DO loop
tmp(..) = ..                         ! add new elements 
call move_alloc( from=tmp, to=array) ! transfer allocation back to original array

Arjen · September 23, 2021, 6:07am

Catenation with the creation of a new array would be much simpler, but that is not what MoA is all about.

The simplest way would be:

(fill in the type), allocatable, dimension(:) :: array
array = [ array1, array2 ]

and trust automatic reallocation to do the job of creating an array that is large enough and contains the joined result.

urbanjost · September 23, 2021, 7:11am

I had not tried MOVE_ALLOC() before, but it seems interesting.
Not quite sure how best to use it yet, but it worked with three compilers.
Looks like maybe if I used the original values as a SOURCE= on the allocate it might be better; still playing but I think MOVE_ALLOC() looks appealing …

 grid1 after
 > [1.000000000,4.000000000,10.00000000,40.00000000,70.00000000 ]
 > [2.000000000,5.000000000,20.00000000,50.00000000,80.00000000 ]
 > [3.000000000,6.000000000,30.00000000,60.00000000,90.00000000 ]
 T F           3          10
 grid1 after
 > [1.000000000,4.000000000,10.00000000,40.00000000,70.00000000,-1.000000000,-4.000000000,-10.00000000,-40.00000000,-70.00000000 ]
 > [2.000000000,5.000000000,20.00000000,50.00000000,80.00000000,-2.000000000,-5.000000000,-20.00000000,-50.00000000,-80.00000000 ]
 > [3.000000000,6.000000000,30.00000000,60.00000000,90.00000000,-3.000000000,-6.000000000,-30.00000000,-60.00000000,-90.00000000 ]

program demo_move_alloc ! Example to allocate a bigger GRID
implicit none
real, allocatable :: grid1(:,:)
integer :: i
   ! initialize small grid
   allocate(grid1(3,2))
   ! fill grids with some values
   grid1=reshape([ (real (i), i=1,size(grid1) ) ],shape(grid1))
   call showgrid('grid1 before',grid1)
   ! add a 3x3 array to the right of grid1
   call mergegrid(grid1,reshape([10.0,20.0,30.0,40.0,50.0,60.0,70.0,80.0,90.0],[3,3]))
   call showgrid('grid1 after',grid1)
   ! merge it with a negative copy of itself
   call mergegrid(grid1,-grid1)
   call showgrid('grid1 after',grid1)
contains

subroutine mergegrid(grida,gridb)
! assuming same number of rows in both
real, allocatable :: grida(:,:), tempgrid(:,:)
real              :: gridb(:,:)
integer :: c1, c2, r1, r2
   ! initialize TEMPGRID which will be used to replace GRIDA
   associate( r1=>size(grida,dim=1), r2=>size(gridb,dim=1), &
            & c1=>size(grida,dim=2), c2=>size(gridb,dim=2) )
   allocate (tempgrid(r1,c1+c2))    ! Allocate bigger grid
   tempgrid(:,1:c1)=grida
   tempgrid(:,c1+1:c1+c2)=gridb
   end associate
   ! move TEMPGRID to GRIDA
   call MOVE_ALLOC (from=tempgrid, to=grida)
   ! TEMPGRID should no longer be allocated
   ! and GRID should be the size TEMPGRID was
   print *, allocated(grida), allocated(tempgrid),shape(grida)
end subroutine mergegrid

subroutine showgrid(title,arr)
implicit none
character(len=*),parameter::ident= "@(#)showgrid(3f) - print small 2d integer arrays in row-column format"
character(len=*),intent(in)  :: title
real,intent(in)              :: arr(:,:)
integer                      :: i
   write(*,*)trim(title)                                                 ! print title
   do i=1,size(arr,dim=1)                                                ! print one row of array at a time
      write(*,fmt='(" > [",*(g0.10:,","))',advance='no')arr(i,:)
      write(*,'(" ]")')
   enddo
end subroutine showgrid

end program demo_move_alloc

PS:
It does work to put the data in as a source and shortens the code a little, but probably creates another temporary so not so sure about a change such as

! initialize TEMPGRID which will be used to replace GRIDA
   associate( r1=>size(grida,dim=1), r2=>size(gridb,dim=1), &
            & c1=>size(grida,dim=2), c2=>size(gridb,dim=2) )
   allocate (tempgrid(r1,c1+c2),source=reshape([grida,gridb],[r1,c1+c2]))    ! Allocate bigger grid
   end associate

epagone · September 23, 2021, 10:21am

urbanjost:

PS:
It does work to put the data in as a source and shortens the code a little, but probably creates another temporary so not so sure about a change such as
! initialize TEMPGRID which will be used to replace GRIDA
   associate( r1=>size(grida,dim=1), r2=>size(gridb,dim=1), &
            & c1=>size(grida,dim=2), c2=>size(gridb,dim=2) )
   allocate (tempgrid(r1,c1+c2),source=reshape([grida,gridb],[r1,c1+c2]))    ! Allocate bigger grid
   end associate

Yep, agreed. In fact,

I try to get not too obsessed with performance unless it’s really needed.

urbanjost · September 23, 2021, 12:23pm

With arrays in the gigabytes making little additions can get really expensive, and big additions can require a huge memory bump for the temporary arrays too. I was thinking part of the reason to have MOVE_ALLOC would be efficiency, as this would work too

subroutine mergegrid(grida,gridb) ! assuming same number of rows in both
real, allocatable :: grida(:,:)
real              :: gridb(:,:)
integer           :: c1, c2, r1, r2
   associate( r1=>size(grida,dim=1), r2=>size(gridb,dim=1), &
            & c1=>size(grida,dim=2), c2=>size(gridb,dim=2) )
   grida=reshape([grida,gridb],[r1,c1+c2])    ! Allocate bigger grid
   end associate
end subroutine mergegrid

I know of several programs where the user is supposed to guess an appropriate array size for his problem but if he guesses wrong the program will automatically keep doubling the array size every time it gets filled, and if the array is in the GB, making the new one and still having the old one gets pretty expensive.

I know of others where it just adds a row at a time, and that can really thrash things; although sometimes a smart compiler can make that a relatively cheap operation; but if it does so by creeating a non-contigious array that can have other draw-backs.

So since I am “discovering” MOVE_ALLOC what would be the purpose for it when I could just replace it with something like the above, I wonder?

simong · September 23, 2021, 4:46pm

Reading large amounts of (coordinate) data from disk into an array was one reason I wrote a list structure. Of course at some point you have a memory footprint of 2n but it was substantially faster than calling move_alloc many times. Syntactically it was simple because my list structure has a to_array method (which in turn uses the forward iterator).

themos · September 23, 2021, 5:42pm

I think you might be relying too much on the compiler’s ability to optimize that statement. Any discarded temporaries created would be a waste of cycles.
Also, some times, you might want to refine a grid instead of extending it (or some other complication) and then, I think, you would be hard pressed to write an array constructor expression. MOVE_ALLOC decouples the allocating from the constructing and allows you to write a number of statements until you decide to move the allocation from name to name.

Dobrodzieju · November 22, 2023, 8:29pm

Greetings
Sorry to return to an old topic but I’m impressed that autoallocate works also with string arrays. Consider the following

type string
    character(:), allocatable :: s
end type string
type string_array
    type(string), allocatable :: sa(:)
end type string_array
!
type(string) :: sfoo
type(string_array) :: stra
!
allocate(stra%sa(2))
stra%sa(1)%s = 'asd'
stra%sa(2)%s = 'fghi'
sfoo%s ='qwerty'
stra%sa = [stra%sa,sfoo]
print *, stra%sa(1)%s//stra%sa(2)%s//stra%sa(3)%s
!

Amazing! Almost like Python

RonShepard · November 23, 2023, 12:43am

This has been a frequent topic of discussion. There are two important “missing” features. First, the string type should be an intrinsic rather than user defined. If it is user defined, then multiple users will define their own types, perhaps with slightly different spellings, and then it is a tangled web of confusion when trying to use code with all these different conventions. If it were an intrinsic type, then everyone’s code would work together naturally. Second, it is not allowed to initialize variables, including arrays, of this type, or to define parameter constants of this type. Both of those things would be useful in the language to programmers.

jeff.irwin · November 23, 2023, 1:01am

@Dobrodzieju FYI, you can save a small amount of typing in that last line:

print *, (stra%sa(i)%s, i = 1, 3)

RonShepard · November 23, 2023, 2:48am

But for some reason

print *, stra%sa(:)%s

is not allowed. I do not know why not. The syntax is unambiguous, the intent is clear, and the code is concise and compact.

Topic		Replies	Views
MOVE_ALLOC to grow an array	21	3202	September 6, 2022
Implied-do array constructor, type specs, and differences between GFortran, Intel, and LFortran Help	49	1298	May 26, 2024
Arrays and lists Help	46	2380	October 5, 2023
Enhancements to allocatable arrays Language enhancement	24	1738	December 19, 2023
Do we need REALLOCATE?	22	4400	April 25, 2021

Read data and append it to array, best practice?

Related topics