Arrays and lists

If one can accept a final version of a that is not completely used (i.e. some memory is wasted), the last line is optional, and one just has to keep track of the used size (the last value of i). This is how C++ vectors work anyway, although there the bookkeeping is handled transparently for the developer.
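As a minimal sketch of that "used size vs. capacity" bookkeeping (illustrative only; the names growable_int_t and append are assumptions, not from the code discussed above):

module growable_int_m
   implicit none
   type :: growable_int_t
      integer, allocatable :: a(:)   ! capacity is size(a)
      integer :: n = 0               ! used size: only a(1:n) is meaningful
   end type growable_int_t
contains
   subroutine append( v, x )
      type(growable_int_t), intent(inout) :: v
      integer, intent(in) :: x
      integer, allocatable :: tmp(:)
      if ( .not. allocated(v%a) ) allocate( v%a(8) )
      if ( v%n == size(v%a) ) then          ! out of capacity: double it
         allocate( tmp(2*size(v%a)) )
         tmp(1:v%n) = v%a(1:v%n)
         call move_alloc( from=tmp, to=v%a )
      end if
      v%n = v%n + 1
      v%a(v%n) = x
   end subroutine append
end module growable_int_m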

If the programmer uses the linked-list approach, then the final array is allocated once, of the correct size, and there is only a single copy from the user’s linked-list into that array.

If this functionality were part of the standard i/o conventions, then the copy from the linked-list structure within the i/o library to the user’s linked list would also be avoided.

An alternative option Fortranners can consider in such situations is a stack data structure, which can be modeled using a facility introduced in Fortran 2008, viz. allocatable components of recursive type.

Coders can thus avoid components with the POINTER attribute and have a safer container to work with.

Below is a simple example:

module node_m

   type :: node_t
      integer :: i = 0  !<-- "data" held by the node
      type(node_t), allocatable :: next
   end type node_t

contains

   pure subroutine PushNode(node, i)

      type(node_t), allocatable, intent(inout) :: node
      integer, intent(in)                      :: i

      !.. Local variables
      type(node_t), allocatable :: new_node

      allocate(new_node) ; new_node%i = i

      ! Push nodes
      call move_alloc( from=node, to=new_node%next )
      call move_alloc( from=new_node, to=node )

      return

   end subroutine

   pure subroutine PopNode( node )

      type(node_t), allocatable, intent(inout) :: node

      type(node_t), allocatable :: new_node

      if ( .not. allocated(node) ) return

      if ( allocated(node%next) ) then
         call move_alloc( from=node%next, to=new_node )
         deallocate( node )
         call move_alloc( from=new_node, to=node )
      else
         deallocate( node )
      end if

      return

   end subroutine PopNode

   pure subroutine GetNodeData( node, dat, idx )
      
      ! Argument list
      type(node_t), allocatable, intent(in) :: node
      integer, intent(inout)                :: dat(:)
      integer, intent(inout)                :: idx

      ! elided are any checks
      if ( allocated(node) ) then

         idx = idx - 1
         dat(idx) = node%i
         call GetNodeData( node%next, dat, idx )

      end if

      return
       
   end subroutine

end module node_m

module stack_m

   use node_m, only : node_t, PushNode, PopNode, GetNodeData

   type :: stack_t
      type(node_t), allocatable :: nodes
      integer :: num_nodes = 0
   contains
      private
      procedure, pass(this) :: render_data
      procedure, pass(this) :: push_node
      procedure, pass(this) :: pop_node
      generic, public :: pop => pop_node
      generic, public :: push => push_node
      generic, public :: dat => render_data
   end type stack_t

contains

   pure elemental subroutine push_node( this, i )

      class(stack_t), intent(inout) :: this
      integer, intent(in)           :: i

      call PushNode( this%nodes, i )
      this%num_nodes = this%num_nodes + 1

      return

   end subroutine

   pure elemental subroutine pop_node( this )

      class(stack_t), intent(inout) :: this

      call PopNode( this%nodes )
      this%num_nodes = this%num_nodes - 1

      return

   end subroutine

   pure function render_data( this ) result(dat)

      ! Argument list
      class(stack_t), intent(in) :: this
      ! Function result
      integer, allocatable :: dat(:)

      ! Local variables
      integer :: idx

      ! elided are any checks e.g, if numbers of nodes is valid
      allocate( dat(this%num_nodes) )

      idx = this%num_nodes + 1
      call GetNodeData( this%nodes, dat, idx )
       
      return

   end function

end module stack_m

   use stack_m, only : stack_t

   type(stack_t) :: stack
   integer :: i, n

   print *, "Enter a positive integer:"
   read *, n
   do i = 1, n
      call stack%push( 42 + i - 1 )
   end do

   print *, "Data on stack, rendered as an array: ", stack%dat()

end
The program response can be as follows:
C:\temp>ifort /standard-semantics /free p.f
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.10.0 Build 20230609_000000
Copyright (C) 1985-2023 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.36.32537.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
 Enter a positive integer:
12
 Data on stack, rendered as an array:  42 43 44 45 46
 47 48 49 50 51 52
 53

Agree. The linked-list approach is optimal in terms of temporary memory occupation and the minimum amount of data movement. It requires (slightly) more code, though.

I find that I often do not write these kinds of data-structure codes in the briefest way. I usually end up with extra initializations and things like that. Here is an example of using a temporary linked list to read an arbitrary number of integers and return them in an array. I wrote this entirely with allocatables, and with no pointers. There are also obvious ways to traverse the linked list with pointers, but I thought it might be good to do it this way just to show off one of the nice features of fortran.

program linked_read_1
   implicit none
   type list_member_type
      integer :: i  ! value
      type(list_member_type), allocatable :: prev
   end type list_member_type
   type(list_member_type), allocatable :: list, new
   integer :: i, n, istat
   integer, allocatable :: array(:)  ! final target array.
   allocate( list )  ! begin with an empty slot.
   n = 0  ! list length.
   do
      write(*,'(a)',advance='no')  'enter a positive integer: '
      i = -1
      read(*,*,iostat=istat) i
      if ( istat < 0 ) exit
      if ( i <= 0 ) exit
      n = n + 1  ! new value.
      list%i = i
      allocate ( new )
      call move_alloc( from=list, to=new%prev )
      call move_alloc( from=new, to=list )
   enddo
   allocate( array(n) )   ! single allocation of the final target array.
   do i = n, 1, -1        ! extract and deallocate one member at a time.
      call move_alloc( from=list, to=new )
      call move_alloc( from=new%prev, to=list )
      array(i) = list%i
   enddo
   deallocate( list )  ! all done with the temp linked list.
   write(*,'(*(i0,1x))') array
end program linked_read_1

As you say, this is only slightly more complicated than some of the other suggestions, which involve multiple array allocations and multiple copies of the data. (When I say that, I am focusing on the main do loop that reads the data.) I suspect that the above code could be simplified. Does anyone have a simpler linked-list version?

I also have a version that works by allocating buffer arrays of values rather than individual values. This reduces the allocation overhead, which is often a significant part of the overall effort. Surprisingly, it is only slightly more complicated than the above code.
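For reference, a minimal sketch of what such a buffered (chunked) node might look like (illustrative only, not the version referred to above; the names chunk_t and CHUNK_SIZE are assumptions):

module chunk_list_m
   implicit none
   integer, parameter :: CHUNK_SIZE = 1024

   type :: chunk_t
      integer :: n = 0                      ! values used in buf
      integer :: buf(CHUNK_SIZE)            ! payload: many values per node
      type(chunk_t), allocatable :: prev
   end type chunk_t

contains

   subroutine push( list, x )
      ! Append one value; a new chunk is allocated only when the current one
      ! is full, so the allocation cost is amortized over CHUNK_SIZE values.
      type(chunk_t), allocatable, intent(inout) :: list
      integer, intent(in) :: x
      type(chunk_t), allocatable :: new
      if ( .not. allocated(list) ) allocate( list )
      if ( list%n == CHUNK_SIZE ) then
         allocate( new )
         call move_alloc( from=list, to=new%prev )
         call move_alloc( from=new, to=list )
      end if
      list%n = list%n + 1
      list%buf(list%n) = x
   end subroutine push

end module chunk_list_m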


Thank you for all the fascinating comments.

The reason for the original question was that I had a binary file that held data initially and then, as the model progressed, more data was written to the file.

I wanted to reduce the writing to the file as much as possible (there are 3 or 4 files being written to throughout the program’s execution) and thought of storing the data at first within a list, then writing the data to the file.

However, the amount of data varied depending upon the execution of the model and, with so many variables involved, calculating the potential length of any such array would be impossible.

I loved the idea of the double array; that was a very neat solution. However, this array concept was only a short-term fix and one that I no longer consider to have any value.

I now need to see if I can simply reduce the number of files to a single binary file, which would help considerably. I am also exploring how frequently the file is read from, since if the file is rarely read from, why write to it at all?

My next exploration is the execution time involved in reading/writing to a binary file compared to a text file.

Just by changing the order of operations a bit, the initial allocation and the final deallocation can be avoided:

program linked_read_1
   implicit none
   type list_member_type
      integer :: i  ! value
      type(list_member_type), allocatable :: prev
   end type list_member_type
   type(list_member_type), allocatable :: list, new
   integer :: i, n, istat
   integer, allocatable :: array(:)  ! final target array.
   n = 0  ! list length.
   do
      write(*,'(a)',advance='no')  'enter a positive integer: '
      i = -1
      read(*,*,iostat=istat) i
      print*, i
      if ( i < 0 .or. istat < 0 ) exit
      n = n + 1  ! new value.
      new = list_member_type(i)
      call move_alloc( from=list, to=new%prev )
      call move_alloc( from=new, to=list )
   enddo
   allocate( array(n) )   ! single allocation of the final target array.
   do i = n, 1, -1        ! extract and deallocate one member at a time.
      array(i) = list%i
      call move_alloc( from=list%prev, to=new )
      call move_alloc( from=new, to=list )
   enddo
   write(*,'(*(i0,1x))') array
end program linked_read_1

In essence this example is a repeat of the stack data structure I provided earlier.

In most instances, the simplicity is not in the code itself; rather, it is in a library-oriented design for a container class: how the so-called APIs are set up so that it can be used easily and robustly in various different codes by users, and how readily it can be adapted for different data content (which may be derived types aka “classes” themselves), especially given the lack of Generics in Fortran toward such aspects.

Yes, that does clean up the code. I think @FortranFan is correct that this can be regarded as a stack implemented with a linked list (as opposed, for example, to a stack implemented with a block of preallocated memory).

For those wanting to use this kind of linked list data structure with allocatable members, it is also possible to traverse the linked list with pointers without destroying it. You might want to do this if, for example, you stop reading input in order to perform some operation on the existing data (e.g. to insert some new members or to delete some of the existing members), and then continue later on appending into the same linked list data structure. If you do this, then a nice feature of fortran is that the statement

deallocate( list )

is all that is required to deallocate the whole shebang, with no memory leaks. When you insert/delete/deallocate a linked list constructed with pointer members, you must do the recursion manually and you must be much more careful to avoid memory leaks.
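For reference, a minimal sketch of such a non-destructive traversal might look like the following (it assumes the list_member_type from the programs above is accessible, and that the list is passed with the target attribute; this is a sketch, not code from those programs):

   subroutine print_list( list )
      ! Walk the list with a local pointer, leaving the allocatable
      ! structure intact.
      type(list_member_type), allocatable, intent(in), target :: list
      type(list_member_type), pointer :: p
      if ( .not. allocated(list) ) return
      p => list
      do
         print *, p%i                   ! visit a node without modifying it
         if ( .not. allocated(p%prev) ) exit
         p => p%prev                    ! step toward the oldest node
      end do
   end subroutine print_list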

I wonder how it behaves in the case of a very long list, say 10^7 elements? This will normally generate a recursion of depth 10^7, which has a good chance of resulting in a stack overflow. In such cases it’s probably better to deallocate iteratively anyway.

That is a good question, and I do not know the answer. I expect these concerns are valid. I have done these deallocations both ways. The statement deallocate(list) will be implemented by deallocating the list starting from the head (first value) and then traversing to the tail (last value). That requires a call stack to maintain the addresses of all the nodes. When a programmer writes a loop to manually deallocate the nodes, it would be done in the other order, last-to-first, and no call stack is required. From the previous posted code, the manual list deallocation could be done as:

   do i = n, 1, -1        ! deallocate one member at a time.
      call move_alloc( from=list%prev, to=new )
      call move_alloc( from=new, to=list )
   enddo

This reuses the new variable rather than storing the whole address stack. There are no deallocate() statements required because they are implicit within the move_alloc() calls.

Just out of simple curiosity, I wanted to test how the nice linked-list/stack approach both @FortranFan and @RonShepard have highlighted (first mentioned here by @everythingfunctional) performs compared to the “classical” array-growth approach that @PierU has shown, after some modifications:

   implicit none
   integer, parameter :: N_MAX = 10000000
   integer :: i, n, waste = 0
   integer, allocatable :: array(:)  ! final target array.
   integer :: istart
   integer :: last_size = 1
   integer, allocatable :: tmp(:)

   allocate(array(last_size))
   istart = 1
   10 do i = istart, last_size
      array(i) = i
   enddo
   if (last_size < N_MAX) then  ! save current array batch, then grow
      call move_alloc(from=array, to=tmp)
      allocate(array(last_size*2))
      array(1:last_size) = tmp     ! memcpy ....
      deallocate(tmp)
      istart    = last_size+1
      last_size = last_size*2
      if (last_size > N_MAX) then
         waste     = last_size - N_MAX
         last_size = N_MAX
      endif
      goto 10
   endif

It turns out that, compiling with ifort /stand /standard-semantics /Od main.f90, the array-growth approach is much faster (0.22 s vs. 1.6 s, a factor of around 7-8x; the same holds with /O3).
This quite surprised me.

Does the overhead, as stated in this answer:

A note on performance:

    1. Yes linked lists have some overhead compared with arrays

fully justify such a difference in performance?

Indeed, the fact remains that the linked-list/stack approach is the way to go in order to avoid any memory waste.

But it looks like the time actually spent doing the n = \lceil \log_2{N_{max}} \rceil memory copies (and the consequent new allocations) in the “array-growth” case is lower than the time needed for the N_{max} node allocations allocate( new ) plus the time needed to fully traverse the list in the other case. For N_{max} = 10^7 that is only 24 growth copies against 10^7 individual node allocations.


I’m not completely surprised, because array operations and copying contiguous memory chunks are efficient. On the other hand, a linked list whose nodes contain a single integer scalar means a lot of overhead (and, by the way, it also takes at least 3x the memory of a classical array, as each node has to store something like the address of the next node). The linked-list approach makes real sense only if each node stores a significant amount of data (when dealing with scalars of intrinsic types, the “significant amount” can be a chunk of elements instead of a single one).

In the timing program above, there is no i/o; the elements are just given by the integer values from the do loop. While this is a good way to compare the overhead associated with the various approaches, it takes it out of the original context, which was reading an array of unknown length. The cost of that formatted i/o, reading one integer at a time, is typically larger than anything related either to the node allocations of a linked list or to copying the contents of short arrays into longer arrays.

As mentioned previously, the best approach would be to change the fortran standard so that arrays of unknown length can be read directly into the target allocatable array. This eliminates all the overhead associated with the user-defined-type allocations, and overall it eliminates a memory-to-memory copy of each array element. Internally, the fortran library is already doing this work by transferring the data from the external file or device into its i/o buffers, and then from there into the integer variable. The user-written linked-list code we are talking about now is just duplicating that effort.

I like the sentiment, but in the general case I’m not sure it can truly be integrated into the language in a backwards compatible way. For example, the following program is valid, and produces what you’d expect with the input file that follows. But what would the new rules say should happen here? What about without the allocate statement?

program example
  implicit none
  real, allocatable :: arr(:)
  real :: val
  integer :: f_unit

  allocate(arr(2))
  open(newunit = f_unit, file = "example.txt")
  read(f_unit, *) arr, val
  close(f_unit)
  print *, arr, val
end program
1.0 2.0 3.0

And just to really throw a wrench in things, the following should still “work” too.

program example
  implicit none
  real, allocatable :: arr(:)
  real :: val
  integer :: f_unit

  allocate(arr(3))
  open(newunit = f_unit, file = "example.txt")
  read(f_unit, *) arr, val
  close(f_unit)
  print *, arr, val
end program
1.0 2.0
3.0 4.0

So the question becomes, when should a read into an allocatable array stop?

An interesting discussion, but I did not see a simple and common approach listed. In a number of significant cases an inquire of the file size can be used to determine the storage size required. Even a query of the filesize in bytes can sometimes be used to calculate a good estimate of an upper bound on the amount of storage required.

So the initial size does not have to be arbitrary: in a considerable number of cases it can be calculated, exactly or approximately, from the file size, without requiring multiple changes in the array size or an initial read. In the above discussions the initial array size always seems to be assumed to be a guess instead of an approximation.
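As an illustration, a minimal sketch of that idea for an unformatted stream file of default integers might look like this (the file name and the assumption that the file contains only default integers are illustrative; INQUIRE returns the size in file storage units, normally bytes):

program size_from_inquire
   implicit none
   integer :: fsize, n, u
   integer, allocatable :: a(:)

   ! File size in file storage units (normally bytes)
   inquire( file="data.bin", size=fsize )
   ! Number of default integers the file can hold
   n = fsize / ( storage_size(0) / 8 )
   allocate( a(n) )

   open( newunit=u, file="data.bin", access="stream", form="unformatted" )
   read( u ) a
   close( u )
   print *, "read ", n, " integers"
end program size_from_inquire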

Also note that an initial read just to get a count often causes input files to be cached in memory, significantly speeding up the second pass; so with files that fit in the platform’s buffers reading twice can be far cheaper than it was in the past due to the significant amount of memory many HPC platforms have.

And third, I see an overuse of slurping entire files into memory in simple cases where the data can be processed in small chunks. Reading one line or dataset at a time is an obvious approach for many; but depending on your background, some people find it natural to always read an entire input file into memory. When the input is very large this can be very inefficient or can require large-memory platforms unnecessarily. So users new to large, limit-stretching datasets should sometimes be reminded that reading all of a file into memory is not the only alternative.


In both of these examples, the array was already allocated before the read, so that case is already determined by backward compatibility. The change to the standard would only apply to an unallocated array. This case is not allowed currently, so there would be no issues of backward compatibility with previously working code.

One might want the i/o semantics to follow allocate-on-assignment semantics as closely as possible. But for a previously allocated array, that cannot be done fully without breaking backward compatibility, unless, for example, a keyword is added to the read statement that overrides traditional behavior.

Given that, the answer to the question might still not be fully determined. In simple cases, the answer would be to read all the data available in the record. For list-directed or namelist i/o, that would be until a “/”, or an EOF, or until a new variable name is encountered. That is, multiple physical records might be consumed. For unformatted i/o, it might be the length of the current record. There may be some corner cases or exceptional situations that would need to be further specified.

Doesn’t the standard already provide that a slash (“/”) terminates assignment on an array read? See Oracle Fortran 77 Guide
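For what it’s worth, a minimal example of that behavior with list-directed input (the expected output reflects my reading of the rules and is worth checking with your compiler):

program slash_terminator
   implicit none
   integer :: a(5)
   a = -1
   ! With the input line "1 2 /", list-directed input stops at the slash
   ! and the remaining list items keep their previous values.
   read( *, * ) a
   print *, a        ! expected: 1 2 -1 -1 -1
end program slash_terminator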

That would mean that it is unknown whether a particular read statement performs allocation or not until run-time. Not that that’s a deal breaker, just worth noting. This has major implications when you start to think about coarrays, although it could be forbidden for an unallocated coarray to appear in a read statement the same as it is now.

I think either choice is just as valid as the other.

  1. Read until end of record (EOR). I think this still leaves it ambiguous what happens to the following items in the read statement: are they read from the following record, or just never read?
  2. Read until “/” or EOF.

Whichever is chosen, the other behavior just isn’t supported, and the only way to “emulate” it is with the already required pattern of a double pass or a leading size indicator.

I encourage anyone interested in it to continue pursuing a design, but these are the kinds of things that must be considered and specified before it can make it in to the standard. Being as it’s not on the worklist for F202Y, and I haven’t seen anything proposing to add it, that work will likely need to be done by someone not on the committee, and require a compelling argument to be considered for inclusion. That said, it would be an inspiring example that the committee is willing to accept community contributions from those willing to put in the effort. I’d be happy to sponsor it and advocate for its consideration if anyone wants to try.


From previous discussions here, I thought that this was something that was being considered by the committee.

In any case, here are some more details about how this could work. First, the simplest case is an unformatted read of a normal file (i.e. not a stream file).

read(n) array
   ! or if a keyword approach is chosen
read(n,allocate=.true.) array

In this case, the fortran library knows what is the record length. A programmer cannot query that record length (which is also something that could be changed), but the library certainly must know the value. So the library looks at the record length (e.g. in bytes), divides by the length of each array element (in bytes), and then knows the number of elements in the record. The array(:) can then be allocated to that length, and from that point a normal transfer occurs from the file to the array. The main limitation here is that only one array allocation can occur for each read statement, and it would not be possible to have multiple read statements transfer into the same allocatable array. Should it be allowed to also have items before or after array in the list?

read(n) i, j, k, array, x, y, z

As long as array(:) is the only unallocated allocatable array in the list, then this is unambiguous. What if these other items determine the number of elements that are transferred? Should that be allowed?

read(n) k, x(1:k), array, y(1:k)

This puts more burden on the i/o library at run time, but I don’t think it is ambiguous as long as all of the lengths can be determined before the allocation step occurs.

Next comes the formatted cases. Here, it seems like either regular files or stream files could occur in the mix since the record length, if it exists, is not known by the library. Also, unformatted stream files could be treated in this same way. In all these cases, the input data must be scanned and processed in order to determine the correct allocation size. Suppose we have a list-directed read.

read(*,*) array

with an input file like

1, 2, 3*4,
5 /

There are six items, accounting for the repeat count, but the processor cannot know that until it has scanned through the data. So one implementation might be to read the items one at a time and move them into a linked list. The i/o list in this case is terminated by the “/” end of record indicator, but it could also be terminated by an EOF and the same logic would be involved. After placing the items into the linked list, the i/o library could then allocate the array to have the six elements and transfer the linked list data into the allocated array. The same questions about other items in the i/o list arise here. These other items complicate the task of the i/o library, but with the appropriate restrictions, the correct length of the single allocatable array can be determined.

For namelist i/o, the same logic would apply for the unallocated array item, but the length could also be determined by encountering a new variable name in the input file.

&input i = 2, array = 1, 2, 3*4, 5 j = 1 /

In all these cases, from the programmer’s perspective, he would include an unallocated array in the i/o list (or in the namelist block), he would execute the read statement, and afterward the array would be allocated with the correct length and contain the correct data.

I’m tacitly assuming here that the returned allocated array is always rank 1 and that it will always have a lower bound of 1. Should there be some way to change that lower bound or change the rank? Two of the missing features in fortran are the ability to change the rank and the lower bounds of an allocated allocatable array. If that feature were also added, then it would be irrelevant what convention the i/o library uses, it could easily be changed afterwards to whatever the programmer wants. Otherwise, the programmer would be required to allocate the final array of the correct rank and lower bounds, transfer the data, and then deallocate the temp array that was used in the i/o operation.