There is an alternative to the move_alloc approach that uses modern Fortran's reallocation-on-assignment facility, but I only recommend it for relatively small array increments and small arrays in general, due to the performance overhead of creating temporary arrays and the chance that realloc on assignment is not very robust in a given compiler (i.e. the chance of an internal compiler error). For the OP's case it could look something like this:
real(wp), allocatable :: target_array(:)
real(wp) :: buffer(8)
integer :: istat
! open file
allocate(target_array(0)) ! start with a zero-length array (probably not strictly needed)
! assume the data is formatted as 8 columns per record
read_loop: do
   read(iunit, *, iostat=istat) buffer(:)
   if (istat /= 0) exit read_loop
   target_array = [target_array(:), buffer(:)] ! might not need the colons
end do read_loop
target_array will grow dynamically with each realloc on assignment. However, I advise you to look at using move_alloc first, since it is usually safer and potentially faster (if speed is a factor in your code).
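For reference, the move_alloc growth pattern mentioned here could be sketched roughly like this (a minimal sketch; the subroutine name `grow` is illustrative, and the kind `wp` is assumed from the snippet above):

```fortran
! Sketch: grow an allocatable array to new_size using move_alloc.
! The name "grow" and kind "wp" are illustrative, not from the original post.
subroutine grow(array, new_size)
   real(wp), allocatable, intent(inout) :: array(:)
   integer, intent(in) :: new_size
   real(wp), allocatable :: tmp(:)
   allocate(tmp(new_size))
   tmp(1:size(array)) = array          ! copy the existing contents
   call move_alloc(from=tmp, to=array) ! tmp is deallocated; array takes over the new storage
end subroutine grow
```

The advantage over realloc on assignment is that move_alloc transfers the allocation without a second copy, so only one copy of the data is made per growth step.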
Others have probably answered your question regarding the linked list structure, but regarding the quoted text, if you know the maximum size of the data, then it is sometimes simpler to just allocate enough memory for the maximum amount of data.
I have found that this method makes the code simpler.
In my case, I don't know what the maximum size of a mesh will be, but since modern computers have at least 4 GB of RAM, I decided to let the user allocate enough memory for a maximum number of cells. The default is 1 million cells, but the user can choose whatever she wants.
As a result of this simplification, the total memory consumed was only 300 MB, and instead of a complex linked-list data structure, I was able to use a simple 1D array to store all my cells.
In my humble opinion, this also makes the code more deterministic, and it eliminated many of the bugs I had in my previous code, which used more complex data structures.
A technique that I use is to read into a linked list and then convert it to an array. That way you’re not trying to guess at a maximum size, you avoid lots of reallocation and copying, and the rest of the code doesn’t have to know about the linked list.
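A rough sketch of this read-into-a-list-then-convert pattern might look as follows (a minimal sketch, assuming one real value per record, with `wp` and `iunit` taken from the earlier snippet; all other names are illustrative):

```fortran
! Sketch: accumulate values in a singly linked list, then copy to an array.
type :: node
   real(wp) :: value
   type(node), pointer :: next => null()
end type node

type(node), pointer :: head => null(), tail => null(), p
real(wp), allocatable :: a(:)
real(wp) :: x
integer :: istat, n, i

! Pass 1: read each value into a new list node, counting as we go.
n = 0
do
   read(iunit, *, iostat=istat) x
   if (istat /= 0) exit
   allocate(p)
   p%value = x
   if (associated(tail)) then
      tail%next => p
   else
      head => p
   end if
   tail => p
   n = n + 1
end do

! Pass 2: allocate the array once and drain the list into it.
allocate(a(n))
p => head
do i = 1, n
   a(i) = p%value
   head => p%next
   deallocate(p)
   p => head
end do
```

After the conversion, the rest of the program only ever sees the plain array `a`.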
I have found several situations where allocatables are best used for the actual entities in the linked list (or tree structure), and pointers are then used, for example, for the back links. Fortran allocatables, which act like well-behaved pointers in this context, are a very nice feature of the language. Give them the target attribute, and you can do all kinds of magic.
When this involves i/o (rather than computation of list elements), it could all be simplified if the Fortran language allowed i/o directly into an unallocated allocatable array. The semantics would be something like allocation on assignment, but with i/o rather than with assignment. The i/o library would internally convert its own linked list into the target allocatable array, removing one level of copy operations compared to what is required now with a user-defined linked list. So it would be not only easier for the programmer, but also more efficient. A win-win from every perspective.
I thought there was some movement toward this i/o model on the standard committee, but I don’t know the current status.
I would err towards this, especially since the units I am working on (except for my laptop) have very large amounts of RAM. However, I truly have no idea how large this gets.
The standard way of working with this has been to output the data to a file, and I was hoping to remove the need for such a file until the whole model had finished. However, I think the gains in terms of potential time saved through IO operations would be hugely outweighed by the losses of implementing a memory approach.
Thanks to you, and everyone else, for the helpful statements.
There are a few ways to handle an uncertain amount of data. I like to use these techniques:
1. Keep a counter. If possible, I write out the number of data elements that are written to the file, so that I can simply allocate the required amount when I need to read it back.
2. Read through the file and count the number of elements required. Disk IO is slow, but not prohibitively so if it only needs to be done once in a while. For reading an STL mesh, I simply read the first word of each line and count the lines that start with “facet”; that is the number of cells I need to allocate.
3. Allocate more than enough memory at once. If I expect to read an unknown number of cells, I just allocate enough memory for a maximum of 1 million cells. Reallocation is slow, but not that slow if it is done only once in a while; if I need to read more data even after 1 million cells, it is not an issue to simply reallocate more memory.
4. Use a buffer to read the data. Read data into a temporary buffer array, then store it in a bigger array, or in a linked list of arrays. It’s more complex, so I don’t use it.
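The counting pass for an ASCII STL file (technique 2 above) could be sketched like this (a minimal sketch; the file name, unit handling, and word length are illustrative assumptions):

```fortran
! Sketch: pre-count the "facet" lines in an ASCII STL file, then allocate.
! 'mesh.stl' and the 32-character word buffer are illustrative assumptions.
integer :: n_facets, istat, iunit
character(len=32) :: word

n_facets = 0
open(newunit=iunit, file='mesh.stl', status='old', action='read')
count_loop: do
   read(iunit, *, iostat=istat) word   ! list-directed read grabs the first word
   if (istat /= 0) exit count_loop
   if (word == 'facet') n_facets = n_facets + 1
end do count_loop
rewind(iunit)
! now allocate the cell array with n_facets elements and read the data for real
```

The rewind lets the same unit be reused for the second, actual reading pass.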
The general strategy can be the same as that of C++'s std::vector, i.e. iteratively doubling the size when needed:
allocate( a(1000) ) ! or whichever appropriate initial size
i = 0
do
   if (end_of_data_source()) exit
   i = i + 1
   if (i > size(a)) then
      allocate( tmp(2*size(a)) )
      tmp(1:size(a)) = a(:)
      call move_alloc(from=tmp, to=a) ! a now has double the capacity
   end if
   a(i) = ... ! fill the ith element
end do
a = a(1:i) ! trim to the actual size (reallocation on assignment)
There is another situation that applies mostly to counting characters within files rather than integer or real values. There is now a size= keyword argument in the inquire() statement. This queries the filesystem to get its value rather than actually counting the characters, so it may not be exactly what the programmer needs (e.g. it counts newline characters in formatted files, and embedded record length data in unformatted files), but it requires almost no effort to get the number. You can also extract the exact array length from the file size when you have a fixed-record-length file full of characters or fixed-width integer or real values, so there are actually a lot of situations where this approach is useful.
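As a rough illustration of deriving an array length from the file size (a minimal sketch; the file name `data.bin` is illustrative, and it is assumed the file holds raw real64 values written with stream access):

```fortran
! Sketch: size an array from inquire(size=...) for a stream file of real64 values.
! 'data.bin' is an illustrative name, assumed to contain only raw real64 data.
program read_from_file_size
   use iso_fortran_env, only: real64
   implicit none
   integer :: fsize, n, iunit
   real(real64), allocatable :: a(:)

   inquire(file='data.bin', size=fsize)       ! size in file storage units (typically bytes)
   n = fsize / (storage_size(1.0_real64) / 8) ! divide by the bytes per real64 value
   allocate(a(n))

   open(newunit=iunit, file='data.bin', access='stream', form='unformatted')
   read(iunit) a
   close(iunit)
end program read_from_file_size
```

Note that for formatted files the inquired size includes newline characters, so this division only works cleanly for fixed-width records or raw stream data, as the post says.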
I would say that this is also a simple solution, no more complicated than the others in the list. The downside is that there may be a large number of allocations required. If you allocate each value individually, then the number of allocations would be the array length. As you say, you can also allocate in buffer-sized batches (either fixed-length, or variable-length), and that does reduce the number of allocations at the expense of a little more logic. I have found in many cases the overhead depends more on the number of allocations rather than the amount of data that is allocated; that is, the cost for allocating a scalar entity is the same as allocation of a small array. If this logic were built into the i/o library as part of the fortran standard (i.e. READ into an unallocated array), then it would almost certainly be the method of choice for this task.