Arrays and lists

I’m sure I already know the answer to this but just in case; Fortran doesn’t have a linked list structure, does it?

Fortran arrays have their size already defined and I have data currently written to a file that I would prefer to hold in memory. However, the size of the data is unknown, growing as the model varies.

I’m assuming re-dimensioning an array would be the only way - and not a nice one!

You can use the move_alloc intrinsic to efficiently grow an array, as shown here and compared to other methods. Some Fortran implementations of linked lists are listed here.
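For reference, a minimal sketch of the move_alloc growth pattern (the names `grow`, `a`, and `new_size` are illustrative, not taken from the linked posts):

```fortran
! Grow an allocatable array to new_size with a single copy:
! the data is copied once into tmp, then tmp's allocation is
! transferred to a by move_alloc (no second copy, no explicit deallocate).
subroutine grow(a, new_size)
    real, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: new_size
    real, allocatable :: tmp(:)
    integer :: n

    allocate(tmp(new_size))
    n = min(size(a), new_size)
    tmp(1:n) = a(1:n)
    call move_alloc(tmp, a)   ! a now has size new_size; tmp is deallocated
end subroutine grow
```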


There is also this stale stdlib PR. The main missing parts are the specs and tests. It would be nice if someone could continue/finish this development :wink:


I feel responsible for this bit. Chetan worked on this a couple of years ago with Milan and me as supervisors. So let me indeed take the responsibility :innocent:


There is an alternative to the move_alloc approach that uses modern Fortran's reallocation-on-assignment facility, but I only recommend it for relatively small arrays and small increments, both because of the performance overhead of creating temporary arrays and because reallocation on assignment is not always robust in a given compiler (i.e., there is a chance of an internal compiler error). For the OP's case it could look something like this:

real(wp), allocatable :: target_array(:)
real(wp) :: buffer(8)
integer :: iunit, istat

! open the file on unit iunit here
allocate(target_array(0))  ! start with zero length (probably not strictly needed)

! assume the data is formatted as 8 columns per record
read_loop: do
    read(iunit, *, iostat=istat) buffer
    if (istat /= 0) exit read_loop
    target_array = [target_array, buffer]  ! reallocation on assignment
end do read_loop

target_array will grow dynamically with each reallocation on assignment. However, I advise you to look at move_alloc first, since it is usually safer and potentially faster (if speed is a factor in your code).


Others have probably answered your question regarding the linked list structure, but regarding the quoted text, if you know the maximum size of the data, then it is sometimes simpler to just allocate enough memory for the maximum amount of data.

I have found that this method makes the code simpler.

For example:

In my case, I don’t know what the maximum size of a mesh will be, but since modern computers have at least 4 GB of RAM, I just decided to let the user allocate enough memory for a maximum number of cells. The default is 1 million cells, but the user can choose whatever she wants.

As a result of this simplification, the total memory consumed was only 300 MB, and instead of a complex linked-list data structure I was able to use a simple 1D array to store all my cells.

In my humble opinion, this also makes the code more deterministic, and it eliminated a lot of the bugs I had in my previous code, which used more complex data structures.


A technique that I use is to read into a linked list and then convert it to an array. That way you’re not trying to guess at a maximum size, you avoid lots of reallocation and copying, and the rest of the code doesn’t have to know about the linked list.
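As a rough illustration of the idea (a hedged sketch, not rojff's actual parser; `node`, `head`, `tail`, and `iunit` are made-up names), one can append to a pointer-based list while reading, then copy once into a right-sized array:

```fortran
type :: node
    real :: val
    type(node), pointer :: next => null()
end type node

type(node), pointer :: head => null(), tail => null(), p
real, allocatable :: a(:)
integer :: n, i, istat
real :: x

! build the list while reading (iunit is an already-open unit)
n = 0
do
    read(iunit, *, iostat=istat) x
    if (istat /= 0) exit
    allocate(p)
    p%val = x
    if (associated(tail)) then
        tail%next => p
    else
        head => p
    end if
    tail => p
    n = n + 1
end do

! one copy into an exactly-sized array, freeing nodes as we go
allocate(a(n))
p => head
do i = 1, n
    a(i) = p%val
    head => p%next
    deallocate(p)
    p => head
end do
```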

For example, check out the parser in rojff: src/rojff/parser · main · Brad Richardson / rojff · GitLab

Using allocatables for a linked list is fine if you only append to it, don’t have backlinks, and don’t try to insert links elsewhere. Otherwise, pointers are the correct approach.

I have found several situations where allocatables are best used for the actual entities in the linked list (or tree structure), and then pointers are used, for example, for the back links. Fortran allocatables, which act like well behaved pointers in this context, are a very nice feature of the language. Give them the target attribute, and you can do all kinds of magic.

This is my preferred approach nowadays too.

When this involves I/O (rather than computation of list elements), it could all be simplified if the Fortran language allowed I/O directly into an unallocated allocatable array. The semantics would be something like allocation on assignment, but with I/O rather than assignment. The I/O library would internally convert its internal linked list to the target allocatable array, removing one level of copy operations compared to what is required now with a user-defined linked list. So not only easier for the programmer, but also more efficient: a win-win from every perspective.

I thought there was some movement toward this I/O model on the standards committee, but I don’t know the current status.

I would err towards this, especially since the units I am working on (except for my laptop) have very large amounts of RAM. However, I truly have no idea how large this gets.

The standard way of working with this has been to output the data to a file, and I was hoping to remove the need for such a file until the whole model had finished. However, I think the gains in terms of potential time saved through IO operations would be hugely outweighed by the losses of implementing a memory approach.

Thanks to you, and everyone else, for the helpful statements.


There are a few ways to handle an uncertain amount of data.

I like to use these techniques.

  1. Keep a counter. If possible, I write out the number of data elements that are written to the file so that I can simply allocate the required amount when I need to read it.

  2. Read through the file and count the number of elements required. Disk I/O is slow, but not prohibitively slow if it only needs to be done once in a while. For reading an STL mesh, I simply read the first word of each line and count the lines that start with “facet”. This is the number of cells I need to allocate.

  3. Allocate more than enough memory at once. If I expect to read an unknown number of cells, I just allocate enough memory for a maximum of 1 million cells.

  4. Reallocation is slow, but not prohibitively slow if it’s only done occasionally. If, even after 1 million cells, I need to read more data, it’s not an issue to simply reallocate more memory.

  5. Use a buffer to read data. Read data into a temporary buffer array, and store it into a bigger array, or a linked list of arrays. It’s more complex, so I don’t use it.


I like your novel approach of using wc -l to count the number of lines, but that might not work on Windows or other operating systems.

I just open the file, and count the number of lines until it hits EOF (end of file). It works everywhere.
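A portable version of that two-pass approach might look like this (a sketch; the file name `data.txt` is a placeholder, and one value per record is assumed):

```fortran
integer :: iunit, istat, n
real, allocatable :: a(:)

open(newunit=iunit, file='data.txt', status='old', action='read')

! pass 1: count the records until EOF
n = 0
do
    read(iunit, *, iostat=istat)
    if (istat /= 0) exit
    n = n + 1
end do

! pass 2: rewind, allocate exactly, and read for real
rewind(iunit)
allocate(a(n))
read(iunit, *) a
close(iunit)
```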

You are completely right. It will not work on Windows.

This is from my own library, and as long as I only use Linux it works for me. I guess there are similar options on Windows, but I am not aware of them at the moment.

One can install GNU coreutils on a Windows PC and have access to wc and other Unix commands.


@Beliavsky I did not know this. Very good.

I deleted my post while editing it. The piece of code is this:

  integer function unix_number_of_lines(filecheck)
    implicit none

    character(len=char_max_length), intent(in) :: filecheck
    character(len=char_max_length) :: fifofile = '/tmp/tov_fifo'
    integer :: uu
    logical :: exi

    ! create the named pipe the first time, then run wc -l in the
    ! background with its output redirected into the pipe
    inquire (exist=exi, file=fifofile)
    if (.not. exi) call execute_command_line("mkfifo "//trim(fifofile))
    call execute_command_line("wc -l "//trim(filecheck)//" | cut -f1 -d' ' > "//trim(fifofile)//" &")

    ! opening the pipe for reading blocks until wc writes to it
    open (newunit=uu, file=trim(fifofile), action='read')
    read (uu, *) unix_number_of_lines
    close (uu)

  end function unix_number_of_lines

The general strategy can be the same as that of C++ vectors, i.e. iteratively doubling the size when needed:

real, allocatable :: a(:), tmp(:)
integer :: i

allocate( a(1000) ) ! or whichever appropriate initial size
i = 0
do
   if (end_of_data_source()) exit
   i = i + 1
   if (i > size(a)) then
      allocate( tmp(2*size(a)) )
      tmp(1:size(a)) = a(:)
      call move_alloc(tmp, a)
   end if
   a(i) = ... ! fill the ith element
end do
a = a(1:i)   ! trim to the actual size (reallocation on assignment)

There is another situation that applies mostly to counting characters within files rather than integer or real values. There is now a size= keyword argument in the inquire() statement. This queries the filesystem to get its value rather than actually counting the characters, so it may not be exactly what the programmer needs (e.g. it counts newline characters in formatted files, and embedded record length data in unformatted files), but it requires almost no effort to get the number. You can also extract the exact array length from the file size when you have a fixed-record-length file full of characters or fixed-width integer or real values, so there are actually a lot of situations where this approach is useful.
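For a fixed-record-length formatted file, the length calculation might look like this (a sketch; `record_len` and the file name are assumptions, and on Windows a CRLF line ending adds two characters per record rather than one):

```fortran
integer :: file_size, n
real, allocatable :: a(:)
integer, parameter :: record_len = 16   ! characters of data per record (assumed)

! size= returns the file size in file storage units (typically bytes)
inquire(file='data.txt', size=file_size)

! each record is record_len characters plus one newline character
n = file_size / (record_len + 1)
allocate(a(n))
```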

I would say that this is also a simple solution, no more complicated than the others in the list. The downside is that there may be a large number of allocations required. If you allocate each value individually, then the number of allocations would be the array length. As you say, you can also allocate in buffer-sized batches (either fixed-length, or variable-length), and that does reduce the number of allocations at the expense of a little more logic. I have found in many cases the overhead depends more on the number of allocations rather than the amount of data that is allocated; that is, the cost for allocating a scalar entity is the same as allocation of a small array. If this logic were built into the i/o library as part of the fortran standard (i.e. READ into an unallocated array), then it would almost certainly be the method of choice for this task.

This should indeed be equivalent to std::vector (as long as move_alloc is free) except for the last line, a = a(1:i), which is one extra copy. Is there a way to avoid it?