As I understand it, you need to read the whole file once to count the number of values N, then go back to the beginning of the file to read the values? If so, you have two options:
- close the file after the first pass and reopen it
- use the rewind statement
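To make the rewind option concrete, here is a minimal sketch of the two-pass approach. The module and subroutine names (`two_pass_reader`, `read_nodes_two_pass`), the `real` type of the values, and the 256-character line buffer are assumptions of mine, not taken from your code. I also assume that lines starting with "*" and blank lines can simply be skipped, which may not match the actual layout of your mesh file.

```fortran
module two_pass_reader
   implicit none
contains
   ! Hypothetical helper: reads three values per data line from a formatted
   ! unit that is already open and positioned at its beginning.
   subroutine read_nodes_two_pass(unit, n1, n2, n3)
      integer, intent(in) :: unit
      real, allocatable, intent(out) :: n1(:), n2(:), n3(:)
      character(len=256) :: line   ! assumed to be long enough for any line
      integer :: retcode, n, i

      ! Pass 1: count the data lines.
      n = 0
      do
         read (unit, '(A)', iostat=retcode) line
         if (retcode /= 0) exit                 ! end of file or read error
         if (len_trim(line) == 0) cycle         ! skip blank lines
         if (line(1:1) /= "*") n = n + 1        ! skip "*" lines
      end do

      ! The exact size is now known, so allocate once.
      allocate (n1(n), n2(n), n3(n))

      ! Pass 2: go back to the beginning and read the values.
      rewind (unit)
      i = 0
      do
         read (unit, '(A)', iostat=retcode) line
         if (retcode /= 0) exit
         if (len_trim(line) == 0) cycle
         if (line(1:1) == "*") cycle
         i = i + 1
         read (line, *) n1(i), n2(i), n3(i)
      end do
   end subroutine read_nodes_two_pass
end module two_pass_reader
```

You would open the file as usual, call `read_nodes_two_pass(mesh_unit_file, nodes_n1, nodes_n2, nodes_n3)`, and close it afterwards. The file stays open between the two passes.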
Another option is to overallocate the arrays. If you can decide on an upper bound Nmax, allocate the arrays with Nmax elements, then shrink them to the actual size once all the values have been read:
```fortran
allocate( nodes_n1(Nmax), nodes_n2(Nmax), nodes_n3(Nmax) )
N = 0
read_loop_nodes: do
    read (mesh_unit_file, '(A)', iostat=retcode) line
    if ( (retcode /= iostat_end) .and. (line(1:1) /= "*") ) then
        N = N+1
        read (line,*) nodes_n1(N), nodes_n2(N), nodes_n3(N)
    else
        ...
    end if
    ...
end do
! shrink the arrays to the N values actually read (reallocation on assignment)
nodes_n1 = nodes_n1(1:N)
nodes_n2 = nodes_n2(1:N)
nodes_n3 = nodes_n3(1:N)
```
Note that all commonly used systems today use virtual memory: you can allocate very large amounts of memory, but it remains “virtual” as long as the elements are not accessed. A physical page is mapped in RAM only when an element of that page is first written. This means that you can allocate a 100 GB array with a very large Nmax, and if you write only the first element it will occupy only one page of physical RAM (typically 4 kB).
Another possibility is to build a linked list to temporarily store the elements during the reading phase, but this is a bit cumbersome IMO.
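For completeness, here is a rough sketch of what the linked-list approach could look like for a single array of values. All the names (`node_list`, `item_t`, `list_t`, `list_append`, `list_to_array`) and the `real` value type are hypothetical, chosen only for illustration. Each value read from the file is appended to the list, and once the reading is done the list is copied into an array of exactly the right size and freed.

```fortran
module node_list
   implicit none

   ! One list element: a value and a pointer to the next element.
   type :: item_t
      real :: value
      type(item_t), pointer :: next => null()
   end type item_t

   ! The list keeps head and tail pointers so that appending is O(1).
   type :: list_t
      type(item_t), pointer :: head => null()
      type(item_t), pointer :: tail => null()
      integer :: n = 0
   end type list_t

contains

   ! Append one value at the end of the list.
   subroutine list_append(list, value)
      type(list_t), intent(inout) :: list
      real, intent(in) :: value
      type(item_t), pointer :: new_item

      allocate (new_item)
      new_item%value = value
      if (associated(list%tail)) then
         list%tail%next => new_item
      else
         list%head => new_item
      end if
      list%tail => new_item
      list%n = list%n + 1
   end subroutine list_append

   ! Copy the list into an exactly sized array and free the list elements.
   subroutine list_to_array(list, array)
      type(list_t), intent(inout) :: list
      real, allocatable, intent(out) :: array(:)
      type(item_t), pointer :: current, next_item
      integer :: i

      allocate (array(list%n))
      current => list%head
      i = 0
      do while (associated(current))
         i = i + 1
         array(i) = current%value
         next_item => current%next
         deallocate (current)
         current => next_item
      end do
      list%head => null()
      list%tail => null()
      list%n = 0
   end subroutine list_to_array
end module node_list
```

In the read loop you would call `list_append` once per value, then `list_to_array` once after the loop. This costs one small allocation per element, which is part of why I find it cumbersome compared to the other options.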
You can also mix overallocation and reallocation if you don’t want to allocate a very large Nmax from the start. This mimics what the C++ vector does. I illustrate the growth step for nodes_n1 only:
```fortran
Nmax = 1000 ! start with a reasonable Nmax
allocate( nodes_n1(Nmax), nodes_n2(Nmax), nodes_n3(Nmax) )
N = 0
read_loop_nodes: do
    read (mesh_unit_file, '(A)', iostat=retcode) line
    if ( (retcode /= iostat_end) .and. (line(1:1) /= "*") ) then
        N = N+1
        if (N > Nmax) then
            ! reallocations occur only when N gets over Nmax
            allocate( tmparray(2*Nmax) )          ! double the capacity
            tmparray(1:Nmax) = nodes_n1(:)        ! copy the existing values
            call move_alloc( tmparray, nodes_n1 ) ! tmparray is deallocated here
            Nmax = size( nodes_n1 )
        end if
        read (line,*) nodes_n1(N), ...
    else
        ...
    end if
    ...
end do
! shrink the arrays to the N values actually read
nodes_n1 = nodes_n1(1:N)
nodes_n2 = nodes_n2(1:N)
nodes_n3 = nodes_n3(1:N)
```
This is not specified by the Fortran standard; it is managed by the OS. With the commonly used OSes, just opening a file does not load any of its content into RAM. The parts of the file that are explicitly read afterwards are cached in RAM, at least as long as:
- the file is not closed (so the rewind option is actually better than closing/reopening the file)
- the OS does not need to reclaim the space occupied by the cache

If your file is not too big, it is reasonable to assume that it will stay in cache between the two passes, and therefore that the second pass will be much faster than the first.
How many times do you need to reuse the file, and how locked in are you to the file format? If you can change the format of the file, other alternatives are available, from HDF5 to binary files. If the files are used many times, reading them once and converting them to a format that can be processed more efficiently may be desirable. If you cannot change the format and only read the file a few times, there might not be as much of an advantage to such an approach.

Big is a relative term. Are these file sizes in the gigabytes or larger, or a few megabytes at most? How much time does it currently take to read a file, and how often do you have to read one? Your current approach might be inefficient, but improving it is only an academic exercise unless it currently takes (or is likely to take in the future) significant time or resources. In that case, going to a higher-performing, more easily consumed file format may well be worth the effort, as the root cause of the problem is that the file format is not easy to access efficiently.
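As an illustration of the binary-file alternative, here is a minimal sketch that writes the three arrays to an unformatted stream file with the element count stored first, so that a later read can allocate the arrays directly without a counting pass. The module and routine names (`node_binary_io`, `write_nodes_binary`, `read_nodes_binary`), the `real` type, and the file layout are all assumptions made for this example, not an established format. The sketch also assumes the three arrays have the same size.

```fortran
module node_binary_io
   implicit none
contains
   ! Write the element count followed by the three arrays.
   subroutine write_nodes_binary(filename, n1, n2, n3)
      character(len=*), intent(in) :: filename
      real, intent(in) :: n1(:), n2(:), n3(:)   ! assumed to have equal sizes
      integer :: u

      open (newunit=u, file=filename, access='stream', form='unformatted', &
            status='replace', action='write')
      write (u) size(n1)
      write (u) n1, n2, n3
      close (u)
   end subroutine write_nodes_binary

   ! Read the count first, allocate, then read the arrays in one go.
   subroutine read_nodes_binary(filename, n1, n2, n3)
      character(len=*), intent(in) :: filename
      real, allocatable, intent(out) :: n1(:), n2(:), n3(:)
      integer :: u, n

      open (newunit=u, file=filename, access='stream', form='unformatted', &
            status='old', action='read')
      read (u) n
      allocate (n1(n), n2(n), n3(n))
      read (u) n1, n2, n3
      close (u)
   end subroutine read_nodes_binary
end module node_binary_io
```

The idea would be to parse the text file once, call `write_nodes_binary`, and use `read_nodes_binary` for all later runs. Keep in mind that such a file is not guaranteed to be portable between compilers or machines (integer and real sizes, byte order), which is one reason to consider a self-describing format like HDF5 instead.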