If the variable “IO_intAsStr” were a local variable placed on the thread stack (i.e. not on the shared heap), the problem would not occur.
Wouldn’t character(len=11) :: IO_intAsStr work?
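A minimal sketch of that idea (the variable name is taken from the post above; the loop is hypothetical): declaring the buffer PRIVATE gives each thread its own copy on its stack, so the internal writes cannot collide.

```fortran
program private_buf
  implicit none
  character(len=11) :: IO_intAsStr   ! one private copy per thread
  integer :: i
!$OMP PARALLEL DO PRIVATE(IO_intAsStr)
  do i = 1, 8
     write(IO_intAsStr,"(I0)") i     ! internal write into the private buffer
  end do
!$OMP END PARALLEL DO
end program private_buf
```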
As mentioned by others, it’s worth putting I/O inside a named CRITICAL section.
This is an example of how we do it in our code:
Writing:
!$OMP CRITICAL(u206)
inquire(iolength=recl) vkl(1:3,ik),nstsv,evecsv
open(206,file=fname,form='UNFORMATTED',access='DIRECT',recl=recl)
write(206,rec=ik) vkl(1:3,ik),nstsv,evecsv
close(206)
!$OMP END CRITICAL(u206)
Reading:
!$OMP CRITICAL(u206)
inquire(iolength=recl) vkl_,nstsv_,evecsv
open(206,file=fname,form='UNFORMATTED',access='DIRECT',recl=recl)
read(206,rec=ik) vkl_,nstsv_,evecsv
close(206)
!$OMP END CRITICAL(u206)
We name the critical section after the unit (rather than using an unnamed critical section, which would block everything). Also, the code is written so that reading and writing the same record at the same time can never occur.
This has never failed in over a decade of running on many HPC clusters. The only issue is that it can sometimes hang the code on NFS file systems. However, HPC clusters should probably not use NFS.
In the days before allocatable character scalars were allowed, I used to do this. Of course it requires the function i0long(n) to be evaluated twice, but that was never a significant part of the run time of the programs using it.
Would it help with the problems of either OMP or multiple threads?
module i0mod
  implicit none
  private
  public :: i0
contains
  pure function i0long(n) result(out)
    integer, intent(in) :: n
    character(range(n)+2) :: out
    write(out,"(I0)") n
  end function i0long
  pure function i0(n) result(out)
    integer, intent(in) :: n
    character(len_trim(adjustl(i0long(n)))) :: out
    out = trim(adjustl(i0long(n)))
  end function i0
end module i0mod
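With allocatable scalars now standard, a single-evaluation version is possible (a sketch, not from the original post; the I0 edit descriptor left-justifies, so trim alone suffices):

```fortran
module i0mod_alloc
  implicit none
contains
  pure function i0(n) result(out)
    integer, intent(in) :: n
    character(len=:), allocatable :: out
    character(range(n)+2) :: buf
    write(buf,"(I0)") n          ! internal write, allowed in pure procedures
    out = trim(buf)              ! allocate to the exact trimmed length
  end function i0
end module i0mod_alloc
```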
I was recently devastated to learn that there is seemingly no way to get parallel file reads in pure Fortran using the gfortran compiler.
Opening files with access='stream', form='unformatted', action='read' made no difference. Gfortran’s runtime will crash the instant you attempt to open the same file on more than one logical unit number, even with open(newunit=lun,… inside an !$omp critical region. Opening the file a single time and executing multiple read(unit=lun, pos=<calculated_chunk_offset>) <destination_variables> statements just gets queued to run sequentially by gfortran’s runtime.
Asking multiple different AIs for an explanation, I learned that this is apparently because 1) the Fortran standard declares that a file shall not be connected to more than one logical unit number, and 2) gfortran has chosen strict adherence to that, along with a locking design in libgfortran, so that even multiple read statements at different pos= values must all happen sequentially.
This is rather unfortunate, because I am certain that ifort used to allow opening the same file on multiple unit numbers. Even before newunit=, there were codes I worked with, running longer than I have been alive, that relied on opening the same file on multiple LUNs to read and use different parts of it in different places simultaneously.
A suggested escape hatch was to leave Fortran via iso_c_binding and use POSIX open, pread, and close to interact with the file. Sadly, that will only work on macOS and Linux (and any POSIX OS, I suppose), but specifically will not work on Windows.
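A sketch of that escape hatch (assuming a POSIX system; the interface mirrors the C prototypes, off_t is assumed 64-bit, and error handling is omitted):

```fortran
module posix_pread
  use iso_c_binding
  implicit none
  interface
     function c_open(path, flags) bind(c, name="open") result(fd)
       import :: c_char, c_int
       character(kind=c_char), intent(in) :: path(*)   ! null-terminated
       integer(c_int), value :: flags                  ! O_RDONLY is 0 on Linux/macOS
       integer(c_int) :: fd
     end function c_open
     function c_pread(fd, buf, count, offset) bind(c, name="pread") result(nread)
       import :: c_int, c_ptr, c_size_t, c_long
       integer(c_int), value :: fd
       type(c_ptr), value :: buf
       integer(c_size_t), value :: count
       integer(c_long), value :: offset                ! off_t, assumed 64-bit
       integer(c_size_t) :: nread                      ! actually ssize_t
     end function c_pread
     function c_close(fd) bind(c, name="close") result(stat)
       import :: c_int
       integer(c_int), value :: fd
       integer(c_int) :: stat
     end function c_close
  end interface
end module posix_pread
```

The path must be passed as 'file.bin'//c_null_char. The appeal of pread is that it is thread-safe on a single descriptor, so every OpenMP thread can read its own chunk concurrently without any critical section.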
I’m puzzled by this… isn’t the SHARE or SHARED specifier meant to allow this? Admittedly, I’ve used it successfully with MPI multiprocessing under ifort and gfortran, not with multithreading, but I would have expected it to also enable parallel reads in that case.
The SHARE specifier allows system-level locking on a unit upon opening it for controlled access from multiple processes/threads. The SHARE specifier has several forms:
OPEN(…, SHARE=sh)
OPEN(…, SHARED)
OPEN(…, NOSHARED)
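For illustration, the keyword form might look like this (SHARE= is a vendor extension of DEC heritage, not standard Fortran; 'DENYNONE' is the spelling ifort uses to request no locks, and the file name is made up):

```fortran
open(newunit=lun, file='data.bin', form='unformatted', &
     access='stream', action='read', share='DENYNONE')
```

Whether this also lifts the one-unit-per-file restriction is exactly what the thread goes on to test.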
Oooh, that is exciting; I will give it a whirl tomorrow and see if it works. Generally I’d prefer not to use extensions, mostly for fear of compatibility issues in long-lived codes, but since it’s the same for gfortran and ifort, maybe this one is more acceptable.
The description certainly makes it sound like opening the file SHARED should allow multiple simultaneous reads in an OpenMP parallel do. I’ll edit this post with a follow-up after checking it out.
UPDATE: Sadly, SHARED did not work. I’ve been using gfortran 10, so perhaps newer versions behave differently.
What about using the standard C functions for opening and accessing the files, instead of the POSIX ones?
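A sketch of that suggestion (binding to C’s fopen/fseek/fread via iso_c_binding; these are standard C, so they should also work on Windows, although fseek’s long offset limits very large files there; error handling omitted):

```fortran
module c_stdio
  use iso_c_binding
  implicit none
  interface
     function fopen(path, mode) bind(c, name="fopen") result(fp)
       import :: c_char, c_ptr
       character(kind=c_char), intent(in) :: path(*), mode(*)  ! null-terminated
       type(c_ptr) :: fp
     end function fopen
     function fseek(fp, offset, whence) bind(c, name="fseek") result(stat)
       import :: c_ptr, c_long, c_int
       type(c_ptr), value :: fp
       integer(c_long), value :: offset
       integer(c_int), value :: whence      ! 0 = SEEK_SET
       integer(c_int) :: stat
     end function fseek
     function fread(buf, size, nmemb, fp) bind(c, name="fread") result(n)
       import :: c_ptr, c_size_t
       type(c_ptr), value :: buf
       integer(c_size_t), value :: size, nmemb
       type(c_ptr), value :: fp
       integer(c_size_t) :: n
     end function fread
     function fclose(fp) bind(c, name="fclose") result(stat)
       import :: c_ptr, c_int
       type(c_ptr), value :: fp
       integer(c_int) :: stat
     end function fclose
  end interface
end module c_stdio
```

Each thread could then fopen the same file on its own FILE* handle (mode "rb"//c_null_char), sidestepping the Fortran unit restriction entirely, since libgfortran never sees the file.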
It used to. Since F2018 it is processor-dependent whether it is allowed or not.
If you are only reading, an operating system solution is to use aliases to the file (“symbolic links” or “shortcuts”) and have each thread open through a different filename.
OK, I like the sound of both of these. Perhaps newer versions of gfortran (I’m using gfortran 10 from 2020) will behave differently.
Creating symlinks would be another option, albeit again excluding Windows users, but that is a nice workaround for the cost of a small system call.
UPDATE: Symbolic links do not work for this case. The runtime still knows it’s the same file and produces the same error about opening the same file on multiple units. This is the same whether or not I open the links SHARED.
Apparently, there are three kinds of aliases on Windows (plain symbolic links, plus the /D and /H variants of mklink).
So far I’ve only tried on Linux, and the gfortran 10 libgfortran runtime was able to see through my deception with both symlinks and hard links.
Other users in this thread made it sound like multiprocessing may work, so I’ll have to look into that.
With stream access, one could modify the position of the file prior to every read statement and query its position afterwards. That would allow different threads to keep track of their own position within the single file. This still requires serialization of the i/o of course, but it might work for, say, up to 16 or 32 threads. If you want a million threads/processors to access the same file, then this approach might not work.
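A sketch of that bookkeeping, assuming a file already open on unit lun with access='stream' and fixed-size chunks (chunk_bytes, chunk_data, and nchunks are hypothetical names). The read itself is still serialized in a named critical section; only the position arithmetic is parallel:

```fortran
! Each thread computes its own byte offset into one stream file.
!$OMP PARALLEL DO PRIVATE(pos)
do ichunk = 1, nchunks
   pos = 1 + (ichunk-1)*chunk_bytes      ! stream positions are 1-based bytes
!$OMP CRITICAL(u_in)
   read(lun, pos=pos) chunk_data(:,ichunk)
!$OMP END CRITICAL(u_in)
end do
!$OMP END PARALLEL DO
```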
That was the approach I took, and the way that libgfortran was handling it, at least in gfortran10, was to run all the READ statements serially. This happens even if each statement is occurring in a different OpenMP thread, with completely non-overlapping POS and output variable size. So, performance was identical to a single READ of the whole file at once.
That’s why I went to trying to additionally open the file multiple times on different unit numbers, since I already know I can handle the position and get all the data I want in separate chunks.
So far the only approach that has somewhat worked is manually splitting the file into truly uniquely named separate files, then opening each one on its own unit number and reading every chunk into the variable where it belongs.
I’m guessing that this is going to be the situation for any “normal” file system. I think you would need an underlying parallel file system in order to actually allow simultaneous access to the file. Here is one example: https://www.lustre.org
This makes sense to me for doing actual I/O. But I was referring to internal I/O, i.e. using write to store data in character variables.
A few more thoughts, but a lot more questions about how I/O-bound your application is, what access patterns there are, and what platforms you run on. What hardware the file resides on, what platforms it has to support, and how big it is and how large and frequent the I/O requests are all matter.
Traditionally, parallel I/O in Fortran often uses MPI-IO-based libraries such as Parallel HDF5 or the NCAR Parallel I/O (PIO) library, together with a supported file system such as Lustre over InfiniBand (NFS is typically not supported). To be effective, this also typically requires appropriate stripe sizes when creating the file.
If the file is small, copying it to a memory-resident file system such as /dev/shm on most Linux/Unix systems, or slurping it into an array when using threads, or using other I/O methods such as asynchronous I/O, or reading it in and splitting it into multiple files before using the data and putting read-only and write-only files on different devices, can all reduce or eliminate I/O bottlenecks as well.
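A sketch of the slurp-into-an-array option (standard Fortran; the file and array names are made up): read the whole file once with stream access, then let every thread index the shared array with no further I/O.

```fortran
program slurp
  implicit none
  integer :: lun, nbytes     ! use a 64-bit integer for very large files
  character(len=1), allocatable :: raw(:)   ! whole file as bytes
  open(newunit=lun, file='data.bin', access='stream', &
       form='unformatted', action='read', status='old')
  inquire(unit=lun, size=nbytes)            ! file size in bytes
  allocate(raw(nbytes))
  read(lun) raw
  close(lun)
  ! raw(:) is now shared, read-only data for all threads
end program slurp
```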
You can also adjust a multitude of cache sizes and buffering options on the platforms using compiler-specific options to affect I/O performance. So knowing the size of the file, and whether different processors will only access specific file sections or may require data appearing randomly anywhere in the file, all steer which approaches might be useful.
How much memory is available for data storage and how much time is spent in I/O operations affect how, or even whether, I/O tuning is worth it. Modern Unix and Linux systems and compilers now cache data so extensively, and memory is so much more abundant than in the past, that it is much rarer for applications to be truly I/O-bound than even in the recent past. The cached data can even persist across multiple program executions.
Since Coarray Fortran often uses underlying libraries such as MPI, you may theoretically find that it gives you automatic access to MPI-IO parallel I/O. Does anyone know definitively whether that is the case with gfortran?
The most basic question is: is the file small enough to slurp into memory? If it is too big, is each processor accessing a specific section of the data? Is it feasible to use HDF5, or are you bound to an existing file structure?
My solution for I/O with OpenMP on 12–24 threads has been to open an unformatted sequential file for each thread and save the info for re-processing at the end of the !$OMP region. (My solution allows reviewing each file at the end of this OMP phase, enclosed in !$omp critical.)
An alternative could be to keep a storage buffer on each thread’s stack, but that info is not available to other threads.
If the requirements are more complex, another option is a single interface routine to a single file or a single allocated heap buffer, where each call to this routine is enclosed in !$omp critical.
I have not found !$omp critical to cause delays, as I/O is a small proportion of the calculation time, so threads queuing for I/O is not an issue. With small I/O packets, there are some benefits to keeping threads aligned, such as cache sharing between threads for shared arrays (a problem I find with hyper-threading).
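A sketch of the one-file-per-thread scheme (the file naming and the written data are made up for illustration): each thread builds a unique name from its thread number, so there is no contention at all during the parallel phase.

```fortran
program per_thread_files
  use omp_lib
  implicit none
  integer :: tid, lun
  character(len=20) :: fname
!$OMP PARALLEL PRIVATE(tid, lun, fname)
  tid = omp_get_thread_num()
  write(fname,"('scratch_',I0,'.dat')") tid   ! unique name per thread
  open(newunit=lun, file=fname, form='unformatted', access='sequential')
  write(lun) tid        ! each thread writes its own results, no contention
  close(lun)
!$OMP END PARALLEL
  ! after the parallel region, re-read the files one by one to merge
end program per_thread_files
```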
Have a look at gfortran bug 113797. It might be related. Maybe the bug is triggered not in the function itself but where IO_intAsStr is used (in a concatenation operation)?
Thanks, that could indeed be the case.
I tried guarding only the internal I/O with an OMP CRITICAL and that did not work.
Same for me with gfortran: I tried moving the internal I/O and the concatenation operations into an OMP CRITICAL block, without success.