I would like to ask a group of questions that may be a bit general.
What does it mean exactly for a library to be thread-safe?
How to achieve thread safety in (modern) Fortran libraries? What constructions/features are in general not thread-safe and should be avoided? I suppose that the answer may depend on the implementation of the compilers.
How to verify whether a Fortran library is thread-safe? Is there a way of designing particular tests that can serve this purpose? How to verify it if we have access to the source code?
Thank you very much for your input and insights.
Update (2022-01-24): For clarity, let us assume that we consider only F2003 or above together with reasonably recent releases (e.g., after 2010) of “mainstream” compilers (e.g., gfortran, LFortran, ifort, nagfor, Classic flang, AOCC flang, Bisheng flang, nvfortran, absoft, IBM, Cray, …). Thank you.
It means that any data that can be accessed by multiple threads simultaneously is accessed in a correctly synchronized way, so that the results remain valid when the library is called in parallel.
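Any Fortran variable with the save attribute (whether declared in a module or inside a procedure) can be an issue for thread safety. Module variables have the save attribute by default (explicitly so since Fortran 2008), and so does any local variable that is initialized in its declaration.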
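To make the hidden-state problem concrete, here is a minimal sketch (the names next_id and counter are made up for illustration). A local variable that is initialized in its declaration acquires the save attribute implicitly, so it is a single static variable shared by every caller rather than a fresh local on each call:

```fortran
program implicit_save_demo
  implicit none
  integer :: i

  do i = 1, 3
    print '(a, i0)', 'call returned ', next_id()
  end do

contains

  function next_id() result(id)
    integer :: id
    ! Initialization in the declaration implies SAVE: counter is one
    ! static variable that persists between calls and is shared by
    ! every caller, including every thread.
    integer :: counter = 0

    counter = counter + 1
    id = counter
  end function next_id

end program implicit_save_demo
```

Run serially, this prints 1, 2 and 3, showing that the state survives between calls. If next_id were called from several threads at once, the unsynchronized update of counter would be a data race.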
The ultimate method is likely writing tests that verify the accuracy of the code output in parallel. Intel products (formerly collectively called Intel parallel studio) might help identify thread-safety issues, but I am not sure. I decided to avoid thread-safety concerns by implementing all parallelized sections of applications via MPI or Coarrays, where each process has its unique isolated data. On supercomputers that I have tested, MPI communications take on the order of a microsecond which, I believe, is comparable to the overhead of multithreading.
Intel has an excellent summary of parallelization tips and thread-safety for Fortran applications.
Thank you @shahmoradi for the explanation and link.
If a library never uses module variables or any other variables with the save attribute, is it guaranteed to be thread-safe? Or are there other possible sources of thread unsafety?
Officially, that will depend on the compiler implementation. In the old days, some compilers put local variables in static memory, since FORTRAN 77 did not allow recursion. Before Fortran 2018, routines had to be declared recursive to make sure that local variables were taken from the stack, which is a necessary condition for recursion to work. In that respect recursion is quite akin to multithreading: several copies of a routine may be active at the same time.
The issue of thread-(un)safety would remain if you cannot guarantee that any subprograms called are also thread-safe. For this purpose the NAG library will usually state whether a routine is thread-safe, see also Thread Safety.
Gfortran also has a section on Thread-safety of the runtime library. Calls to the intrinsics get_environment_variable, execute_command_line, or the GNU extensions getenv and system are potentially unsafe.
Thread safety is the elimination, by construction/design, of the possibility that one thread or image may attempt to access or modify a variable at the same time that another is modifying it (a data race), or that execution ends in a deadlock (i.e. two or more threads or images are each waiting for the other(s) to perform some action). In the absence of OpenMP and coarrays, Fortran is inherently thread safe. When running multiple images, every image has its own copy of local data, and so no possibility exists of one trying to access another’s data. Images can only access each other’s data through coarrays - which one should take care to carefully coordinate - or through collective subroutines, which the standard dictates must exhibit the proper synchronization. OpenMP is where one could unintentionally introduce non-thread-safe code. In that case there will be multiple threads accessing any saved data.
So, the basic takeaway is that any code that does not access any saved data and does not have any coarrays is inherently thread-safe. If it does access any saved data, it should not be called from OpenMP parallel regions.
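As a rough sketch of that criterion (the module kernels and function square are hypothetical names), a pure function with no saved data or module variables can be called from an OpenMP parallel loop without any extra synchronization, and the only shared quantity, the sum, is handled by a reduction clause. The directives are ordinary comments to a compiler without OpenMP enabled, so the program also builds and runs serially:

```fortran
module kernels
  implicit none
contains
  ! No saved data and no module variables: the result depends only on
  ! the argument, so concurrent calls cannot interfere with each other.
  pure function square(x) result(y)
    integer, intent(in) :: x
    integer :: y
    y = x * x
  end function square
end module kernels

program pure_in_parallel
  use kernels, only: square
  implicit none
  integer, parameter :: n = 100
  integer :: i, total

  total = 0
  ! Each thread accumulates into its own private copy of total; the
  ! copies are combined when the loop ends.
  !$omp parallel do reduction(+:total)
  do i = 1, n
    total = total + square(i)
  end do
  !$omp end parallel do

  ! Compare against the closed form n(n+1)(2n+1)/6.
  if (total /= n*(n + 1)*(2*n + 1)/6) error stop 'unexpected sum'
  print '(a, i0)', 'sum of squares = ', total
end program pure_in_parallel
```

With gfortran, for example, this can be built with -fopenmp to enable the directives. The pure attribute is not required for thread safety, but it lets the compiler reject some accidental uses of shared state, such as assigning to a module variable.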
Thank you, @everythingfunctional , for the elaboration. The criterion quoted above is quite informative and fairly easy to verify for open-source libraries. I guess such verification is automatable, right? Many thanks!
Thank you @Arjen for the explanation, in particular for the comparison with recursion. This is insightful.
Is it correct if I claim that “recursion safety” and thread safety are equivalent — ensuring one of them will imply the other? If not, which one is stronger? Many thanks!
A practical issue I found when designing thread-safe (object-oriented) libraries is that you want objects without mutable state.
Entering a thread-parallel region with a mutable object requires creating a copy for each thread, and a way to synchronize the state of the copies again later. Copying can also be difficult, because a plain memcopy or even a deep copy can be insufficient to create truly independent instances of a mutable object (think of shared resources, pointer components, connected units).
In practice OpenMP provides almost no real mechanism to deal with mutable objects, meaning you have to declare the original instance shared among the threads plus an additional local version and perform the copy manually. This is solved somewhat better in OpenACC which can copy data from type components and doesn’t blindly memcopy like an OpenMP firstprivate clause would.
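The manual copy pattern described above might look roughly like the following sketch. The type accumulator_t and its components are invented for illustration. Intrinsic assignment is an adequate copy here only because the type has no pointer components or other shared resources; a real object may need a dedicated copy routine:

```fortran
module accumulator_mod
  implicit none
  type :: accumulator_t            ! hypothetical mutable object
    integer :: weight = 1          ! configuration, read-only after setup
    integer :: total = 0           ! mutable state
  contains
    procedure :: add
  end type accumulator_t
contains
  subroutine add(self, x)
    class(accumulator_t), intent(inout) :: self
    integer, intent(in) :: x
    self%total = self%total + self%weight * x
  end subroutine add
end module accumulator_mod

program manual_copy
  use accumulator_mod, only: accumulator_t
  implicit none
  integer, parameter :: n = 100
  type(accumulator_t) :: original, local
  integer :: i

  original%weight = 2

  !$omp parallel shared(original) private(local, i)
  local = original        ! manual per-thread copy of the shared object
  local%total = 0         ! each copy starts from an empty state

  !$omp do
  do i = 1, n
    call local%add(i)     ! mutate only the thread-local copy
  end do
  !$omp end do
  ! The implicit barrier at "end do" ensures every thread has finished
  ! copying from original before any thread writes to it below.

  !$omp critical
  original%total = original%total + local%total   ! merge state back
  !$omp end critical
  !$omp end parallel

  if (original%total /= 2 * n * (n + 1) / 2) error stop 'unexpected total'
  print '(a, i0)', 'total = ', original%total
end program manual_copy
```

The merge step is specific to this toy accumulator. For a general mutable object, deciding how to reconcile the per-thread copies is the hard part that the post refers to.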
Thank you @awvwgk for the insight. I have rather limited experience in OOP and no experience in parallel computing with Fortran. From my superficial understanding, what you said means that mutable objects, like variables with the save attribute, are most likely thread-unsafe, so they are better avoided if thread safety is desired. Is this right? Many thanks!
100 times this. If an object never changes state after it has been constructed/defined, there’s only one place you have to look if at some point it doesn’t have the value you expected. The guarantee that it can be used in parallel code safely is an added bonus. A variable can be in one of four states:
private and immutable (this is the easiest code to understand)
shared and immutable (this is inherently threadsafe)
private and mutable (this is the state of most Fortran variables)
shared and mutable (this is where problems occur)
I recommend doing whatever you can to avoid that last state.
Others have provided good answers, but I just remembered the issue of random number generation. It appears to be thread-safe nowadays, although calling it intensively in a parallel section is not a good idea (based on the website description, I think gfortran serializes calls to random_number(); Steve Kargl and others more experienced with gfortran than I am would likely know better).
No, there is an important difference between the two: recursive code can safely change static data - think of a counter that counts the number of recursion levels for instance - because there is only one copy actually running. All copies will have access to the static data, but changing such data is strictly sequential. The main thing about recursion-enabled code is that for local/automatic data you get your own copy.
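The recursion counter mentioned above can be sketched as follows (all names are illustrative). The saved variable depth is one static counter seen by every activation, while the dummy argument n and the result are per-activation data. Because only one activation runs at any moment, the updates to depth are strictly sequential:

```fortran
program recursion_depth
  implicit none
  integer :: max_depth, f

  max_depth = 0
  f = factorial(5)

  if (f /= 120 .or. max_depth /= 5) error stop 'unexpected result'
  print '(a, i0, a, i0)', 'factorial(5) = ', f, ', deepest level = ', max_depth

contains

  recursive function factorial(n) result(res)
    integer, intent(in) :: n     ! each activation has its own n and res
    integer :: res
    integer, save :: depth = 0   ! one static counter for all activations

    depth = depth + 1            ! safe: only one activation runs at a time
    max_depth = max(max_depth, depth)

    if (n <= 1) then
      res = 1
    else
      res = n * factorial(n - 1)
    end if

    depth = depth - 1
  end function factorial

end program recursion_depth
```

The same function called from several threads at once would no longer be safe, because the updates to depth and max_depth could then interleave.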
With multithreaded code you have to be careful, because multiple copies will be working at the same time. So, keeping a counter or the like has to be done within guarded sections - via mutexes for instance.
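A minimal sketch of such a guarded counter follows, using an OpenMP atomic construct rather than an explicit mutex (the name hits is made up). An `!$omp critical` section around the update would serve the same purpose for updates too complex for atomic:

```fortran
program guarded_counter
  implicit none
  integer, parameter :: n = 10000
  integer :: i, hits

  hits = 0
  !$omp parallel do shared(hits)
  do i = 1, n
    ! Without the atomic directive, concurrent read-modify-write
    ! updates of hits could be lost.
    !$omp atomic
    hits = hits + 1
  end do
  !$omp end parallel do

  if (hits /= n) error stop 'lost updates'
  print '(a, i0)', 'hits = ', hits
end program guarded_counter
```

For a plain sum like this one, a reduction clause is usually cheaper than guarding every update. The atomic form is shown only because it maps directly onto the guarded section described above.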