Parallel execution of functions with side effects

MarDie · November 30, 2022, 11:24pm

This question might be specific to the Intel compiler when using the -parallel/-Qparallel option, but I was wondering what happens in the case that two functions on the right hand side write to the same global data.
My minimum working example is:

program test

  implicit none
  integer :: state, r

  state = 0

  r = a() - b()
  if (r /= 0) error stop 'race condition'
  print*, 'all fine'

contains

  real function a()
    state = 1
    a = state*2
  end function a

  real function b()
    state = 2
    b = state
  end function b

end program test

Note that this is the well-behaved case, because both functions set the global variable explicitly. It is inspired by FFTW. A more error-prone case would be:

program test

  implicit none
  integer :: state, r

  state = 0

  r = a() - b()*2
  if (r /= 0) error stop 'race condition'
  print*, 'all fine'

contains

  real function a()
    state = 1
    a = state * 2
  end function a

  real function b()
    b = state
  end function b

end program test

FortranFan · November 30, 2022, 11:37pm

You won’t go wrong by assuming unspecified behavior, with nondeterministic program state and outcome, under these circumstances.

everythingfunctional · November 30, 2022, 11:45pm

From Annex C:

If more than one function reference appears in a statement, they can be executed in any order (subject to a function result being evaluated after the evaluation of its arguments) and their values cannot depend on the order of execution. This lack of dependence on order of evaluation enables parallel execution of the function references.

Thus, it seems to me the program is non-conforming from the start.
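
A minimal sketch of one way around the ordering question, assuming the intent is that a runs before b: give each function reference its own statement, so the order of the side effects follows statement order instead of being left to the processor. The module name ordered_demo and the helper ordered_result are invented for this illustration and are not from the original post.

```fortran
module ordered_demo
  implicit none
  integer :: state = 0   ! shared module state, standing in for the global
contains

  real function a()
    state = 1
    a = state * 2
  end function a

  real function b()
    b = state            ! depends on whoever wrote state last
  end function b

  ! Hypothetical helper: one function reference per statement, so the
  ! side effect of a() is complete before b() reads state.
  integer function ordered_result()
    real :: ra, rb
    state = 0
    ra = a()
    rb = b()
    ordered_result = nint(ra - rb*2)
  end function ordered_result

end module ordered_demo
```

A main program that uses the module and calls ordered_result() should get 0, with state left at 1. This only pins down the serial order; it does not make the functions safe to call from several threads at once.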

MarDie · November 30, 2022, 11:58pm

In the case of logical functions, gfortran even warns that non-pure functions might not be executed: with an .and. connection, it suffices for one operand to be false to skip execution of all the other functions.
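
To illustrate the point about skipped calls, here is a small sketch, assuming both side effects are actually wanted: each impure logical function is evaluated in its own assignment first, and only the stored results are combined with .and. afterwards. The names logical_demo, check_a, check_b, both_ok and the counter calls are hypothetical.

```fortran
module logical_demo
  implicit none
  integer :: calls = 0   ! counts how many checks actually ran
contains

  logical function check_a()
    calls = calls + 1
    check_a = .false.
  end function check_a

  logical function check_b()
    calls = calls + 1
    check_b = .true.
  end function check_b

  ! Hypothetical helper: each impure function gets its own statement,
  ! so neither call is an operand the processor may skip in the .and.
  logical function both_ok()
    logical :: ok_a, ok_b
    ok_a = check_a()
    ok_b = check_b()
    both_ok = ok_a .and. ok_b
  end function both_ok

end module logical_demo
```

Writing check_a() .and. check_b() directly would leave it to the compiler whether check_b runs once check_a has returned false, so the counter could end up at 1 or 2.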

MarDie · November 30, 2022, 11:59pm

@everythingfunctional: I think the first program is still valid, because execution order does not matter. Only parallel execution causes a problem.

CRquantum · December 1, 2022, 3:16am

From a speed point of view, at least for Intel Fortran and in my very limited experience, this option always seems to make code slower.
For example, if I enable -parallel/-Qparallel for the FLINT ODE solver, it runs 10 times slower. Without -parallel/-Qparallel, its speed is normal.

everythingfunctional · December 1, 2022, 5:06pm

If you want to get technical about it, I actually don’t see how the function a in your examples could ever return a value other than 2, even when executed in parallel. state is always defined to the same value before being used, so even if one thread performs the assignment while or before the other is using the value in the following expression, it will not change the result of the function, since it is assigning the same value the other thread already assigned to it. So I guess your example is standards-conforming, and actually deterministic in parallel.

Perhaps you could provide an example where serial execution in any order would obtain a consistent result, but parallel execution would not?

MarDie · December 1, 2022, 5:37pm

I think there is a race condition in the first example: function a sets state = 1 and function b sets state = 2. Both return values depend on state, so there is a theoretical chance that one of the functions computes its return value based on the write of the other function.

Of course, for a single real this will not happen, but for large arrays (my actual use case is 3D FFTs) reading and writing take a measurable amount of time.
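
One way to avoid this kind of race for array data, sketched under the assumption that the shared state can be passed in as an argument, is to make the functions pure so that they write nothing except their own result. This is not how FFTW is structured; the names pure_demo, scaled and difference are made up for illustration.

```fortran
module pure_demo
  implicit none
contains

  ! Receives everything it needs as arguments and writes only its own
  ! result, so evaluation order cannot change the value.
  pure function scaled(buf, factor) result(res)
    real, intent(in) :: buf(:)
    real, intent(in) :: factor
    real :: res(size(buf))
    res = buf * factor
  end function scaled

  ! Two references to a pure function in one expression: any order,
  ! or parallel evaluation, gives the same values.
  pure function difference(x) result(res)
    real, intent(in) :: x(:)
    real :: res(size(x))
    res = scaled(x, 2.0) - scaled(x, 1.0) * 2.0
  end function difference

end module pure_demo
```

For any input array, difference should return all zeros, because neither reference to scaled can observe a write made by the other.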

everythingfunctional · December 1, 2022, 6:59pm

You’re right, I did not read that closely enough. Sorry.

RonShepard · December 2, 2022, 6:49am

In this example, the functions could be rewritten as

real function a()
  state = 1
  a = 2
end function a

real function b()
  state = 2
  b = 2
end function b

This eliminates any race conditions for the function values, while leaving the race conditions for the final value of state. Unless state is given the volatile attribute in the original code, this is probably the way a compiler would evaluate the functions anyway.

aledinola · December 5, 2022, 11:21pm

That’s a good point. I also observed that in practice the option /Qparallel slows down the code in ifort. I typically use only /Qopenmp. What is /Qparallel supposed to do?

CRquantum · December 10, 2022, 9:33am

Perhaps the three links below may help; all of them mention /qparallel a little.

I think /qparallel is a compiler flag into which a lot of compiler engineering work has gone. It aims to parallelize some loops for you intelligently and automatically, but in practice it does not always work very well. I believe that if you can use OpenMP, that will be more efficient than /qparallel. After all, you know your code better than the compiler does.
If I remember correctly, you can enable /qparallel and also set the optimization report to level 5; then in Visual Studio (with Intel oneAPI) you will see how much speedup it gains for each loop.
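
As a rough sketch of the OpenMP alternative mentioned above, here is a loop that is parallelized explicitly with a reduction. The module omp_demo and the function sum_of_squares are hypothetical names chosen for this example; the directive lines are treated as plain comments unless OpenMP is enabled when compiling.

```fortran
module omp_demo
  implicit none
contains

  ! Sums the squares of x. The !$omp lines are ordinary comments unless
  ! OpenMP is enabled at compile time (for example /Qopenmp or -qopenmp
  ! with Intel, -fopenmp with gfortran).
  function sum_of_squares(x) result(total)
    real, intent(in) :: x(:)
    real :: total
    integer :: i
    total = 0.0
    ! The reduction clause gives each thread a private partial sum,
    ! so there is no race on total.
    !$omp parallel do reduction(+:total)
    do i = 1, size(x)
      total = total + x(i)**2
    end do
    !$omp end parallel do
  end function sum_of_squares

end module omp_demo
```

Here the programmer states which variable is shared and how it is combined, which is the information an auto-parallelizer would otherwise have to work out for itself.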

In addition, if I remember correctly, if you use /qparallel on, say, a 10-core machine and time the code with call cpu_time(), the reported time will be the real wall time multiplied by 10. So if your code finishes in 1 second, cpu_time will show 10 seconds.
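
A small sketch of measuring both kinds of time side by side, assuming elapsed (wall) time is what you actually want to report: cpu_time gives processor time, whose meaning for multi-threaded runs is processor-dependent, while system_clock counts elapsed ticks. The module timing_demo and the subroutine time_both are invented for this illustration, and the loop is only a placeholder workload.

```fortran
module timing_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains

  ! Runs a small placeholder workload and returns both processor time
  ! (cpu_time) and elapsed wall time (system_clock), in seconds.
  subroutine time_both(cpu_seconds, wall_seconds)
    real, intent(out) :: cpu_seconds, wall_seconds
    real :: t0, t1, work
    integer(int64) :: c0, c1, rate
    integer :: i

    call cpu_time(t0)
    call system_clock(c0, rate)   ! count and count_rate

    work = 0.0
    do i = 1, 1000000
      work = work + sqrt(real(i))
    end do

    call cpu_time(t1)
    call system_clock(c1)

    cpu_seconds  = t1 - t0
    wall_seconds = real(c1 - c0) / real(rate)

    ! Use the result so the loop is not optimized away.
    if (work < 0.0) print *, work
  end subroutine time_both

end module timing_demo
```

In a serial run the two numbers should be close; if the processor time comes out as a multiple of the wall time in a threaded run, that matches the behavior described above.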
