Parallel execution of functions with side effects

This question might be specific to the Intel compiler with the -parallel/-Qparallel option, but I was wondering what happens when two functions on the right-hand side of an assignment write to the same global data.
My minimum working example is:

program test

  implicit none
  integer :: state, r

  state = 0

  r = a() - b()
  if (r /= 0) error stop 'race condition'
  print*, 'all fine'

contains

  real function a()
    state = 1
    a = state*2
  end function a

  real function b()
    state = 2
    b = state
  end function b

end program test

Note that this is the well-behaved case, because both functions set the global variable explicitly. It is inspired by FFTW. A more error-prone case would be:

program test

  implicit none
  integer :: state, r

  state = 0

  r = a() - b()*2
  if (r /= 0) error stop 'race condition'
  print*, 'all fine'

contains

  real function a()
    state = 1
    a = state * 2
  end function a

  real function b()
    b = state
  end function b

end program test

You won’t go wrong by treating this as unspecified behavior, with nondeterministic program state and outcome.

From Annex C:

If more than one function reference appears in a statement, they can be executed in any order (subject to a function result being evaluated after the evaluation of its arguments) and their values cannot depend on the order of execution. This lack of dependence on order of evaluation enables parallel execution of the function references.

Thus, it seems to me the program is non-conforming from the start.
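
One way to make the independence that the quoted text requires explicit is to stop sharing state between the functions. The following is only a sketch under that assumption: the argument name s and the program name are made up for illustration, and the value that used to live in the global variable is passed in instead. Declaring the functions pure lets the compiler reject any attempt to modify host-associated data.

```fortran
program pure_sketch
  implicit none
  integer :: r

  ! Each call receives the value it needs; nothing is shared between them.
  r = a(1) - b(2)
  if (r /= 0) error stop 'unexpected result'
  print *, 'all fine'

contains

  ! pure: the function may not modify host-associated or global variables
  pure integer function a(s)
    integer, intent(in) :: s   ! hypothetical replacement for the global state
    a = s * 2
  end function a

  pure integer function b(s)
    integer, intent(in) :: s
    b = s
  end function b

end program pure_sketch
```

With this shape the result cannot depend on the order, or the overlap, of the two calls.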

In the case of logical functions, gfortran even warns that non-pure functions might not be executed: if the calls are joined with .and., a single false result suffices to skip the execution of all the other functions.
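
A small sketch of that situation, with made-up names (check, calls). The counter shows how many times the impure function actually ran; the standard allows the second call to be skipped, so the printed count may differ between compilers and optimization levels.

```fortran
program short_circuit_sketch
  implicit none
  integer :: calls
  logical :: ok

  calls = 0
  ! The first operand is false, so the result is known without the second call.
  ok = check(.false.) .and. check(.true.)
  print *, 'result:', ok, ' calls made:', calls

contains

  ! Impure: it updates the host variable calls as a side effect.
  logical function check(val)
    logical, intent(in) :: val
    calls = calls + 1
    check = val
  end function check

end program short_circuit_sketch
```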

@everythingfunctional: I think the first program is still valid, because execution order does not matter. Only parallel execution causes a problem.

From a speed point of view, at least for Intel Fortran and in my very limited experience, this option always seems to make the code slower.
For example, if I enable -parallel/-Qparallel for the FLINT ODE solver, it runs 10 times slower. Without -parallel/-Qparallel, its speed is normal.

If you want to get technical about it, I actually don’t see how the function a in your examples could ever return a value other than 2, even when executed in parallel. state is always defined with the same value before being used, so even if one thread performs the assignment while or before the other uses the value in the following expression, it will not change the function result, because it assigns the same value the other thread already assigned. So I guess your example is standard-conforming and, in fact, deterministic in parallel.

Perhaps you could provide an example where serial execution in any order would obtain a consistent result, but parallel execution would not?

I think there is a race condition in the first example: function a sets state = 1 and function b sets state = 2. Both return values depend on state, so there is a theoretical chance that one of the functions computes its return value based on the write of the other function.

Of course, for a single real this will not happen, but for large arrays (my actual use case is 3D FFTs) reading and writing take a measurable amount of time.
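
To make the array case concrete, here is a sketch with a large shared array; the array size n and the program name are arbitrary choices for illustration. Run serially, either evaluation order gives r = 0. If the two calls overlapped, one function could sum elements written by the other. Whether a given compiler ever runs the two calls concurrently is not something this sketch demonstrates.

```fortran
program array_state_sketch
  implicit none
  integer, parameter :: n = 10000000   ! arbitrary size, large enough to take time
  integer, allocatable :: state(:)
  integer :: r

  allocate(state(n))
  state = 0

  r = a() - b()
  if (r /= 0) error stop 'race condition'
  print *, 'all fine'

contains

  integer function a()
    state = 1              ! write phase: touches all n elements
    a = 2 * sum(state)     ! read phase: expects every element to still be 1
  end function a

  integer function b()
    state = 2              ! write phase
    b = sum(state)         ! read phase: expects every element to still be 2
  end function b

end program array_state_sketch
```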

You’re right, I did not read that closely enough. Sorry.

In this example, the functions could be rewritten as
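
A minimal sketch of such a rewrite, assuming each result is written as a literal so that neither function reads state after setting it. This is an illustration and not necessarily the poster's exact code.

```fortran
program test_rewritten
  implicit none
  integer :: state, r

  state = 0

  r = a() - b()
  if (r /= 0) error stop 'race condition'
  print *, 'all fine'

contains

  real function a()
    state = 1    ! the shared variable is still written
    a = 2.0      ! 1*2 written as a literal; state is not read
  end function a

  real function b()
    state = 2
    b = 2.0      ! the value just assigned, written as a literal
  end function b

end program test_rewritten
```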

This eliminates any race conditions for the function values, while leaving the race conditions for the final value of state. Unless state is given the volatile attribute in the original code, this is probably the way a compiler would evaluate the functions anyway.

That’s a good point. I also observed that in practice the option /Qparallel slows down the code in ifort. I typically use only /Qopenmp. What is /Qparallel supposed to do?

Perhaps the 3 links below may help; all of them mention /Qparallel a little.

I think /Qparallel is just a compiler flag, although a lot of compiler engineering effort has gone into it. It aims to parallelize some loops for you automatically and intelligently, but in practice it does not always work very well. I believe that if you can use OpenMP, that will be more efficient than /Qparallel. After all, you know your code better than the compiler does.
If I remember correctly, you can enable /Qparallel and set the optimization report to level 5; then in Visual Studio (with Intel oneAPI) you will see how much speedup it gains for each loop.

In addition, if I remember correctly, if you use /Qparallel on, say, a 10-core machine and time your code with call cpu_time(), the reported time will be the real wall time multiplied by 10. So if your code finishes in 1 second, cpu_time will report 10 seconds.
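
To check this on your own machine, here is a minimal sketch that times the same loop with both cpu_time and system_clock. The loop body and program name are made up for illustration. Whether the loop is auto-parallelized, and how cpu_time accounts for multiple threads, depends on the compiler and its options.

```fortran
program timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: c0, c1, rate
  real(real64) :: t0, t1, s
  integer :: i

  call system_clock(c0, rate)   ! wall-clock ticks and ticks per second
  call cpu_time(t0)             ! processor time in seconds

  ! Arbitrary work to be timed
  s = 0.0_real64
  do i = 1, 50000000
    s = s + sqrt(real(i, real64))
  end do

  call cpu_time(t1)
  call system_clock(c1)

  print *, 'checksum       :', s   ! printed so the loop is not optimized away
  print *, 'cpu_time   (s) :', t1 - t0
  print *, 'wall clock (s) :', real(c1 - c0, real64) / real(rate, real64)
end program timing_sketch
```

If the two numbers differ by roughly the number of threads, cpu_time is summing processor time over all of them, and system_clock is the one to trust for elapsed time.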
