Thread-safe initialization of shared module data (OpenMP)

nshaffer · December 15, 2020, 5:04pm

I’ve inherited an OpenMP-threaded codebase that has a module looking something like this

module m
  implicit none

  public
  
  real, private :: table(100)
  logical, private :: table_init
  !$OMP threadprivate(table, table_init)

contains

  subroutine init_table
    integer :: i

    table(1) = 1.0
    do i = 2, size(table)
      table(i) = real(i)*table(i-1)/real(128)
    end do
    table_init = .true.
  end subroutine

  real function f(n, x) 
    integer, intent(in) :: n
    real, intent(in) :: x

    if (.not. table_init)  call init_table

    f = table(n)*x**n
  end function 
end module m

That is, there is a function which uses a lookup table stored as a module variable, along with a flag to keep track of whether or not it’s been initialized. Thread safety is maintained by giving each thread a private copy of the lookup table.

However, in this example as well as in the real code, the table is not modified after initialization. The table and its initialization flag can be safely shared across threads, as long as I make sure that only one thread initializes the table, and all other threads wait to read from the table until the initialization flag is set. I’m looking for help implementing this logic.

The crudest thing that comes to is to remove the threadprivate directive in the module, remove the initialization check from f, and explicitly fill the array in the main program like so:

module m
  implicit none

  public
  
  real, private :: table(100)
  logical, private :: table_init

contains

  subroutine init_table
    integer :: i

    table(1) = 1.0
    do i = 2, size(table)
      table(i) = real(i)*table(i-1)/real(128)
    end do
    table_init = .true.
  end subroutine

  real function f(n, x) 
    integer, intent(in) :: n
    real, intent(in) :: x

    ! assumes table is initialized
    f = table(n)*x**n
  end function 
end module m


program p
  use m
  implicit none

  !$OMP single
  call init_table
  !$OMP end single

  !$OMP barrier

  ! .... rest of the program
end program

That should work, I think, but it makes code maintenance more difficult. I’d really like to keep all the table management logic confined to the module. For reference, the function f will be called very frequently, so putting heavy thread synchronization logic in there is unappealing. I’m not super OpenMP-literate, have I overlooked some facility that would help here?

jeremie.vandenplas · December 15, 2020, 11:08pm

If I remember well, module variables are always shared in OpenMP (except when threadprivate is used of course). So, did you consider orphaned OpenMP directives? Something like:

module m
  implicit none

  public
  
  real, private :: table(100)
  logical, private :: table_init

contains

  subroutine init_table
    integer :: i

    !$omp single
    table(1) = 1.0
    do i = 2, size(table)
      table(i) = real(i)*table(i-1)/real(128)
    end do
    table_init = .true.
    !$omp end single         !!$omp single contains an implicit barrier

  end subroutine

  real function f(n, x) 
    integer, intent(in) :: n
    real, intent(in) :: x

    ! assumes table is initialized 
    !note that a check could be added. If not initialized, the first thread checking it would initialized it, whil the others are waiting at the implicit barrier.
    f = table(n)*x**n
  end function 
end module m


program p
  use m
  implicit none

  !$OMP parallel
  call init_table

  !loop that call f

  !$OMP end parallel

  ! .... rest of the program
end program

rcs · December 15, 2020, 11:51pm

Hi,

to achieve what you want you would have to remove the $thread private clause and you would need a lock or a critical section in function f. This will most likely kill the performance … and I assume that people going omp seeking performance.

In essence if you can afford, in terms of ram, to have x copies of table leave everything as is. Otherwise, as you already found out, you need to initialize the table at the beginning separately.

cheers

FortranFan · December 16, 2020, 5:17am

To the extent any engineer or scientist is pursuing programming with a compiler-based approach such as C++ or Fortran, especially Fortran given its legacy in Formula Translation, I always recommend to never lose sight of the underlying math and whether it can be described using constant expressions and to consequently pursue compile-time computing as much as possible.

Given the visionaries such as Stroustrup and Dos Reis constantly leading C++ forward, this is much easier and considerably more powerful presently with modern C++ than Fortran. But if one keeps seeking opportunities with Fortran and influences the language advancement with better facilities, more Fortran practitioners can benefit from the options.

So how is any of this relevant to the topic on hand? Well, it is to try to make the very question of shared data and whether it can thread-safe entirely moot, if possible. And an option to do so can be via named constants.
And note named constants really lend themselves well with MODULEs in Fortran.

In the case of the specific example in the original post, each entry, t_n, in the table is simply:

t_n = \frac{n!}{128^{n-1}}

which is a constant expression and thus can be encoded via an array named constant in Fortran. See the following code snippet that is spread over 4 lines mostly to make it easy on compilers, but also to make it easier for a future reader to follow the code. As to the former, the reason is the unfortunate situation where Fortran compilers do tend to struggle with constant expressions and throw ICEs, or worse (what can be worse than an ICE, right?), give wrong results:

      integer, parameter :: idx(*) = [( i, i = 1, n )]
      real(WP), parameter :: FAC = real(128, kind=WP)
      integer, parameter :: factorial(*) = [( product( idx(1:j) ), j = 1, n )]
      real(WP), parameter :: table(*) = [( real(factorial(i),kind=WP)/FAC**(i-1), i = 1, n )]

Of course I understand this may not always be the situation that program “data” can be characterized like so for thread-safe consumption. In such scenarios, one will have to pursue other techniques as described in this thread such as CRITICAL sections in legacy codes, or based on object-oriented (OO) design in modern codes. However, keeping an eye out for the named constant facility in Fortran is something I recommend.

Here’s a worked out example readers can try:

   integer, parameter :: WP = selected_real_kind( p=6 )
   integer, parameter :: n = 10
   integer :: i, j
   blk1: block
      integer, parameter :: idx(*) = [( i, i = 1, n )]
      real(WP), parameter :: FAC = real(128, kind=WP)
      integer, parameter :: factorial(*) = [( product( idx(1:j) ), j = 1, n )]
      real(WP), parameter :: table(*) = [( real(factorial(i),kind=WP)/FAC**(i-1), i = 1, n )]
      print *, "Block 1: array named constant"
      print *, "factorial up to ", n, ": ", factorial
      print *, "table: ", table
   end block blk1
   print *
   blk2: block
      real(WP) :: table(n)
      print *, "Block 2: using DO loop"
      call init_table( table )
      print *, "table: ", table
   end block blk2
contains
   subroutine init_table( t )
      real(WP), intent(inout) :: t(:)
      integer :: i
       t(1) = 1.0
       do i = 2, size(t)
         t(i) = real(i)*t(i-1)/real(128,kind=WP)
       end do
     end subroutine
end

Execution using gfortran:

C:\Temp>gfortran p.f90 -o p.exe

C:\Temp>p.exe
Block 1: array named constant
factorial up to 10 : 1 2 6 24 120 720 5040 40320 362880 3628800
table: 1.00000000 1.56250000E-02 3.66210938E-04 1.14440918E-05 4.47034836E-07 2.09547579E-08 1.14596332E-09 7.16227078E-11 5.03597164E-12 3.93435284E-13

Block 2: using DO loop
table: 1.00000000 1.56250000E-02 3.66210938E-04 1.14440918E-05 4.47034836E-07 2.09547579E-08 1.14596332E-09 7.16227078E-11 5.03597164E-12 3.93435284E-13

C:\Temp>

P.S.> Adapting the code for larger values of n and getting it to work with compilers’ of one’s choice is left up to the readers.

jeremie.vandenplas · December 16, 2020, 8:44am

@rcs, In the proposed example, why would you need a critical section (if intent(in) and intent(out) variables of f are private)?

rcs · December 16, 2020, 9:17am

Hi,

because, from my understanding, if the theadprivate clause on the module variables is omitted calling f can trigger, conditional on switch table_init, the table initialization, which should not be done by more than one thread. Using critical or a lock around

will ensure that only one thread at a time enters the region. If the entering thread finds the table not initialized it will initialize it and set the switch. subsequent threads will find the switch set.

However, I have to admit that I would have never coded it that way.

cheers

rcs · December 16, 2020, 10:25am

Beside the fair points fortranfan raised you may be able to achieve what you want without performance problems by using this:

module m
  implicit none

  public
  
  real, private :: table(100)
  logical, private :: table_init

contains

  subroutine init_table
    integer :: i

    table(1) = 1.0
    do i = 2, size(table)
      table(i) = real(i)*table(i-1)/real(128)
    end do
    table_init = .true.
  end subroutine

  real function f(n, x) 
    integer, intent(in) :: n
    real, intent(in) :: x
    if(table_init) then
      f = table(n)*x**n;return
    endif
    !$omp critical
    if(.not.table_init) call init_table
    !$omp end critical
   
  end function 
end module m

jeremie.vandenplas · December 16, 2020, 12:03pm

Could you not move the critical section insie init_table? So the critical section only penalizes at the first call when table_init is .false.? Also, would it not be easier to use a single directive (instead of critical)?

rcs · December 16, 2020, 12:13pm

The critical section in f avoids having another call to init each time calling f. Whether that makes a difference compared to having critical in init … I don’t know. To my knowledge critical is the only construct beside a lock which acts globally. Single will most likely not act globally which means if you have nested parallelization single will not be suitable to avoid race conditions if table is shared memory.

ivanpribec · December 16, 2020, 12:26pm

Wouldn’t this be avoided with an if-else block? For subsequent calls the else sub-block should be skipped.

everythingfunctional · December 17, 2020, 12:03am

If the $OMP critical is inside the subroutine, you have a potential race condition, where if not every thread gets inside before the table_init is set, then some threads will skip the call to the subroutine and never make it to the end of the barrier, leaving the threads that did deadlocked.

If the $OMP critical is outside the subroutine, you take the performance penalty every time the function is called.

Thus, the goals here are at odds. Your choices really are between thread private and a call to init in main. I would lean towards having the init in main, as you then also no longer take the penalty of checking the logical every time you call the function (although branch prediction probably already eliminates that penalty).

I did once see a way that C/C++ code could be executed before main. That might be an interesting thing to look into.

rcs · December 17, 2020, 12:18am

if init_table looks like this

subroutine init_table
   integer :: i
    if(table_init) return
   !$omp critical
   if(table_init) return
   table(1) = 1.0
   do i = 2, size(table)
     table(i) = real(i)*table(i-1)/real(128)
   end do
   table_init = .true.
   !$omp end critical
 end subroutine

there should be neither race conditions nor preformance issues beside an additional subroutine call.

nshaffer · December 17, 2020, 5:59pm

Thanks all for thinking about this problem, and for the suggestions. I’ve learned a lot!

It seems like @rcs’s most recent snippet is a good option if I’m dead-set on hiding the table management from the main program. It has the benefit that it is not too complicated, though I would probably want to rename it init_table_maybe to make it more honest. And a big thanks to @everythingfunctional for pointing out the deadlock danger if done in the “obvious” way. That’s a mistake I definitely would have made and wasted a lot of time on had the issue not been brought up.

@FortranFan’s suggestion to initialize the table at compile time is very tempting. I do need large indices (several hundred), so I’d have to be very careful in how I build the sequence (use log_gamma, take special care to avoid inexact values in intermediate steps, blahblahblah). For safer/simpler sequences, a module constant is bar-none the best, most obvious approach. @everythingfunctional’s mention of compile-time code execution (or simple pre-processing) is an option, but definitely overkill here. Maybe it’d make sense in another application, though.

A final consideration that leans me toward the init_table_maybe or main-program approaches is that this is a code that’s going to be read by unfamiliar and inexperienced eyes frequently (grad students, interns, postdocs). So keeping things simple or at least easily explained with a comment or two is a plus.

Topic		Replies	Views
Omp private object inside a derived type Help	4	576	July 9, 2020
Associate private behavior in OpenMP do loops? Language enhancement	13	932	July 21, 2023
OpenMP question: private vs shared work arrays for reduction Help	8	2371	February 23, 2021
Thread safety of modern Fortran libraries: what and how? Help	16	2476	January 25, 2022
Iso_c_binding: interface to a C function is not thread safe for openmp Help	2	661	May 19, 2021

Thread-safe initialization of shared module data (OpenMP)

Related topics