Thread-safe initialization of shared module data (OpenMP)

I’ve inherited an OpenMP-threaded codebase that has a module looking something like this

module m
  implicit none

  public
  
  real, private :: table(100)
  logical, private :: table_init
  !$OMP threadprivate(table, table_init)

contains

  subroutine init_table
    integer :: i

    table(1) = 1.0
    do i = 2, size(table)
      table(i) = real(i)*table(i-1)/real(128)
    end do
    table_init = .true.
  end subroutine

  real function f(n, x) 
    integer, intent(in) :: n
    real, intent(in) :: x

    if (.not. table_init)  call init_table

    f = table(n)*x**n
  end function 
end module m

That is, there is a function which uses a lookup table stored as a module variable, along with a flag to keep track of whether or not it’s been initialized. Thread safety is maintained by giving each thread a private copy of the lookup table.

However, in this example as well as in the real code, the table is not modified after initialization. The table and its initialization flag can be safely shared across threads, as long as I make sure that only one thread initializes the table, and all other threads wait to read from the table until the initialization flag is set. I’m looking for help implementing this logic.

The crudest thing that comes to is to remove the threadprivate directive in the module, remove the initialization check from f, and explicitly fill the array in the main program like so:

module m
  implicit none

  public
  
  real, private :: table(100)
  logical, private :: table_init

contains

  subroutine init_table
    integer :: i

    table(1) = 1.0
    do i = 2, size(table)
      table(i) = real(i)*table(i-1)/real(128)
    end do
    table_init = .true.
  end subroutine

  real function f(n, x) 
    integer, intent(in) :: n
    real, intent(in) :: x

    ! assumes table is initialized
    f = table(n)*x**n
  end function 
end module m


program p
  use m
  implicit none

  !$OMP single
  call init_table
  !$OMP end single

  !$OMP barrier

  ! .... rest of the program
end program

That should work, I think, but it makes code maintenance more difficult. I’d really like to keep all the table management logic confined to the module. For reference, the function f will be called very frequently, so putting heavy thread synchronization logic in there is unappealing. I’m not super OpenMP-literate, have I overlooked some facility that would help here?

2 Likes

If I remember well, module variables are always shared in OpenMP (except when threadprivate is used of course). So, did you consider orphaned OpenMP directives? Something like:

module m
  implicit none

  public
  
  real, private :: table(100)
  logical, private :: table_init

contains

  subroutine init_table
    integer :: i

    !$omp single
    table(1) = 1.0
    do i = 2, size(table)
      table(i) = real(i)*table(i-1)/real(128)
    end do
    table_init = .true.
    !$omp end single         !!$omp single contains an implicit barrier

  end subroutine

  real function f(n, x) 
    integer, intent(in) :: n
    real, intent(in) :: x

    ! assumes table is initialized 
    !note that a check could be added. If not initialized, the first thread checking it would initialized it, whil the others are waiting at the implicit barrier.
    f = table(n)*x**n
  end function 
end module m


program p
  use m
  implicit none

  !$OMP parallel
  call init_table

  !loop that call f

  !$OMP end parallel

  ! .... rest of the program
end program

Hi,

to achieve what you want you would have to remove the $thread private clause and you would need a lock or a critical section in function f. This will most likely kill the performance … and I assume that people going omp seeking performance.

In essence if you can afford, in terms of ram, to have x copies of table leave everything as is. Otherwise, as you already found out, you need to initialize the table at the beginning separately.

cheers

To the extent any engineer or scientist is pursuing programming with a compiler-based approach such as C++ or Fortran, especially Fortran given its legacy in Formula Translation, I always recommend to never lose sight of the underlying math and whether it can be described using constant expressions and to consequently pursue compile-time computing as much as possible.

Given the visionaries such as Stroustrup and Dos Reis constantly leading C++ forward, this is much easier and considerably more powerful presently with modern C++ than Fortran. But if one keeps seeking opportunities with Fortran and influences the language advancement with better facilities, more Fortran practitioners can benefit from the options.

So how is any of this relevant to the topic on hand? Well, it is to try to make the very question of shared data and whether it can thread-safe entirely moot, if possible. And an option to do so can be via named constants.
And note named constants really lend themselves well with MODULEs in Fortran.

In the case of the specific example in the original post, each entry, t_n, in the table is simply:

t_n = \frac{n!}{128^{n-1}}

which is a constant expression and thus can be encoded via an array named constant in Fortran. See the following code snippet that is spread over 4 lines mostly to make it easy on compilers, but also to make it easier for a future reader to follow the code. As to the former, the reason is the unfortunate situation where Fortran compilers do tend to struggle with constant expressions and throw ICEs, or worse (what can be worse than an ICE, right?), give wrong results:

      integer, parameter :: idx(*) = [( i, i = 1, n )]
      real(WP), parameter :: FAC = real(128, kind=WP)
      integer, parameter :: factorial(*) = [( product( idx(1:j) ), j = 1, n )]
      real(WP), parameter :: table(*) = [( real(factorial(i),kind=WP)/FAC**(i-1), i = 1, n )]

Of course I understand this may not always be the situation that program “data” can be characterized like so for thread-safe consumption. In such scenarios, one will have to pursue other techniques as described in this thread such as CRITICAL sections in legacy codes, or based on object-oriented (OO) design in modern codes. However, keeping an eye out for the named constant facility in Fortran is something I recommend.

Here’s a worked out example readers can try:

   integer, parameter :: WP = selected_real_kind( p=6 )
   integer, parameter :: n = 10
   integer :: i, j
   blk1: block
      integer, parameter :: idx(*) = [( i, i = 1, n )]
      real(WP), parameter :: FAC = real(128, kind=WP)
      integer, parameter :: factorial(*) = [( product( idx(1:j) ), j = 1, n )]
      real(WP), parameter :: table(*) = [( real(factorial(i),kind=WP)/FAC**(i-1), i = 1, n )]
      print *, "Block 1: array named constant"
      print *, "factorial up to ", n, ": ", factorial
      print *, "table: ", table
   end block blk1
   print *
   blk2: block
      real(WP) :: table(n)
      print *, "Block 2: using DO loop"
      call init_table( table )
      print *, "table: ", table
   end block blk2
contains
   subroutine init_table( t )
      real(WP), intent(inout) :: t(:)
      integer :: i
       t(1) = 1.0
       do i = 2, size(t)
         t(i) = real(i)*t(i-1)/real(128,kind=WP)
       end do
     end subroutine
end

Execution using gfortran:

C:\Temp>gfortran p.f90 -o p.exe

C:\Temp>p.exe
Block 1: array named constant
factorial up to 10 : 1 2 6 24 120 720 5040 40320 362880 3628800
table: 1.00000000 1.56250000E-02 3.66210938E-04 1.14440918E-05 4.47034836E-07 2.09547579E-08 1.14596332E-09 7.16227078E-11 5.03597164E-12 3.93435284E-13

Block 2: using DO loop
table: 1.00000000 1.56250000E-02 3.66210938E-04 1.14440918E-05 4.47034836E-07 2.09547579E-08 1.14596332E-09 7.16227078E-11 5.03597164E-12 3.93435284E-13

C:\Temp>

P.S.> Adapting the code for larger values of n and getting it to work with compilers’ of one’s choice is left up to the readers.

4 Likes

@rcs, In the proposed example, why would you need a critical section (if intent(in) and intent(out) variables of f are private)?

Hi,

because, from my understanding, if the theadprivate clause on the module variables is omitted calling f can trigger, conditional on switch table_init, the table initialization, which should not be done by more than one thread. Using critical or a lock around

will ensure that only one thread at a time enters the region. If the entering thread finds the table not initialized it will initialize it and set the switch. subsequent threads will find the switch set.

However, I have to admit that I would have never coded it that way.

cheers

Beside the fair points fortranfan raised you may be able to achieve what you want without performance problems by using this:

module m
  implicit none

  public
  
  real, private :: table(100)
  logical, private :: table_init

contains

  subroutine init_table
    integer :: i

    table(1) = 1.0
    do i = 2, size(table)
      table(i) = real(i)*table(i-1)/real(128)
    end do
    table_init = .true.
  end subroutine

  real function f(n, x) 
    integer, intent(in) :: n
    real, intent(in) :: x
    if(table_init) then
      f = table(n)*x**n;return
    endif
    !$omp critical
    if(.not.table_init) call init_table
    !$omp end critical
   
  end function 
end module m

Could you not move the critical section insie init_table? So the critical section only penalizes at the first call when table_init is .false.? Also, would it not be easier to use a single directive (instead of critical)?

The critical section in f avoids having another call to init each time calling f. Whether that makes a difference compared to having critical in init … I don’t know. To my knowledge critical is the only construct beside a lock which acts globally. Single will most likely not act globally which means if you have nested parallelization single will not be suitable to avoid race conditions if table is shared memory.

1 Like

Wouldn’t this be avoided with an if-else block? For subsequent calls the else sub-block should be skipped.

1 Like

If the $OMP critical is inside the subroutine, you have a potential race condition, where if not every thread gets inside before the table_init is set, then some threads will skip the call to the subroutine and never make it to the end of the barrier, leaving the threads that did deadlocked.

If the $OMP critical is outside the subroutine, you take the performance penalty every time the function is called.

Thus, the goals here are at odds. Your choices really are between thread private and a call to init in main. I would lean towards having the init in main, as you then also no longer take the penalty of checking the logical every time you call the function (although branch prediction probably already eliminates that penalty).

I did once see a way that C/C++ code could be executed before main. That might be an interesting thing to look into.

2 Likes

if init_table looks like this

subroutine init_table
   integer :: i
    if(table_init) return
   !$omp critical
   if(table_init) return
   table(1) = 1.0
   do i = 2, size(table)
     table(i) = real(i)*table(i-1)/real(128)
   end do
   table_init = .true.
   !$omp end critical
 end subroutine

there should be neither race conditions nor preformance issues beside an additional subroutine call.

2 Likes

Thanks all for thinking about this problem, and for the suggestions. I’ve learned a lot!

It seems like @rcs’s most recent snippet is a good option if I’m dead-set on hiding the table management from the main program. It has the benefit that it is not too complicated, though I would probably want to rename it init_table_maybe to make it more honest. And a big thanks to @everythingfunctional for pointing out the deadlock danger if done in the “obvious” way. That’s a mistake I definitely would have made and wasted a lot of time on had the issue not been brought up.

@FortranFan’s suggestion to initialize the table at compile time is very tempting. I do need large indices (several hundred), so I’d have to be very careful in how I build the sequence (use log_gamma, take special care to avoid inexact values in intermediate steps, blahblahblah). For safer/simpler sequences, a module constant is bar-none the best, most obvious approach. @everythingfunctional’s mention of compile-time code execution (or simple pre-processing) is an option, but definitely overkill here. Maybe it’d make sense in another application, though.

A final consideration that leans me toward the init_table_maybe or main-program approaches is that this is a code that’s going to be read by unfamiliar and inexperienced eyes frequently (grad students, interns, postdocs). So keeping things simple or at least easily explained with a comment or two is a plus.