Odd behaviour when wrapping C++ parallel RNG library

I’m looking for help with a very peculiar problem I’ve gotten stuck on. I’m doing some numerical simulations for my thesis and am using OpenMP to speed up the calculations. Because of my specific background/use-case, the most straightforward way to get RNG functionality is to use a C++ parallel RNG library (the front-end to these simulations is in R and such libraries are easily distributed by CRAN). Specifically, I’m using this one.

I’ve written wrappers with other C++ code before, but this time I can’t get it to function properly. After quite some testing, I think I’ve narrowed down the problem to the following: this library instantiates a separate state variable for every thread and updates this every time a sample is drawn. However, it seems that any given thread sometimes draws a new sample before this state is updated. This does not occur when doing the entire simulation within C++, and different threads do not produce duplicates. The problem is strictly confined to a single thread when called from Fortran. Has anyone encountered such a problem before? Any help would be greatly appreciated.

Hello and welcome!

It is difficult to analyze the problem without knowing the actual code,

So it would be helpful if you could try to reduce the problem and provide a minimal case.

Still, I have some ideas (I don’t know, whether they are helpful):

  • Could there be a problem with the scope (e.g., PRIVATE, SHARED) of the variables in the OpenMPized loops.

  • Can the functionality be called from pure C (not C++)?While there exists a standardized interoperability between Fortran and C, this is not the case for the advanced C++ features like objects templates, exceptions or overloaded functions.

Hi, thank you kindly for your response. I have tried some more things in light of your comments, and I’m starting to think that I’m dealing with multiple problems. Let me first provide a minimally reduced version of the code. In C++ I have the wrappers

extern "C" {
  void* init_rng(int n_threads, int seed);

  double runif_par(void* rng_ptr, int i_thread);

void* init_rng(int n_threads, int seed) {
  using rng_state_type = dust::random::generator<double>;
  void* rng_ptr = (void*) new dust::random::prng<rng_state_type>(n_threads, seed);

double runif_par(void* rng_ptr, int i_thread) {
  using rng_state_type = dust::random::generator<double>;
  using prng = dust::random::prng<rng_state_type>;
  prng* rng_ptr_ = static_cast<prng*>(rng_ptr);
  auto& state = rng_ptr_->state(i_thread);
  double sample = dust::random::random_real<double>(state);

I put the Fortran interface in a module which I include in the main business code:

module interface

   use, intrinsic :: iso_c_binding

   implicit none

      function init_rng(n_threads, seed) result(res) bind(c)
         integer(c_int), value :: seed
         integer(c_int), value :: n_threads
         type(c_ptr) :: res
      end function init_rng

      function runif_par(rng, i_thread) result(sample) bind(c)
         type(c_ptr), value :: rng
         integer(c_int), value :: i_thread
         real(c_double) :: sample
      end function runif_par

Now the main loop does something like this:

      n_threads = init_omp() ! Some setup to determine number of threads to use.
      rng = init_rng(n_threads, 42)  ! Second argument is seed.

      !$omp parallel do num_threads(n_threads) private(array) schedule(dynamic, 25)
      do i_sim = 1, n_config
         i_thread = omp_get_thread_num()
         array = 0
         call internal_subroutine(array)
     end do

Here internal_subroutine does some the heavy lifting involving calculations which call the RNG functions. I’ve also written a test routine which simply generates uniform samples on each thread and checks whether there are any duplicates (i.e. whether the random number generation works well).

Now, what I’ve found since posting the original question is that the RNG works if I add an additional step which manually updates the internal state. To be precise, the state variable in runif_par is updated inside the C++ library code by calling next each time a draw is made. If I manually add a call to next inside the wrapper, then it works without problem. So it seems that there is some lag which causes the next call from the Fortran to access the state before it’s updated (which is very odd).

Something else which came up and might be related: if I initialise the array in the parallel OMP loop to 0 and print out its values after the call to internal_subroutine, then it sometimes happens that I get 0 (which should never happen since internal_subroutine raises an exception in that case). I’m beginning to think it’s illegal to dynamically allocate memory inside a parallel do, could this be the problem?

Unfortunately, I am not sure.

Could you try what happens if you replace array by a 2D array with the dimension (n, n_threads)
that is allocated outside of the loop and instead of accessing array(n), you access array(:, i_thread)
within the loop? I assume n is defined somewhere outside of the code shown here.

The following code was able to compile fine (and give plausible results) for both gfortran and ifort:

 program private_checker
  implicit none
  real, dimension(:), allocatable:: array
  integer:: counter
  !$omp parallel do private(array) schedule(dynamic)
  do counter=1,100
     call random_number(array)
     write(*,*) counter, sum(array)
  end do
end program private_checker

I know that you are do not use the default RNG, this was just for demonstration purposes.
I remember that the RNG included in early versions of gfortran was not thread-save (now it is). So maybe
it is useful to check again, whether there could be problems with parallelization like global variables.
According to your comment seems not be the case for C++, but maybe the mechanisms that prevent these problems do not work properly when mixing the languages (the last two sentences are only speculation).