Replicable Monte Carlo simulation with OpenMP

program main
  integer :: n1, n2, i
  real(8) :: pi, x1, x2
  n1 = 1000*1000
  n2 = 0
  call random_init(.true., .true.)
  !$omp parallel do reduction(+:n2)
  do i = 1, n1
    call random_number(x1)
    call random_number(x2)
    if (x1**2 + x2**2 <= 1.0_8) n2 = n2 + 1
  end do
  pi = 4.0_8*dble(n2)/dble(n1)
  print *, pi
end program

The above program generates a different value of pi at each run. How can I ensure replicability in this case? GFortran 12.2.1 with flags -fimplicit-none -O3 -fopenmp is used.

See documentation for random_init here. It doesn’t do what you expect. For example, .... The sequence of random numbers is different for repeated execution of the program.

Another factor is that random_init is designed for coarray parallelism - which is not the same as openmp parallelism.

The program below shows one way to get repeatable results, but only if OMP_NUM_THREADS is fixed. However the approach to setting the seed might be unwise.

program main
  use omp_lib
  implicit none
  integer :: n1, n2, i, n
  integer, allocatable :: seed(:)
  real(8) :: pi, x1, x2

  n1 = 1000*1000

  ! Make space for a deterministic seed
  call random_seed(size=n)
  allocate(seed(n))

  !$omp parallel firstprivate(seed), private(x1, x2, i), shared(n1), reduction(+:n2)
  n2 = 0

  ! Deterministic seed, distinct on each thread.
  seed = [ (i, i=0, size(seed)-1) ] + omp_get_thread_num()
  call random_seed(put=seed)

  !$omp do
  do i = 1, n1
    call random_number(x1)
    call random_number(x2)
    if (x1**2 + x2**2 <= 1.0_8) n2 = n2 + 1
  end do
  !$omp end do

  !$omp end parallel
  pi = 4.0_8*dble(n2)/dble(n1)
  print *, pi
end program

While this seems to work, I don’t know if we can be confident that these seeds will produce effectively independent random numbers. That probably depends on the random number generator.

1 Like

Another issue in your code is the race condition on x1 and x2: they have to be private (as in the code proposed by @gareth )

I would initialize the seeds with deterministic random numbers as well…

program main
  use omp_lib
  implicit none
  integer :: n1, n2, i, n
  integer, allocatable :: seed0(:), seed(:,:)
  real(8), allocatable :: fseed(:,:)
  real(8) :: pi, x1, x2

  n1 = 1000*1000

  call random_seed0(size=n)
  seed0 = [ (i, i=0, n-1) ]
  call random_seed(put=seed0)

  n2 = 0
  !$omp parallel private(x1, x2, i) reduction(+:n2)

  !$omp single
  allocate( fseed(n,omp_get_num_threads()) )
  call random_number(fseed)
  seed = floor(fseed*huge(seed))
  !$omp end single

  ! Deterministic seed, distinct on each thread.
  call random_seed(put=seed(:,omp_get_thread_num()))

  !$omp do
  do i = 1, n1
    call random_number(x1)
    call random_number(x2)
    if (x1**2 + x2**2 <= 1.0_8) n2 = n2 + 1
  end do
  !$omp end do

  !$omp end parallel

  pi = 4.0_8*dble(n2)/dble(n1)
  print *, pi
end program

That said, are random_seed() and random_number() garanteed to be thread-safe ?

Here is what GFortran documentation says:

Note that in a multi-threaded program (e.g. using OpenMP directives), each thread will have its own random number state.

Does that mean GFortran’s RNG is thread-safe?

But it’s probably processor-dependent. Which means that this kind of code is not portable.

This is what I was thinking too. Coarray parallelism is in the fortran standard, but this is outside the bounds of the standard, so its behavior probably depends on what OpenMP says about this issue and how well OpenMP is incorporated into the specific compiler.

Regarding whether different seeds produce sequences that overlap, this is one of the situations that the gfortran PRNG handles very well. The xoshiro256** algorithm has the ability to skip forward an arbitrary number of steps in the sequence. So if a reproducible sequence is initialized on one thread, then the other threads can all skip forward by SKIP*omp_get_thread_num(), where SKIP is some suitably large integer value, and the sequences on the different threads are guaranteed to be independent and to not overlap. This approach would of course be limited to the fortran compilers that use that algorithm (or a similar one with the same skip-ahead feature) or to the cases where a user-written PRNG code uses that kind of algorithm. I personally think that algorithm should be adopted by the fortran standard so that, among other reasons, this feature would be portable.

Do you know if we can make use of this parallel random number feature in gfortran, with deterministic results?

The program below tries to do this by setting the master seed, based on my interpretation of the documentation. But something is wrong - it doesn’t lead to deterministic results.

Below I run it 10 times and get different answers (all using 12 threads). This was with gfortran 11.3.0 on ubuntu 22.04.

$ gfortran -fopenmp random_numbers2.f90 -o random_numbers2
$ for i in {1..10}; do OMP_NUM_THREADS=12 ./random_numbers2; done
   3.1405040000000000     
   3.1404999999999998     
   3.1405040000000000     
   3.1405040000000000     
   3.1405040000000000     
   3.1405040000000000     
   3.1405040000000000     
   3.1404999999999998     
   3.1404999999999998     
   3.1404999999999998     

The program random_numbers2.f90 is here:

program main
  use omp_lib
  implicit none
  integer :: n1, n2, i, n, nt
  integer, allocatable :: seed(:)
  real, allocatable :: threadval(:)
  real(8) :: pi, x1, x2

  n1 = 1000*1000
  !$omp parallel
  nt = omp_get_num_threads()
  !$omp end parallel
  allocate(threadval(nt))

  ! Make deterministic seed
  call random_seed(size=n)
  allocate(seed(n))
  seed = [ (i, i=0, size(seed)-1) ]
  call random_seed(put = seed)

  ! call random_number(x1)


  !$omp parallel private(x1, x2, i), shared(n1, threadval), reduction(+:n2)
  n2 = 0

  ! ! Attempt to trigger random_seed update on threads other than master, doesn't lead to deterministic results.
  ! !$omp critical
  ! if(omp_get_thread_num() /= 0) call random_number(x1)
  ! !$omp end critical
  ! !$omp barrier

  !$omp do
  do i = 1, n1
    call random_number(x1)
    call random_number(x2)
    if (x1**2 + x2**2 <= 1.0_8) n2 = n2 + 1
  end do
  !$omp end do

  ! Debugging: Store the final random number on each thread
  threadval(omp_get_thread_num()+1) = x2

  !$omp end parallel
  pi = 4.0_8*dble(n2)/dble(n1)
  print *, pi

  ! !Debugging: Check that threads are all distinct
  ! print*, 'Final random numbers: '
  ! do i = 1, size(threadval)
  !     print*, threadval(i)
  ! end do
end program

On the broader topic of parallel random numbers in Fortran – the Prng class in the coretran library looks nice.

The coretran library is here.

Try adding a call to random_init before setting the seed. Also, does each thread get it’s own copy of the RNG? My guess is no, as it can’t know ahead of time how many threads there will be, but it does when using images (coarrays). In that case you have a race condition as to which thread will receive the next random number, even if there are locks in safe to make it “thread safe”.

Try adding a call to random_init before setting the seed.

This didn’t work for me, which seems expected. Even ignoring issues of openmp vs coarrays, notice gfortran’s random_init does not produce reproducible random numbers between repeated runs of the same program (documentation here).

Also, does each thread get it’s own copy of the RNG?

For gfortran the answer is yes according to this documentation. This is an extension, not part of the Fortran standard. It sounds like we can just call random_seed on the master thread, and gfortran will take care of providing unique seeds on each thread.

The question is whether we can get gfortran to do this in a reproducible way - even if only with a fixed number of seeds. For now I can’t see how to.

Of course the best solution is just to use an external library, so we are not reliant on non-standard compiler extensions.

Thanks for pointing that out.

I did use gfortran, but only tried random_init in conjunction with openmp.

Given that, I wonder if the gfortran documentation for the repeatable argument in random_init needs updating?

If it is .true., the seed is set to a processor-dependent value that is the same each time RANDOM_INIT is called from the same image. The term “same image” means a single instance of program execution. The sequence of random numbers is different for repeated execution of the program. If it is .false., the seed is set to a processor-dependent value."

That’s interesting. FWIW I seem to get repeatable random numbers in serial, using a recent install of ubuntu 22.04 with gfortran 11.3.0.

$ cat random_numbers3.f90 
program main
  implicit none
  real :: x(4)
  call random_init(.true., .true.)
  call random_number(x)
  print*, x
end program
$ gfortran random_numbers3.f90 -o random_numbers3
$ for i in {1..100}; do ./random_numbers3 ; done
  0.825262189      0.191325366      0.155503273      0.985376239    
  0.825262189      0.191325366      0.155503273      0.985376239    
  0.825262189      0.191325366      0.155503273      0.985376239    
... (remains constant) ...

For a Monte-Carlo simulation, isn’t a set of variable results a good thing.

This could require a different random number generator for each thread, although to demonstrate a thread-safe RNG is a good outcome.

Perhaps you could do some basic stats tests for RNG values in each thread ?
Possible values could be
avg,
stdev and
consecetive values are independent ( would stdev (Xi-Xi-1) = 0 pass?)

Interestingly the code @kargl posted above is not reproducible on my system.

I called it random_numbers4.f90, and get two different values of pi in the 10 runs below.

$ gfortran -fopenmp random_numbers4.f90 -o random_numbers4
$ for i in {1..10}; do ./random_numbers4; done
pi =  3.141968000000, nt = 12
pi =  3.141968000000, nt = 12
pi =  3.141968000000, nt = 12
pi =  3.141968000000, nt = 12
pi =  3.141964000000, nt = 12
pi =  3.141964000000, nt = 12
pi =  3.141964000000, nt = 12
pi =  3.141968000000, nt = 12
pi =  3.141964000000, nt = 12
pi =  3.141968000000, nt = 12

$ gfortran --version
GNU Fortran (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

But it becomes reproducible if I insert the following at the start of the openmp region, before the loop.

if(omp_get_thread_num() /= 0) call random_number(x1)
!$omp barrier

Is it possible that the master thread’s seed is sometimes updated by random number generation, before all other threads generate a random number (and thus determine their seed)? The above code is designed to prevent that.

Full code below

program main
   use omp_lib
   implicit none
   logical verbose
   integer :: n1, n2, i, n, nt
   integer, allocatable :: seed(:)
   real, allocatable :: threadval(:)
   real(8) :: pi, x1, x2

   verbose = .true.
   if (command_argument_count() /= 0) verbose = .false.
 
   ! Initialize the master state to a repeatable sequence.
   call random_init(.true., .false.)

   n1 = 1000*1000

   !$omp parallel
   nt = omp_get_num_threads()
   !$omp end parallel

   allocate(threadval(nt))

   !$omp parallel private(x1, x2, i), shared(n1, threadval), reduction(+:n2)
   n2 = 0

   if(omp_get_thread_num() /= 0) call random_number(x1)
   !$omp barrier

   !$omp do
   do i = 1, n1
      call random_number(x1)
      call random_number(x2)
      if (x1**2 + x2**2 <= 1.0_8) n2 = n2 + 1
   end do
   !$omp end do

   threadval(omp_get_thread_num()+1) = x2  ! Store final RN on each thread

   !$omp end parallel

   pi = 4.0_8*dble(n2)/dble(n1)
   if (verbose) print '(A,F15.12,A,I0)', 'pi = ', pi, ', nt = ', nt

   !Check that threads are all distinct
   if (.not. verbose) print '(A,*(F9.6))', 'RN:', threadval
end program

EDIT – actually the above program is still not totally repeatable. I can get different values of threadval on repeated runs. But pi seems repeatable in any case (over hundreds of test runs).

See bug report here