Sobol low-discrepancy sequence

The Sobol sequence is one kind of low-discrepancy sequence that can be used in applications such as Monte Carlo integration or global optimization in place of pseudo-random numbers such as those produced by the random_number() intrinsic. Do people here use low-discrepancy sequences in simulations? Experimenting with John Burkardt’s code at GitHub - Beliavsky/Sobol_sequence: experiment with the quasi-random Sobol sequence, I have found that in one dimension a Sobol sequence does fill space much more uniformly than random numbers do, but that its values have autocorrelations that are large in magnitude, so it would simulate a random walk poorly.

Below are sample results. In each row, mean and sd are the sample mean and standard deviation of the variates, dev_sq is the sum of squared deviations of the ten bin counts from their expected value of 100000, and the counts are the number of variates between 0 and 0.1, 0.1 and 0.2, etc.

 #obs =              1000000

random number type: Sobol  
      mean        sd    dev_sq    counts
  0.499999  0.288675        12    100002     99999    100000    100000     99999    100002     99999    100000    100000     99999
  0.500000  0.288675         8     99999    100001    100001     99999    100000     99999    100001    100001     99999    100000
  0.500001  0.288675        16     99999     99999    100001     99999    100002     99999     99999    100001     99999    100002
  0.499999  0.288675        48    100001    100001     99998    100003     99997    100001    100001     99998    100003     99997
  0.500000  0.288675        12    100000     99999    100002     99999    100000    100000     99999    100002     99999    100000

       lag       ACF       ACF       ACF       ACF       ACF
         1    0.1429    0.1429    0.1429    0.1429    0.1429
         2   -0.7143   -0.7143   -0.7143   -0.7143   -0.7143
         3   -0.0714   -0.0714   -0.0714   -0.0714   -0.0714
         4    0.5714    0.5714    0.5714    0.5714    0.5714
         5   -0.0982   -0.0982   -0.0982   -0.0982   -0.0982
         6   -0.7679   -0.7679   -0.7679   -0.7679   -0.7679
         7    0.0625    0.0625    0.0625    0.0625    0.0625
         8    0.8929    0.8929    0.8929    0.8929    0.8929
         9    0.0592    0.0592    0.0592    0.0592    0.0592
        10   -0.7746   -0.7746   -0.7746   -0.7746   -0.7746

random number type: uniform
      mean        sd    dev_sq    counts
  0.500188  0.288813    906656    100154    100141     99694     99777     99840     99932     99732    100031    100781     99918
  0.500089  0.288584    252714     99770     99995    100059    100215     99961     99951     99952     99946    100330     99821
  0.500404  0.288698    811622     99895     99952     99338    100387    100034     99875     99908    100027    100209    100375
  0.499800  0.288518   1140964     99945     99812    100402    100238     99476    100601    100123    100024     99548     99831
  0.500220  0.288554   2050940     99686     99445    100163    101002     99760     99578     99826    100416    100351     99773

       lag       ACF       ACF       ACF       ACF       ACF
         1   -0.0007   -0.0012   -0.0001   -0.0017   -0.0003
         2    0.0010    0.0005   -0.0004    0.0009    0.0008
         3   -0.0008   -0.0012   -0.0008   -0.0002   -0.0007
         4    0.0004   -0.0008    0.0005   -0.0005    0.0000
         5    0.0013    0.0012   -0.0011   -0.0007   -0.0030
         6    0.0009   -0.0002   -0.0012    0.0015    0.0007
         7   -0.0010    0.0015    0.0016   -0.0009   -0.0001
         8    0.0003    0.0003   -0.0005   -0.0005    0.0013
         9    0.0001    0.0005   -0.0009   -0.0013   -0.0014
        10    0.0009   -0.0002   -0.0010    0.0003   -0.0004
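
To make the Sobol autocorrelations above easy to reproduce, here is a minimal self-contained sketch (my own construction, not the code from the repo above): in one dimension the Sobol sequence can be generated with the Antonov-Saleev Gray-code update, which flips a single bit of the current point at each step, and the sample ACF can then be computed directly. The exact ACF values depend on the ordering convention, but the large magnitudes should be evident.

program sobol1d_acf
implicit none
integer, parameter :: dp = kind(1.0d0), nbits = 30, n = 100000, maxlag = 10
real(dp) :: x(n), xbar, denom
integer  :: ix, i, m, c, lag
ix = 0
do i = 1, n
   m = i - 1 ! locate the rightmost zero bit of i-1
   c = 0
   do while (mod(m, 2) == 1)
      m = m/2
      c = c + 1
   end do
   ix = ieor(ix, ishft(1, nbits-1-c)) ! flip bit c+1 after the radix point
   x(i) = ix * 2.0_dp**(-nbits)
end do
xbar  = sum(x)/n
denom = sum((x - xbar)**2)
do lag = 1, maxlag ! sample autocorrelation at lags 1..maxlag
   print "(i10,f10.4)", lag, sum((x(1:n-lag) - xbar)*(x(1+lag:n) - xbar))/denom
end do
end program sobol1d_acf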

I have used such a sequence once, as an experiment to see whether it would accelerate the computations in one of our programs. Unfortunately, there was no obvious performance gain, though the theoretical advantages are clear. The sequence was based on The Unreasonable Effectiveness of Quasirandom Sequences | Extreme Learning. I would not mind looking into the technique again :slight_smile:.
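
For anyone curious, here is a minimal sketch of the R_2 additive-recurrence sequence from that article, as I understand it (assumptions mine): alpha = (1/phi, 1/phi^2), where phi is the plastic number, the positive root of x^3 = x + 1, and point n is the fractional part of seed + n*alpha.

program r2_sequence
implicit none
integer, parameter :: dp = kind(1.0d0)
real(dp) :: phi, alpha(2)
integer  :: n, k
phi = 1.0_dp
do k = 1, 50 ! fixed-point iteration x = (1+x)**(1/3) converges to the plastic number
   phi = (1.0_dp + phi)**(1.0_dp/3.0_dp)
end do
alpha = [1.0_dp/phi, 1.0_dp/phi**2]
do n = 1, 10
   print "(2f10.6)", modulo(0.5_dp + n*alpha, 1.0_dp) ! seed of 0.5, chosen arbitrarily here
end do
end program r2_sequence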


I tried it around 2016-2017 in a ray-tracing model to accelerate the simulation of reflections on the pyramidal textures used on the surface of solar cells (to limit reflections). I used the Sobol algorithm available in Numerical Recipes in Fortran 90 and also tested the simple Weyl sequence.

In the end, I used the Halton sequence instead. I don’t remember why, but I could search my notes…

Using such a quasi-random generator, I could obtain the same results with 10× fewer rays.

https://doi.org/10.1364/oe.20.009692
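
The Halton sequence is easy to generate without Numerical Recipes: coordinate j of point i is the radical inverse of i in the j-th prime base. A minimal 2-D sketch (mine, not the code used in the paper above):

program halton2d
implicit none
integer, parameter :: dp = kind(1.0d0)
integer :: i
do i = 1, 10
   print "(2f10.6)", radical_inverse(i, 2), radical_inverse(i, 3)
end do
contains
pure function radical_inverse(i, base) result(r)
! reflect the base-b digits of i about the radix point
integer, intent(in) :: i, base
real(dp) :: r, f
integer  :: k
r = 0.0_dp
f = 1.0_dp/base
k = i
do while (k > 0)
   r = r + f*mod(k, base)
   k = k/base
   f = f/base
end do
end function radical_inverse
end program halton2d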


Through work I’ve learned about a Python package called EasyVVUQ that supports several kinds of sampling methods; for more details, have a look at the open-access article about it. EasyVVUQ builds on top of a library called Chaospy (Chaospy: An open source tool for designing methods of uncertainty quantification - ScienceDirect) published by researchers from Simula Research Laboratory in Oslo, Norway.


For uniform variates on (0, 1) the mean is 0.5 and the mean absolute deviation from 0.5 is 0.25. Given a uniform variate x1, one can construct variates x2, x3, x4 so that the four variates together match those theoretical values exactly: x2 is chosen so that x1 and x2 average 0.25 (when x1 < 0.5) or 0.75 (otherwise), and x3 = 1 - x1 and x4 = 1 - x2 are reflections across 0.5. For integrals from 0 to 1 of a few functions (x^2, exp(x), log(1+x)), using groups of 4 variates constructed this way gave much more accurate results than 4 independent uniform variates and was also faster (since generating random variates has a cost). For the code

module mc_func_mod
implicit none
private
public :: dp, ifunc, f, f_integral, f_name
integer, parameter :: dp = selected_real_kind(15, 307), ifunc = 1
contains
elemental function f(x) 
! function to integrate
real(kind=dp), intent(in) :: x
real(kind=dp)             :: f
select case (ifunc)
   case (1) ; f = x**2
   case (2) ; f = exp(x)
   case (3) ; f = log(1+x)
end select
end function f
!
elemental function f_integral(jfunc)
! value of integral from 0 to 1 of f
integer, intent(in) :: jfunc
real(kind=dp)       :: f_integral
select case (jfunc)
   case (1) ; f_integral = 1/3.0_dp
   case (2) ; f_integral = exp(1.0_dp) - 1
   case (3) ; f_integral = 2.0_dp * log(2.0_dp) - 1
end select
end function f_integral
!
elemental function f_name(jfunc)
! name of function
integer, intent(in) :: jfunc
character (len=10)  :: f_name
select case (jfunc)
   case (1) ; f_name = "x^2"
   case (2) ; f_name = "exp(x)"
   case (3) ; f_name = "log(1+x)"
end select
end function f_name
end module mc_func_mod
!
program monte_carlo
use iso_fortran_env, only: compiler_version, compiler_options
use mc_func_mod, only: dp, f, ifunc, f_integral, f_name
implicit none
integer, parameter :: nsim = 100, nran = (10**6)/4, i64=selected_int_kind(15)
integer  :: i, j, imethod
integer(kind=i64) :: itick, t1, t2
real(dp) :: estimates(nsim)
real(dp) :: x1, x2, x3, x4, fsum, std_dev, mean_estimates, sum_sq_diff, start_time, end_time
character (len=10) :: methods(2) = ["ordinary  ", "antithetic"]
print*,"compiler version: ",trim(compiler_version())
print*,"compiler options: ",trim(compiler_options())
call system_clock(count_rate=itick)
print*,"#random variates: ", nran*4 
print*,"ifunc: ",ifunc
print*,"function: ",trim(f_name(ifunc))
do_method: do imethod = 1,2
   print "(/,a)", "method: " // trim(methods(imethod))
   call system_clock(t1)
   call cpu_time(start_time)
   do i = 1, nsim
      fsum = 0.0_dp
      do j = 1, nran
         if (imethod == 1) then ! 4 separate uniform variates        
            call random_number(x1)
            call random_number(x2)
            call random_number(x3)
            call random_number(x4)
         else                   ! 1 uniform variate and 3 related variates
            call random_number(x1)
            if (x1 < 0.5_dp) then 
               x2 = 0.5_dp - x1 ! set x2 so that average of x1 and x2 is 0.25
             else
               x2 = 1.5_dp - x1 ! set x2 so that average of x1 and x2 is 0.75
            end if
            ! reflect x1 and x2 across 0.5 to get x3 and x4
            x3 = 1 - x1
            x4 = 1 - x2           
         end if
         fsum = fsum + f(x1) + f(x2) + f(x3) + f(x4)
      end do
      estimates(i) = fsum / (4 * real(nran, dp))
   end do
   mean_estimates = sum(estimates) / nsim
   sum_sq_diff = sum((estimates - mean_estimates)**2)
   std_dev = sqrt(sum_sq_diff / real(nsim, dp))
   call cpu_time(end_time)
   call system_clock(t2)
   print*, "Estimated value: ", mean_estimates
   print*, "Absolute error: ", abs(mean_estimates - f_integral(ifunc))
   print*, "Standard error: ", std_dev/sqrt(real(nsim, kind=dp))
   print*, "Execution time: ", end_time - start_time, " seconds"
   print*, "Wall time:      ", (t2-t1)/real(itick, kind=dp), " seconds"
end do do_method
end program monte_carlo

I got the following results with gfortran -O3 -march=native -flto and ifunc=1:

 compiler version: GCC version 14.0.0 20230604 (experimental)
 compiler options: -march=skylake -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mavx -mavx2 -mno-sse4a -mno-fma4 -mno-xop -mfma -mno-avx512f -mbmi -mbmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -madx -mabm -mno-cldemote -mclflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mf16c -mfsgsbase -mfxsr -mno-hle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mno-rtm -mno-serialize -mno-sgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 -mno-avxifma -mno-avxvnniint8 -mno-avxneconvert -mno-cmpccxadd -mno-amx-fp16 -mno-prefetchi -mno-raoint -mno-amx-complex --param=l1-cache-size=32 --param=l1-cache-line-size=64 --param=l2-cache-size=8192 -mtune=skylake -O3 -flto
 #random variates:      1000000
 ifunc:            1
 function: x^2

method: ordinary
 Estimated value:   0.33337875894428776     
 Absolute error:    4.5425610954441531E-005
 Standard error:    2.8270602516907821E-005
 Execution time:    1.2968750000000000       seconds
 Wall time:         1.3189625000000000       seconds

method: antithetic
 Estimated value:   0.33333553788588632     
 Absolute error:    2.2045525530089982E-006
 Standard error:    3.6637637330178165E-006
 Execution time:   0.32812500000000000       seconds
 Wall time:        0.33898689999999998       seconds

and for ifunc=2

 #random variates:      1000000
 ifunc:            2
 function: exp(x)

method: ordinary
 Estimated value:    1.7182745325322504     
 Absolute error:    7.2959267947148021E-006
 Standard error:    4.8536724073670133E-005
 Execution time:    5.9843750000000000       seconds
 Wall time:         5.9756036000000003       seconds

method: antithetic
 Estimated value:    1.7182820841485560     
 Absolute error:    2.5568951089738334E-007
 Standard error:    3.2167852470787950E-006
 Execution time:    5.0000000000000000       seconds
 Wall time:         5.0001290999999997       seconds

and for ifunc=3

 #random variates:      1000000
 ifunc:            3
 function: log(1+x)

method: ordinary
 Estimated value:   0.38628603900688629     
 Absolute error:    8.3221130042776537E-006
 Standard error:    1.8752880904260943E-005
 Execution time:    3.8281250000000000       seconds
 Wall time:         3.8361136000000000       seconds

method: antithetic
 Estimated value:   0.38629384765015679     
 Absolute error:    5.1346973378718630E-007
 Standard error:    7.8368582622719970E-007
 Execution time:    2.7812500000000000       seconds
 Wall time:         2.7909590000000000       seconds

I get similar results with ifort -O3. I wonder how well this method generalizes to higher-dimensional Monte Carlo integration.

ETA: Antithetic variates are a known technique for variance reduction. The Wikipedia article discusses combining a uniform variate x with 1-x, but with the revised code below I find that using 4 related variates works even better.

Revised code and results:
module mc_func_mod
implicit none
private
public :: dp, ifunc, f, f_integral, f_name
integer, parameter :: dp = selected_real_kind(15, 307), ifunc = 1
contains
elemental function f(x) 
! function to integrate
real(kind=dp), intent(in) :: x
real(kind=dp)             :: f
select case (ifunc)
   case (1) ; f = x**2
   case (2) ; f = exp(x)
   case (3) ; f = log(1+x)
end select
end function f
!
elemental function f_integral(jfunc)
! value of integral from 0 to 1 of f
integer, intent(in) :: jfunc
real(kind=dp)       :: f_integral
select case (jfunc)
   case (1) ; f_integral = 1/3.0_dp
   case (2) ; f_integral = exp(1.0_dp) - 1
   case (3) ; f_integral = 2.0_dp * log(2.0_dp) - 1
end select
end function f_integral
!
elemental function f_name(jfunc)
! name of function
integer, intent(in) :: jfunc
character (len=10)  :: f_name
select case (jfunc)
   case (1) ; f_name = "x^2"
   case (2) ; f_name = "exp(x)"
   case (3) ; f_name = "log(1+x)"
end select
end function f_name
end module mc_func_mod
!
program monte_carlo
use iso_fortran_env, only: compiler_version, compiler_options
use mc_func_mod, only: dp, f, ifunc, f_integral, f_name
implicit none
integer, parameter :: nsim = 100, nran = (10**6)/4, i64 = selected_int_kind(15), nmethods = 3
integer  :: i, j, imethod
integer(kind=i64) :: itick, t1, t2
real(dp) :: estimates(nsim)
real(dp) :: x1, x2, x3, x4, fsum, std_dev, mean_estimates, sum_sq_diff, start_time, end_time
character (len=20) :: methods(nmethods) = [character (len=20) :: "ordinary", "antithetic-2", "antithetic-4"]
print*,"compiler version: ",trim(compiler_version())
print*,"compiler options: ",trim(compiler_options())
call system_clock(count_rate=itick)
print*,"#random variates: ", nran*4 
print*,"ifunc: ",ifunc
print*,"function: ",trim(f_name(ifunc))
do_method: do imethod = 1,nmethods
   print "(/,a)", "method: " // trim(methods(imethod))
   call system_clock(t1)
   call cpu_time(start_time)
   do i = 1, nsim
      fsum = 0.0_dp
      do j = 1, nran
         if (imethod == 1) then ! 4 separate uniform variates        
            call random_number(x1)
            call random_number(x2)
            call random_number(x3)
            call random_number(x4)
         else if (imethod == 2) then ! 2 uniform variates and 2 antithetic variates
            call random_number(x1)
            call random_number(x2)
            x3 = 1 - x1
            x4 = 1 - x2
         else                   ! 1 uniform variate and 3 related variates
            call random_number(x1)
            if (x1 < 0.5_dp) then 
               x2 = 0.5_dp - x1 ! set x2 so that average of x1 and x2 is 0.25
             else
               x2 = 1.5_dp - x1 ! set x2 so that average of x1 and x2 is 0.75
            end if
            ! reflect x1 and x2 across 0.5 to get x3 and x4
            x3 = 1 - x1
            x4 = 1 - x2           
         end if
         fsum = fsum + f(x1) + f(x2) + f(x3) + f(x4)
      end do
      estimates(i) = fsum / (4 * real(nran, dp))
   end do
   mean_estimates = sum(estimates) / nsim
   sum_sq_diff = sum((estimates - mean_estimates)**2)
   std_dev = sqrt(sum_sq_diff / real(nsim, dp))
   call cpu_time(end_time)
   call system_clock(t2)
   print*, "Estimated value: ", mean_estimates
   print*, "Absolute error: ", abs(mean_estimates - f_integral(ifunc))
   print*, "Standard error: ", std_dev/sqrt(real(nsim, kind=dp))
   print*, "Execution time: ", end_time - start_time, " seconds"
   print*, "Wall time:      ", (t2-t1)/real(itick, kind=dp), " seconds"
end do do_method
end program monte_carlo

Results:

 #random variates:      1000000
 ifunc:            1
 function: x^2

method: ordinary
 Estimated value:   0.33334033354830317     
 Absolute error:    7.0002149698544791E-006
 Standard error:    2.9265616727108698E-005
 Execution time:    1.3125000000000000       seconds
 Wall time:         1.3279851000000000       seconds

method: antithetic-2
 Estimated value:   0.33334297610831598     
 Absolute error:    9.6427749826655251E-006
 Standard error:    1.0936660038148731E-005
 Execution time:   0.67187500000000000       seconds
 Wall time:        0.67663379999999995       seconds

method: antithetic-4
 Estimated value:   0.33333539505447651     
 Absolute error:    2.0617211431983584E-006
 Standard error:    3.9821047110537445E-006
 Execution time:   0.35937500000000000       seconds
 Wall time:        0.36495689999999997       seconds

As a graduate student in the mid-1970s I tested Halton sequences to integrate 4-dimensional normal distributions for Professor Quandt at Princeton, but the convergence rate was so fast he didn’t believe it could possibly be real. Later I used them in an economics of health care spending paper with great success. On that basis I would certainly give the Sobol sequence a try and revert to Halton if I were disappointed.


Richard Quandt created a Fortran-based econometrics package GQOPT that works with compilers no longer being sold:

With PC’s, GQOPT will run under DOS, from a DOS window in Windows 98 and XT. GQOPT is available in Version 6.10 for the Microsoft FORTRAN compiler Version 5.0 or higher, the Lahey F77L compiler Version 5.0 or higher. Version 6.10 for these obsolescent compilers will not be updated in the future. Version 7.05 is available for the Lahey EM/32 compiler (DOS) and the compilers (a) Digital Visual Fortran (Windows 98)(which supersedes the Microsoft Power Station Compiler), (b) Lahey LF90 and LF95 (Windows 98), (c) WATCOM (Windows 98), (d) the LINUX LF75 and Absoft compilers, and also (e) for Unix workstations such as Sun, DEC, and RS/6000.

The present Version 7.20 is available for the LF95 under Windows XT, the Absoft and LF95 compilers for LINUX, and the Unix compilers for Sun, Digital Alpha, and RS/6000 computers.


I am intrigued by the properties of these sequences and their use in Monte Carlo-like algorithms. I did a quick investigation, using a simple numerical integration procedure, and quasi-random numbers do seem like a good choice, although the results are not uniformly much better. I will try to write it up.


Reading a magazine article about Penrose tilings, I stumbled upon a mathematical paper whose title is the only thing I can understand:
Basarab Matei and Yves Meyer, Simple quasicrystals are sets of stable sampling.

But it was applied in:
Grundland, Mark, Patera, Jiri, Masáková, Zuzana, and Dodgson, Neil (2009). Image Sampling with Quasicrystals. Symmetry, Integrability and Geometry: Methods and Applications, 5. doi:10.3842/SIGMA.2009.075.
https://www.researchgate.net/publication/26636120_Image_Sampling_with_Quasicrystals

We investigate the use of quasicrystals in image sampling. Quasicrystals produce space-filling, non-periodic point sets that are uniformly discrete and relatively dense, thereby ensuring the sample sites are evenly spread out throughout the sampled image.


That is definitely useful for some Monte Carlo integrals, since quasi-random samples can cover the parameter space more evenly than ordinary pseudo-random samples. It is just that the usual Monte Carlo way of obtaining the error bar does not apply to quasi-Monte Carlo.
Just curious: is there some Fortran code that can generate not only quasi-random uniform numbers but also quasi-random Gaussian numbers?


I have used the Sobol sequence to sample a five-dimensional design space for an aerodynamic optimization…
The sequence was remarkably effective at sampling the design space during the DOE (design of experiments) phase…


I’m not sure about your first question, but if you can generate uniformly distributed numbers, there are several algorithms to transform them into Gaussian ones, such as the inverse-CDF (quantile) transform, the Box-Muller transform, and the ziggurat algorithm.

For uniform distributions, I typically use the intrinsic subroutine RANDOM_NUMBER. Whether or not its PRNG algorithm is suitable for you, I don’t know.
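
As a sketch of one such transform, here is a minimal Box-Muller example (mine, not from any particular library), which maps two independent uniform variates on (0,1) to two independent standard normal variates:

program box_muller
implicit none
integer, parameter :: dp = kind(1.0d0)
real(dp), parameter :: twopi = 8*atan(1.0_dp)
real(dp) :: u(2), z1, z2
integer  :: i
do i = 1, 5
   call random_number(u)
   u(1) = 1.0_dp - u(1) ! shift to (0,1] so that log(u(1)) is finite
   z1 = sqrt(-2*log(u(1))) * cos(twopi*u(2))
   z2 = sqrt(-2*log(u(1))) * sin(twopi*u(2))
   print "(2f10.6)", z1, z2
end do
end program box_muller

One caveat for the quasi-random question above: Box-Muller consumes uniform variates in pairs, so with low-discrepancy inputs the inverse-CDF transform is generally preferred, since it maps each quasi-random coordinate to a Gaussian coordinate monotonically and better preserves the low-discrepancy structure.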
