George Marsaglia’s Diehard suite is still available and, from memory, contains several ckassical measures of spacing.
While the standard says pseudo-random numbers in the range 0<=r<1,
having random numbers in the range 0<r<1 is actually better for modelling Poisson () =-log(random_number(x))
I really only tested a couple implementations. They both functioned as intended.
This isn’t something I tested explicitly. But really, I’d expect zero to be returned as often as any other number. So the question becomes, what’s the spacing? Or, how many possible values are there? I didn’t dig that deeply.
I had forgotten about this, but I did run the couple of RNG algorithms through an external battery of tests called dieharder:
I vaguely recall that the splitmix algorithm scored better than the linear congruential algorithm. The programs in the app folder of the rngff repo can be used to feed the outputs to the diehard test suite. I don’t claim to fully understand all of the statistics and theory behind those tests or their results.
In a future Fortran standard it would be nice to see an additional random number routine in the spirit of high quality implementations of intrinsic functions. This new random number routine, tentatively random_real (r), would return a floating point number r that behaves as a properly rounded pseudo-random real number from the unit interval. Depending on the rounding mode outcome r==1 is possible or not possible for the new routine, while the outcome r==0 has vanishing likelihood under any rounding mode. (A precise specification might promise that r==0 will not happen, and perhaps also that denormal numbers will not be returned.)
It seems like these types of conditions should be easy to enforce. In the two random() routines I posted, the zero is already a special case with very low probability of occurring (once every 8.5e+37 for the real32 routine, and once every 4.5e+307 for the real64 routine). Because of the way the floating point value is constructed with bit operations, subnormals are not returned, and 1.0 is not returned. In my first posting of the code, I had an off-by-1 error that could return subnormals, but they would have been returned with those same one in 8.5e+37 and 4.5e+307 probabilities; that was not intentional, I just miscounted the desired exponent range, and I corrected the code subsequently.
That should be easy to impose too. One could just clip the negative exponent and not return the 0.0 special case. I think this would mean that the numbers with the smallest exponent (the same as tiny()) would be returned twice as often as they should be returned, but again we are talking about one part in 8.5e+37 or 4.5e+307, so practically speaking those probabilities are such outliers that they could never be observed.
This is the main difference between the division algorithm and the geometric distribution algorithm. I’m not sure exactly how to think about this difference. One way to think about the division algorithm is that only 2^24 values are returned, each with exactly equal probability. 0.0 is the smallest value returned, spacing(1.0,-1.0) is the next larger value returned, 2*spacing(1.0,-1.0) is the next value returned, and so on up to (2^24-1)*spacing(1.0,-1.0) for the highest value. So you can think of each of those special returned values as “representing” a set of floating point values. For the largest binary exponent of 0, each of those numbers represents just itself. For the binary exponent of -1, each returned value represents two floating point numbers, itself and the next higher number. For exponent of -3, they each represent four floating point numbers, itself and the next three higher numbers, and so on. The 0.0 value that is returned then represents itself and all of the floating point numbers <spacing(1.0,-1.0) – there are 102 exponents in that range, from -126 up to -24. (Well, -126 is the 0.0 itself plus all the subnormals, which is why I miscounted in that first code.) From this perspective, the 0.0 value represents more floating point numbers than are actually returned in total by the division algorithm.
The geometric distribution algorithm works with the opposite goal. Here, every floating point number represents just itself, and every floating point number in [0.0,1.0) will be returned eventually (in principle). But now, the floating point numbers with binary exponent -1 are sampled half as often as those with exponent 0, the numbers with exponent -2 are sampled half as often as those with -1, and so on, all the way down to tiny(). Now the 0.0 just represents itself (or maybe itself plus all the subnormals), not itself along with a bunch of other small normal numbers, so it will be returned with very small probability, 1/2 the probability as any of the numbers with the same exponent as tiny(), -125, 1/4 the probability as the numbers with exponent -124, and so on.
Let me state again that this is not my area of expertise, but I think this perspective explains the conceptual difference between the division algorithm and the geometric distribution algorithm. I should also say that all of these probability arguments rely on having a good way to generate random bits. My code uses that xoshiro256** algorithm from the fortran stdlib, which I think is good, but it is not perfect. The point of my posting the code was not to discuss the features of that bit algorithm, but rather to focus on the difference between the division algorithm and the geometric distribution algorithm to produce floating point numbers given that source of random bits.
I was curious about histograms for the two algorithms, division and geometric distribution, so I computed histograms based on the floating point exponent for real32 and real64 random numbers x. Here is the result:
Exponent Histrogram for imax = 1073741824
spacing(0.75_real32)= 0.596046448E-7 exponent= -23
spacing(0.75_real64)= 0.111022302E-15 exponent= -52
e expected gd32(e) div32(e) gd64(e) div64(e)
0 536870912.00 536876845 536895057 536851062 536878322
-1 268435456.00 268417649 268423664 268433531 268413884
-2 134217728.00 134223864 134216074 134238965 134202646
-3 67108864.00 67118099 67098501 67108268 67122450
-4 33554432.00 33550297 33546494 33558180 33564081
-5 16777216.00 16771934 16781254 16774659 16779088
-6 8388608.00 8389822 8391305 8389773 8391761
-7 4194304.00 4199762 4192776 4193069 4195771
-8 2097152.00 2095223 2097174 2096702 2096978
-9 1048576.00 1049931 1049630 1049358 1046577
-10 524288.00 524073 524945 524737 525246
-11 262144.00 262540 262832 261491 262119
-12 131072.00 130718 131269 130948 131133
-13 65536.00 65683 65242 65507 66149
-14 32768.00 32696 32926 32549 32681
-15 16384.00 16378 16174 16530 16466
-16 8192.00 8275 8236 8216 8182
-17 4096.00 4135 4136 4128 4095
-18 2048.00 1947 2092 2119 2171
-19 1024.00 980 1004 1017 996
-20 512.00 495 510 503 515
-21 256.00 241 269 272 254
-22 128.00 124 130 130 138
-23 64.00 57 62 58 74
-24 32.00 28 0 29 23
-25 16.00 14 0 14 10
-26 8.00 5 0 4 6
-27 4.00 2 0 3 4
-28 2.00 4 0 0 3
-29 1.00 1 0 1 1
-30 0.50 0 0 1 0
-31 0.25 2 0 0 0
-126 0.00 0 68 0 0
Both the division and the geometric distribution codes use the same xoshiro256ss() function for the random bits. For the exponents I did not use the intrinsic exponent() function, rather I extracted the 8-bit exponent field directly and subtracted the bias. This results in the binary exponent for all nonzero numbers, and for 0.0_real32 it results in -126. So those 68 values with exponent -126 in the last row of that table are actually the zeros generated by the division algorithm. Neither algorithm generates values >=1.0 or any subnormal values.
edit: I added two columns of real64 results, and added an option to produce nonrepeatable prng sequences. This output is a typical run. In the case of real32 arithmetic, the division algorithm produces only 2**24 distinct values, so for this run with imax=2**30, each of those numbers will be duplicated 2**6=64 times on average. The e=-23 row has 62 values and the e=-126 row has 68 values for the div32 column, both of which are close to that 2**6 expected repetition factor. This shows how the 0.0_real32 value “represents” all the floating point values from 0.0 up to 2.0**-24 in the division algorithm. In contrast, for the div64 column, there are 2**53 distinct values, so each one has only a 1 out of 2**11=2048 chance of being returned. Here the division algorithm has only a 1/2048 chance of producing a 0.0_real64 value. The above statistics are for one of the typical runs that does not produce a 0.0_real64. The last two columns show very similar statistics all the way down to exponents of -31 where the expected counts are <1. Those statistics also agree very well with the gd32 column. So it looks like the geometric distribution and division statistics are different for real32 arithmetic, but not so much for real64 arithmetic. That is because for the division algorithm there are so few real32 values produced compared to the real64 case; for 2**30 total values, the real32 results are oversampled, while the real64 results are undersampled.
These observations demonstrate the practical difference in the two approaches, but they are more obvious for real32 than they are for real64 for this sample size. It would take a much larger sample, some 1024x larger, for the statistics to show a difference for real64 values. That sampling size is about where a 0.0_real64 value becomes likely to occur in the division algorithm.
Ron Shepard’s code, relying on the bit-level xoshiro256ss() routine, is clearly the model for a library implementation of a random number procedure that returns correctly rounded random reals. (But see also [1] and [2], a.o..)
For my own entertainment and other purposes I wrote a procedure that relies on the intrinsic Fortran random_number () procedure to produce correctly rounded random reals. It is appended here.
[1] Chris Lomont: Generating Uniform Random Numbers on [0,1] · Lomont.org .
[2] Nima Badizadegan. “Floating-point random number generation.” See GitHub - specbranch/fp-rand: Floating-point random number generation · GitHub.
module RandomReal_Mod
! Produce pseudorandom floating point numbers in the range [0,1]
! that behave like properly rounded pseudorandom real numbers.
!..use and access
use, intrinsic :: iso_fortran_env, only : &
R4 => real32, R8 => real64, I8 => int64
implicit none (type, external)
private
public :: Random_Real, Random_Real_Test
!..data
integer, parameter :: d32 = digits(1.0_R4)
integer, parameter :: d64 = digits(1.0_R8)
! For ieee floating point types, expect d32==24, d64==53.
real (kind=R4), parameter :: eps32 = scale(1.0_R4,-d32)
real (kind=R8), parameter :: eps64 = scale(1.0_R8,-d64)
! Expect that eps{32,64}==spacing(0.75_{R4,R8}) for the matching kind.
! Without supreme confidence we expect that the available intrinsic
! random_number procedure provides numbers that are a random integer
! multiple of eps32 or eps64. However, the Fortran 2023 (and earlier)
! standard is not so precise. The spacing might be coarser, or the
! spacing could be variable with smaller spacing for smaller results.
real (kind=R4), parameter :: eps32safe = scale(1.0_R4,-3*d32/4)
real (kind=R8), parameter :: eps64safe = scale(1.0_R8,-3*d64/4)
! With much greater confidence we expect that the result t1 of
! call random_number (t0)
! t1 = eps{32,64}safe*floor(t0/eps{32,64}safe,kind={I4,I8})
! is a random number from an equally spaced set of floating point numbers
! with spacing eps{32,64}safe.
!..procedures
interface Random_Real
module procedure Random_Real_R4
module procedure Random_Real_R8
end interface Random_Real
contains
impure elemental subroutine Random_Real_R4 (r)
real (kind=R4), intent (out) :: r
! Return a random floating point number that appears as a correctly
! rounded real random number from the unit interval.
real (kind=R4) :: wid, x, t0, t1
wid = 1
x = 0
do while (spacing(x)<2*wid)
call random_number (t0)
t1 = eps32safe*floor(t0/eps32safe)
! t1 is a random integer multiple of eps32safe in the half-closed
! range [0,1).
x = x+wid*t1
wid = wid*eps32safe
end do
r = x
return
end subroutine Random_Real_R4
impure elemental subroutine Random_Real_R8 (r)
real (kind=R8), intent (out) :: r
! Return a random floating point number that appears as a correctly
! rounded real random number from the unit interval.
real (kind=R8) :: wid, x, t0, t1
wid = 1
x = 0
do while (spacing(x)<2*wid)
call random_number (t0)
t1 = eps64safe*floor(t0/eps64safe,kind=I8)
! t1 is a random integer multiple of eps64safe in the half-closed
! range [0,1).
x = x+wid*t1
wid = wid*eps64safe
end do
r = x
return
end subroutine Random_Real_R8
subroutine Random_Real_Test (nct)
integer (kind=I8), intent (in) :: nct
! Test the Random_Real routines. Write to the default output channel.
! nct is the number of samples in the test.
! A value nct = 2**32 is informative and quick.
integer, parameter :: nxm = 48
! We test for the frequency of result exact 1, result in any of the
! largest nxm binary intervals (if radix=2), and result below 2^(-nxm).
integer, parameter :: nmb = 4
! We test the frequency of outcomes of floor(rand()/eps) modulo 2^nmb;
! this is testing the low-order bit pattern of the random result.
integer (kind=I8) :: ic, nc0, nc1, nc(0:nxm-1), nm(0:2**nmb-1)
! ic is the iteration counter.
! nc0, nc1, nc(:), nm(:) are event counters.
write (*,'(1x,a,2(1x,es12.5))') &
'Sanity check, expect 1.0 1.0.', &
spacing(0.75_R4)/eps32, &
spacing(0.75_R8)/eps64
write (*,'(1x,a)') &
'Ideal frequencies assuming round to nearest is in effect.'
write (*,'(1x,a)') 'Test the kind=R4 routine.'
nc0 = 0
nc1 = 0
nc(:) = 0
nm(:) = 0
do ic = 0, nct-1
block
integer :: ix, im
real (kind=R4) :: x4
call Random_Real (x4)
ix = exponent(x4)
if (x4<=0) then
error stop 'x4<=0'
else if (ix<=-size(nc)) then
nc0 = nc0+1
else if (ix<=0) then
nc(-ix) = nc(-ix)+1
else if (x4==1) then
nc1 = nc1+1
else
error stop 'x4>1'
end if
im = modulo(floor(x4/eps32),size(nm))
nm(im) = nm(im)+1
end block
end do
call Test_Report (nct, nc0, nc1, nc, nm, d32)
write (*,'(1x,a)') 'Test the kind=R8 routine.'
nc0 = 0
nc1 = 0
nc(:) = 0
nm(:) = 0
do ic = 0, nct-1
block
integer :: ix, im
real (kind=R8) :: x8
call Random_Real (x8)
ix = exponent(x8)
if (x8<=0) then
error stop 'x8<=0'
else if (ix<=-size(nc)) then
nc0 = nc0+1
else if (ix<=0) then
nc(-ix) = nc(-ix)+1
else if (x8==1) then
nc1 = nc1+1
else
error stop 'x8>1'
end if
im = modulo(floor(x8/eps64,kind=I8),size(nm,kind=I8))
nm(im) = nm(im)+1
end block
end do
call Test_Report (nct, nc0, nc1, nc, nm, d64)
contains
subroutine Test_Report (nct, nc0, nc1, nc, nm, dd)
integer (kind=I8), intent (in) :: nct, nc0, nc1, nc(0:), nm(0:)
integer, intent (in) :: dd
integer :: ix, im
write (*,'(1x,a)') &
'Actual and ideal log2 frequencies of exact 1.'
write (*,'(1x,a,1x,es12.5,1x,i3)') &
'nc1', log(nc1/real(nct))/log(2.0), -dd-1
write (*,'(1x,a)') &
'Actual and ideal log2 frequencies of exponent -ix.'// &
' (Expect -ix-1.)'
do ix = 0, size(nc)-1
write (*,'(1x,a,1x,i0,1x,es12.5,1x,i3)') &
'nc', ix, log(nc(ix)/real(nct))/log(2.0), -ix-1
enddo
write (*,'(1x,a)') &
'Actual log2 frequencies of result below the cutoff.'// &
' (Expect -infty.)'
write (*,'(1x,a,1x,es12.5)') &
'nc0', log(nc0/real(nct))/log(2.0)
write (*,'(1x,a)') &
'Actual and ideal log2 frequencies of the lowest-order'// &
' bit patterns.'
do im = 0, size(nm)-1
write (*,'(1x,a,1x,i0,1x,es12.5,1x,i3)') &
'nm', im, log(nm(im)/real(nct))/log(2.0), -nmb
enddo
end subroutine Test_Report
end subroutine Random_Real_Test
end module RandomReal_Mod
program Main
use, intrinsic :: iso_fortran_env, only : &
R4 => real32, R8 => real64, I8 => int64
use RandomReal_Mod
implicit none (type, external)
integer (kind=I8), parameter :: nct = 2_I8**32
call random_init (repeatable=.false., image_distinct=.true.)
call Random_Real_Test (nct)
stop
!..end notes
! Karney, Charles FF. "Sampling exactly from the normal distribution."
! ACM Transactions on Mathematical Software (TOMS) 42 (2016): 1-14.
! Look at libraries cited by Karney.
! Chris Lomont: https://lomont.org/posts/2017/unit-random/.
! Lomont, Chris. "Random number generation." In Game programming Gems 7,
! pp. 113-125. Course Technology, 2008. [Citation only; I haven't found
! the article.]
! Nima Badizadegan. "Floating-point random number generation." See
! https://github.com/specbranch/fp-rand.
! Downey, Allen B. "Generating pseudo-random floating-point values."
! cit. on (2007): 90. (Web manuscript.)
! Blackman, David, and Sebastiano Vigna. "Scrambled linear pseudorandom
! number generators." ACM Transactions on Mathematical Software (TOMS) 47
! (2021): 1-32.
! See also https://github.com/camel-cdr/cauldron.
! Note the Fortran Stdlib routine dist_rand: Get a random integer with
! specified kind (int8, int16, int32 or int64).
end program Main
```
In case anyone is interested in using this geometric distribution random number algorithm, here is the full code that prints out the exponent histogram statistics.
module random_mod
use, intrinsic :: iso_fortran_env, only: int8, int16, int32, int64, real32, real64
implicit none
interface random_number
module procedure random_gd_32
module procedure random_gd_64
module procedure random_int8
module procedure random_int16
module procedure random_int32
module procedure random_int64
module procedure random_int32_1d
module procedure random_int16_1d
module procedure random_int8_1d
end interface random_number
interface random_div
module procedure random_div_32
module procedure random_div_64
end interface random_div
interface random_gd
module procedure random_gd_32
module procedure random_gd_64
end interface random_gd
private
public :: random_number, random_seed, random_init, random_gd, random_div
integer, parameter :: st_size = 4
! the default internal state was extracted from the gfortran prng, which also
! has a 256-bit internal state and uses the same xoshiro256** algorithm..
integer(int64), parameter :: st_default(st_size) = [ &
& 3780002631561922365_int64, &
& -1358301926617824657_int64, &
& -2775566539014194259_int64, &
& 0090971013994518298_int64 ]
integer(int64), save :: st(st_size) = st_default ! default repeatable internal state.
contains
impure function xoshiro256ss() result (res)
! return an int64 integer with random bits.
! based on fortran stdlib xoshiro256ss() at
! https://stdlib.fortran-lang.org/sourcefile/stdlib_random.fypp.html
! this algorithm has a cycle length of 2^256-1, and all possible
! 2^256-1 nonzero states are generated during this cycle.
! note: because this function is impure, it must not be referenced
! multiple times within a single statement.
implicit none
integer(int64) :: res, t
intrinsic :: shiftl, ieor, ishftc
res = ishftc(st(2) * 5, 7) * 9 ! this expression requires silent integer overflow.
! update the internal state for the next call.
t = shiftl(st(2), 17)
st(3) = ieor(st(3), st(1))
st(4) = ieor(st(4), st(2))
st(2) = ieor(st(2), st(3))
st(1) = ieor(st(1), st(4))
st(3) = ieor(st(3), t)
st(4) = ishftc(st(4), 45)
return
end function xoshiro256ss
subroutine random_seed( size, put, get )
! modeled after the intrinsic.
! the arguments are default integers.
! with most compilers size_put = 8 = 2*st_size.
! it is the programmer's responsibility to ensure that the actual
! arguments for put(:) and get(:) are at least this large.
! the put(:) array must be nonzero. this is treated as a fatal error since
! there is no way otherwise to report an error to the calling program.
! the zero internal state cannot be generated otherwise during the 2**256-1 cycle.
use, intrinsic :: iso_fortran_env, only: error_unit
implicit none
integer, parameter :: st_kind = kind(st) ! same as int64.
integer, parameter :: size_put = st_size * storage_size(1_st_kind) / storage_size(1)
integer, intent(out), optional :: size ! size of the default integer arrays.
integer, intent(in), optional :: put(size_put) ! set the internal state to these bits.
integer, intent(out), optional :: get(size_put) ! extract the internal state.
intrinsic :: storage_size, all, present, transfer
if ( present(size) ) size = size_put
if ( present(put) ) then
if ( all( put(:)==0 ) ) then ! the internal state must not be set to zero.
write(error_unit,'(a,i0,a)') 'put(1:', size_put, ')==0 detected in random_seed().'
error stop
endif
st(:) = transfer( put, st )
endif
if ( present(get) ) get(:) = transfer( st, get )
return
end subroutine random_seed
subroutine random_init( repeatable, image_distinct )
! reset the internal state of the PRNG.
! modeled after the intrinsic.
implicit none
logical, intent(in) :: repeatable ! .true. reset the internal state to st_default(:)
! .false. reset the internal state to an arbitrary value.
logical, intent(in) :: image_distinct ! this value is currently ignored.
integer :: i
integer(int64) :: count
intrinsic :: system_clock, ieor, ishftc
st(:) = st_default(:) ! reset the state to the repeatable default.
if ( .not. repeatable ) then
call system_clock( count=count ) ! get the current system clock count.
count = ieor( count, ishftc(count, 17 ) ) ! scramble the bits a little.
do i = 1, size(st)
st(i) = ieor( st(i), count )
enddo
endif
return
end subroutine random_init
impure elemental subroutine random_div_32( x )
! return a random number x with uniform distribution using simple division.
! x is one of 2**24 possible values with equal probability.
! this is the algorithm typically used by fortran compilers to convert
! random bits to real values.
implicit none
real(real32), intent(out) :: x
integer(int32) :: ix
integer, parameter :: escale = -24 ! ieee 23 fraction bits plus 1 hidden bit.
intrinsic :: int, shiftr, real, scale
ix = int( shiftr( xoshiro256ss(), 40), int32 ) ! keep the high-order 24 bits.
! ix is an integer in [0,2**24-1].
! convert to real and scale by 2**(-24).
! the scale() function eliminates any possibility of roundoff errors that might
! occur in the expression real(ix,real32) / real(2**24,real32).
x = scale( real(ix,real32), escale )
return
end subroutine random_div_32
impure elemental subroutine random_div_64( x )
! return a random number x with uniform distribution using simple division.
! x is one of 2**53 possible values with equal probability.
! this is the algorithm typically used by fortran compilers to convert
! random bits to real values.
implicit none
real(real64), intent(out) :: x
integer(int64) :: ix
integer, parameter :: escale = -53 ! ieee 52 fraction bits plus 1 hidden bit.
intrinsic :: shiftr, real, scale
ix = shiftr( xoshiro256ss(), 11) ! keep the high-order 53 bits.
! ix is an integer in [0,2**53-1].
! convert to real and scale by 2**(-53).
! the scale() function eliminates any possibility of roundoff errors that might
! occur in the expression real(ix,real64) / real(2**53,real64).
x = scale( real(ix,real64), escale )
return
end subroutine random_div_64
impure elemental subroutine random_gd_32( x )
! return a random number x with uniform distribution from the set
! of all possible normal floating point numbers in the range [0.0,1.0).
implicit none
real(real32), intent(out) :: x
integer(int32) :: e, ix, lz
integer, parameter :: emin = minexponent(x)
intrinsic :: shiftr, leadz, mvbits, transfer, int
! ieee real32 format is: s (31), e (30...23), f (22...0)
! compute a random integer in [0,-emin] with a geometric distribution: 1/2, 1/4, 1/8...
lz = leadz( xoshiro256ss() )
if ( lz == 64 ) lz = lz + leadz( xoshiro256ss() )
if ( lz > -emin ) then
x = 0.0_real32
else
ix = int( shiftr( xoshiro256ss(), 41 ), int32) ! keep the high-order 23 bits for the fraction.
e = 1 - emin - lz ! biased exponent in [1,1-emin].
call mvbits( e, 0, 8, ix, 23 )
x = transfer( ix, x )
endif
return
end subroutine random_gd_32
impure elemental subroutine random_gd_64( x )
! return a random number x with uniform distribution from the set
! of all possible normal floating point numbers in the range [0.0,1.0).
implicit none
real(real64), intent(out) :: x
integer(int64) :: e, ix
integer :: i, lz, lzi
integer, parameter :: emin = minexponent(x)
integer, parameter :: numw = ((-emin-1) / bit_size(ix) + 1) ! number of int64 words needed to hold -emin bits.
intrinsic :: bit_size, shiftr, leadz, mvbits, transfer
! ieee real64 format is: s (63), e (62...52), f (51...0)
! compute a random integer in [0,-emin] with a geometric distribution: 1/2, 1/4, 1/8...
lz = 0
do i = 1, numw ! this loop almost always executes only once.
lzi = leadz( xoshiro256ss() )
lz = lz + lzi
if ( lzi < 64 ) exit
enddo
if ( lz > -emin ) then
x = 0.0_real64
else
ix = shiftr( xoshiro256ss(), 12 ) ! keep the high-order 52 bits for the fraction.
e = 1 - emin - lz ! biased exponent in [1,1-emin].
call mvbits( e, 0, 11, ix, 52 )
x = transfer( ix, x )
endif
return
end subroutine random_gd_64
impure elemental subroutine random_int64( i )
! return an integer i with random bits.
implicit none
integer(int64), intent(out) :: i
i = xoshiro256ss()
return
end subroutine random_int64
impure elemental subroutine random_int32( i )
! return an integer i with random bits.
implicit none
integer(int32), intent(out) :: i
integer(int32) :: i32(2)
intrinsic :: transfer
i32 = transfer( xoshiro256ss(), i32 )
i = i32(1)
return
end subroutine random_int32
subroutine random_int32_1d( i )
! return an integer array i(:) with random bits.
implicit none
integer(int32), intent(out) :: i(:)
integer(int32) :: n, j, i32(2)
intrinsic :: size, transfer
n = size(i)
do j = 1, (n-1), 2 ! extract bits by integer pairs.
i(j:j+1) = transfer( xoshiro256ss(), i32 )
enddo
if ( j == n ) then
i32 = transfer( xoshiro256ss(), i32 )
i(n) = i32(1)
endif
return
end subroutine random_int32_1d
impure elemental subroutine random_int16( i )
! return an integer i with random bits.
implicit none
integer(int16), intent(out) :: i
integer(int16) :: i16(4)
intrinsic :: transfer
i16 = transfer( xoshiro256ss(), i16 )
i = i16(1)
return
end subroutine random_int16
subroutine random_int16_1d( i )
! return an integer array i(:) with random bits.
implicit none
integer(int16), intent(out) :: i(:)
integer :: n, j
integer(int16) :: i16(4)
intrinsic :: size, transfer
n = size(i)
do j = 1, (n-3), 4
i(j:j+3) = transfer( xoshiro256ss(), i16 )
enddo
if ( j <= n ) then
i16 = transfer( xoshiro256ss(), i16 )
i(j:n) = i16(1:n-j+1)
endif
return
end subroutine random_int16_1d
impure elemental subroutine random_int8( i )
! return an integer i with random bits.
implicit none
integer(int8), intent(out) :: i
integer(int8) :: i8(8)
intrinsic :: transfer
i8 = transfer( xoshiro256ss(), i8 )
i = i8(1)
return
end subroutine random_int8
subroutine random_int8_1d( i )
! return an integer array i(:) with random bits.
implicit none
integer(int8), intent(out) :: i(:)
integer :: n, j
integer(int8) :: i8(8)
intrinsic :: size, transfer
n = size(i)
do j = 1, (n-7), 8
i(j:j+7) = transfer( xoshiro256ss(), i8 )
enddo
if ( j <= n ) then
i8 = transfer( xoshiro256ss(), i8 )
i(j:n) = i8(1:n-j+1)
endif
return
end subroutine random_int8_1d
end module random_mod
program prng
use, intrinsic :: iso_fortran_env, only: real32, real64, int32, int64
use random_mod, only: random_init, random_seed, random_gd, random_div
implicit none
integer :: i, emin, emax, expd, imax= 2**30
real(real32) :: x
real(real64) :: d, sum, mean, xmin, xmax
integer(int32) :: get(8)
integer, dimension(exponent(tiny(d))-1:1) :: knt_div32, knt_gd32, knt_div64, knt_gd64 ! histograms.
character(*), parameter :: cfmt = '(*(g0.9:1x))'
call random_init( repeatable=.false., image_distinct=.true. )
call random_seed( size=i )
if ( i .ne. 8 ) then
print cfmt, 'unexpected size=', i, '.ne. 8'
error stop
endif
call random_seed( get=get )
print '(a,*(/b32.32))', 'initial random_seed get(:)=', get
knt_gd32(:) = 0
emin = huge(emin)
emax = -huge(emax)
xmin = huge(xmin)
xmax = -1.0
sum = 0.0
do i = 1, imax
call random_gd( x )
d = x
expd = unbiased_exponent_32(x)
emin = min( emin, expd )
emax = max( emax, expd )
knt_gd32(expd) = knt_gd32(expd) + 1
xmin = min( xmin, d )
xmax = max( xmax, d )
sum = sum + d
enddo
mean = sum / imax
print cfmt, 'gd real32 imax=', imax, 'mean=', mean, 'xmin=', xmin, &
& 'xmax=', xmax, 'emin=', emin, 'emax=', emax, 'minexponent=', minexponent(x)
knt_gd64(:) = 0
emin = huge(emin)
emax = -huge(emax)
xmin = huge(xmin)
xmax = -1.0
sum = 0.0
do i = 1, imax
call random_gd( d )
expd = unbiased_exponent_64(d)
emin = min( emin, expd )
emax = max( emax, expd )
knt_gd64(expd) = knt_gd64(expd) + 1
xmin = min( xmin, d )
xmax = max( xmax, d )
sum = sum + d
enddo
mean = sum / imax
print cfmt, 'gd real64 imax=', imax, 'mean=', mean, 'xmin=', xmin, &
& 'xmax=', xmax, 'emin=', emin, 'emax=', emax, 'minexponent=', minexponent(d)
knt_div32(:) = 0
emin = huge(emin)
emax = -huge(emax)
xmin = huge(xmin)
xmax = -1.0
sum = 0.0
do i = 1, imax
call random_div( x )
d = x
expd = unbiased_exponent_32(x)
emin = min( emin, expd )
emax = max( emax, expd )
knt_div32(expd) = knt_div32(expd) + 1
xmin = min( xmin, d )
xmax = max( xmax, d )
sum = sum + d
enddo
mean = sum / imax
print cfmt, 'div real32 imax=', imax, 'mean=', mean, 'xmin=', xmin, &
& 'xmax=', xmax, 'emin=', emin, 'emax=', emax, 'minexponent=', minexponent(x)
knt_div64(:) = 0
emin = huge(emin)
emax = -huge(emax)
xmin = huge(xmin)
xmax = -1.0
sum = 0.0
do i = 1, imax
call random_div( d )
expd = unbiased_exponent_64(d)
emin = min( emin, expd )
emax = max( emax, expd )
knt_div64(expd) = knt_div64(expd) + 1
xmin = min( xmin, d )
xmax = max( xmax, d )
sum = sum + d
enddo
mean = sum / imax
print cfmt, 'div real64 imax=', imax, 'mean=', mean, 'xmin=', xmin, &
& 'xmax=', xmax, 'emin=', emin, 'emax=', emax, 'minexponent=', minexponent(d)
call print_knt()
contains
subroutine print_knt()
! print the histogram summary.
integer :: e
real :: expected
real(real32) :: x
real(real64) :: d
! the ubound for the knt* arrays is 1. This is intended to catch prng
! roundoff errors where x>=1.0 might occur.
print '(/a,i0)', 'Exponent Histrogram for imax = ', imax
x = spacing(0.75_real32)
print cfmt, 'spacing(0.75_real32)=', x, 'exponent=', exponent(x)
d = spacing(0.75_real64)
print cfmt, 'spacing(0.75_real64)=', d, 'exponent=', exponent(d)
print '(a5,a14,4a11)', 'e', 'expected', 'gd32(e)', 'div32(e)', 'gd64(e)', 'div64(e)'
do e = ubound(knt_gd32,dim=1), lbound(knt_gd32,dim=1), -1
if ( all([knt_gd32(e),knt_div32(e),knt_gd64(e),knt_div64(e)] == 0) ) cycle ! just print nonzero rows.
if ( e > 0 ) then
expected = 0.0
else
!expected = real(imax) * 2.0**(e-1)
expected = scale( real(imax), e-1 )
endif
print '(i5,f14.2,4i11)', e, expected, knt_gd32(e), knt_div32(e), knt_gd64(e), knt_div64(e)
enddo
return
end subroutine print_knt
function unbiased_exponent_32( x ) result(e)
! extract the exponent field from x and remove the bias.
! ieee real32 format is: s (31), e (30...23), f (22...0)
implicit none
integer :: e
real(real32), intent(in) :: x
integer, parameter :: bias = minexponent(x) - 1 ! exponent of tiny(x) - 1.
intrinsic :: ibits, transfer
e = ibits( transfer( x, 1_int32 ), 23, 8) + bias
return
end function unbiased_exponent_32
function unbiased_exponent_64( x ) result(e)
! extract the exponent field from x and remove the bias.
! ieee real64 format is: s (63), e (62...52), f (51...0)
implicit none
integer :: e
real(real64), intent(in) :: x
integer, parameter :: bias = minexponent(x) - 1 ! exponent of tiny(x) - 1.
intrinsic :: int, ibits, transfer, kind
e = int( ibits( transfer( x, 1_int64 ), 52, 11), kind(e) ) + bias
return
end function unbiased_exponent_64
end program prng
Here are some comments on the code.
edit: I updated the posted code and the following comments.
I implemented it only for real32 and real64 kinds. It should be straightforward to extend this to real16 and real128, but those kind values are not yet universally supported.
I overloaded random_number() to accept int8, int16, int32 and int64 arguments. I think the fortran standard should do this same thing, and I have no idea why this wasn’t done decades ago. Of course, if the standard were to do this I would expect support for all integer kinds supported by the compiler, not just these selected kinds. I did not do int128 because that kind is not supported on all the compilers I use.
In the previous code I posted, I did not treat arrays of small integers efficiently. I experimented a little, and discovered that I could have a generic interface for an elemental scalar routine and for a rank-1 array routine. It surprised me that this worked, but it does appear to be standard conforming. So I went ahead and did that. That means that this code is efficient for rank-1 arrays of type int8, int16, int32, and int64. However, higher-rank arrays will fall back to the relatively inefficient elemental scalar code. These cases are inefficient because the whole state vector (256 bits) is cycled for every random integer. In contrast, for the rank-1 case for int8 arguments, the state is cycled only once for every 8 values generated. I do not know of a simple way to handle the arbitrary rank case in a similar way to the rank-1 case. If anyone has any suggestions for how to do this I would like to know.
There are a few places in the code where a real number needs to be multiplied or divided by 2.0**i for some integer power i. I use the scale() intrinsic to do that operation. That operation just reaches into the floating point number and directly modifies the exponent. This means that there is no chance that a floating point roundoff error wil occur that would result in, for example, a return value x >=1.0 or in a subnormal value. I tested this on my computer (arm64 hardware), and the floating point operation actually never rounds up incorrectly. So it appears that I’m protecting the result from something that never actually occurs, at least on this hardware. The scale() intrinsic is probably a little more efficient than a floating point division or multiplication, so maybe that is still a good reason to use it rather than the mathematically equivalent floating point operation.
I went ahead and added random_init() and random_seed() routines. This is just to flesh out the full functionality. However, I have not yet implemented the image_distinct option. I know how to do the jump() operation for this, but I want to have a version that works for OpenMP threads, for coarray images, and for the combination of both when they are used together. I need to investigate this further. So at this point that is a feature that is not yet implemented.
I had a minor mistake in the previous random_seed() code. I wrote the code assuming that the default integer was int32. In the current version that code uses default integers. I cannot test it easily, but if default integers are int64, then I think this new version will work correctly.
Here is another version of the random number routines based on the geometric distribution of the exponent field. There are several significant features of this code. The most important is probably that there is a pure interface to the random number routines. This allows their use within DO CONCURRENT or when thread-safe random numbers are required. Here is the source code.
prng.F90 (53.5 KB)
This file requires the preprocessor to be activated during compilation because there is a main program within an #ifdef block and because the documentation at the end is within an #if block. If you compile normally, then only the module is compiled. If you compile with the macro -DEXE, then both the module and the main program are compiled. This compiles with gfortran, flang, and nagfor on MacOS, and it produces the same numerical output (but with different conventions for leading zeros and exponent field widths in the printed output, which is typical for fortran). Here is the gfortran output:
$ gfortran -g -DEXE prng.F90 && a.out
initial get(:)= B84A473D 3475445D 2637426F ED265798 37993BAD D97B34F0 BE00171A 014331A7
ok? T x0= 0.7645 x0_tbp= 0.7645
ok? T gd x0= 0.8954 x0_tbp= 0.8954
ok? T div x0= 0.4416E-1 x0_tbp= 0.4416E-1
ok? T x1= 0.7923 0.2829 0.8032 0.3816 x1_tbp= 0.7923 0.2829 0.8032 0.3816
ok? T gd x1= 0.5215 0.2601 0.4379 0.4210 x1_tbp= 0.5215 0.2601 0.4379 0.4210
ok? T div x1= 0.9141 0.9046E-2 0.6325 0.4620 x1_tbp= 0.9141 0.9046E-2 0.6325 0.4620
ok? T x2= 0.7962 0.9867 0.6330 0.4775 x2_tbp= 0.7962 0.9867 0.6330 0.4775
ok? T gd x2= 0.2580 0.4058 0.9012 0.1569 x2_tbp= 0.2580 0.4058 0.9012 0.1569
ok? T div x2= 0.2839 0.2311 0.9174 0.3249 x2_tbp= 0.2839 0.2311 0.9174 0.3249
ok? T x3= 0.8761 0.9089 0.7705 0.8789E-1 0.9325 0.4972 0.5393 0.2307 x3_tbp= 0.8761 0.9089 0.7705 0.8789E-1 0.9325 0.4972 0.5393 0.2307
ok? T gd x3= 0.6187 0.7780 0.9455E-1 0.4856 0.5737 0.7077 0.3310E-1 0.7003 x3_tbp= 0.6187 0.7780 0.9455E-1 0.4856 0.5737 0.7077 0.3310E-1 0.7003
ok? T div x3= 0.5243 0.6880 0.1844 0.5763 0.9171 0.2819 0.9139 0.6151 x3_tbp= 0.5243 0.6880 0.1844 0.5763 0.9171 0.2819 0.9139 0.6151
ok? T d0= 0.1544 d0_tbp= 0.1544
ok? T gd d0= 0.5644 d0_tbp= 0.5644
ok? T div d0= 0.7569 d0_tbp= 0.7569
ok? T d1= 0.8102 0.9859 0.6725 0.7314 d1_tbp= 0.8102 0.9859 0.6725 0.7314
ok? T gd d1= 0.1637 0.9856 0.8454 0.2325 d1_tbp= 0.1637 0.9856 0.8454 0.2325
ok? T div d1= 0.5373 0.1259 0.1003 0.2616 d1_tbp= 0.5373 0.1259 0.1003 0.2616
ok? T d2= 0.7143 0.3411 0.1408 0.1694 d2_tbp= 0.7143 0.3411 0.1408 0.1694
ok? T gd d2= 0.1804 0.7281 0.6500 0.3301 d2_tbp= 0.1804 0.7281 0.6500 0.3301
ok? T div d2= 0.9881E-1 0.1211 0.8388 0.1226E-1 d2_tbp= 0.9881E-1 0.1211 0.8388 0.1226E-1
ok? T d3= 0.9948 0.2928 0.6126 0.2873 0.7160E-1 0.7461 0.3804 0.2328 d3_tbp= 0.9948 0.2928 0.6126 0.2873 0.7160E-1 0.7461 0.3804 0.2328
ok? T gd d3= 0.6317 0.3442E-3 0.4088 0.8275 0.7267 0.5080 0.5302 0.7926 d3_tbp= 0.6317 0.3442E-3 0.4088 0.8275 0.7267 0.5080 0.5302 0.7926
ok? T div d3= 0.6515E-1 0.1572 0.9950 0.7377 0.6703 0.6082 0.4245 0.2991 d3_tbp= 0.6515E-1 0.1572 0.9950 0.7377 0.6703 0.6082 0.4245 0.2991
ok? T i8_0= 115 i8_0_tbp= 115
ok? T i8_1= -106 -45 121 89 i8_1_tbp= -106 -45 121 89
ok? T i8_2= 19 -99 -103 -47 i8_2_tbp= 19 -99 -103 -47
ok? T i16_0= 5431 i16_0_tbp= 5431
ok? T i16_1= -13724 25277 16424 2835 i16_1_tbp= -13724 25277 16424 2835
ok? T i16_2= 28423 -2959 -7373 -5789 i16_2_tbp= 28423 -2959 -7373 -5789
ok? T i32_0= -536353450 i32_0_tbp= -536353450
ok? T i32_1= -1722477179 -1167282628 378125651 31215344 i32_1_tbp= -1722477179 -1167282628 378125651 31215344
ok? T i32_2= -22505619 111992582 -1311688845 1551405022 i32_2_tbp= -22505619 111992582 -1311688845 1551405022
ok? T i64_0= 2686342181531513020 i64_0_tbp= 2686342181531513020
ok? T i64_1= -1976433976715876287 -3737591198842903069 448044774846790590 -8222087712594927580 i64_1_tbp= -1976433976715876287 -3737591198842903069 448044774846790590 -8222087712594927580
ok? T i64_2= 6897419053096477350 -4289971558446886110 -3436230029862249510 -2372320616395388279 i64_2_tbp= 6897419053096477350 -4289971558446886110 -3436230029862249510 -2372320616395388279
array( 1) get(:)= B84A473D 3475445D 2637426F ED265798 37993BAD D97B34F0 BE00171A 014331A7
array( 2) get(:)= C541C88C 923A55F2 45788A86 794EBDEC 333359ED 943143EF F1986066 ED816C02
array( 3) get(:)= 7E01BAFC AF0DA3F0 EA4A7377 7F2AA2C8 B0A81706 4AA3EEF2 8DA90EBF 8BCEBC9B
array( 4) get(:)= 92678EBD 016B71AD 35DB9406 B0453592 70E96DC4 D6BFF67D 7FD08520 19F70647
array( 5) get(:)= D257FF82 747BD361 B93D29F5 B3B3E6A9 8093B501 AB262746 7893543E 543F0609
array( 6) get(:)= 7807FC86 F70EBDC2 C1D5A5A4 F45F1BE6 F3ED3790 3FB58817 332C1EA1 10E17E37
array( 7) get(:)= 727FAA3A C44EB7BA 0855D3B1 09777082 B96AABDE 05B2B3BF 776A0B33 CE12174D
array( 8) get(:)= 75A4A97C 8D39994E 6A7B163F AFC633EF 8D905E4B F00D59D0 3C367F5F EAF108CA
array( 9) get(:)= 0205A19D D5436DE4 23386DBC 18864023 F49415EC 132C1807 0C37F3DC 85BC9A6F
array(10) get(:)= 437F9019 F977D84A 363EEE99 16F12A8E 798FD80D 78B509EB 851562A2 FF65A02C
array(11) get(:)= 5D21A58E FC942584 A4AAD34A 1D8CD287 F008E709 02A327D8 30D6BC7A 065C28BC
array(12) get(:)= 1D7186E8 08D969F5 989E572F 48B7180F 0DF00491 C3888DED A833AD9B EFDCB0A3
array(13) get(:)= B29E1BE7 26F0A926 609E8F32 30146385 17A802F7 69559522 CC7F0D56 B2217C19
array(14) get(:)= FFCFAC5F 71325788 5B44989B 2D9EBC3E BB54A68B 7F813177 96607DA1 F93B1B40
array(15) get(:)= 722CCAD5 E09B5C28 281116AE BA95511F DE71071C AC699CA4 6BE8386F 82BECB6F
array(16) get(:)= F66366EF 98E3A4C5 195020B5 31A5E88A AE9C5949 FBCE3A8D 0CF54338 A7ACBF90
traditional (impure) interface statistics generated with repeatable_mode= T
initial random_seed get(:)=
10111000010010100100011100111101
00110100011101010100010001011101
00100110001101110100001001101111
11101101001001100101011110011000
00110111100110010011101110101101
11011001011110110011010011110000
10111110000000000001011100011010
00000001010000110011000110100111
gd real32 imax= 1073741824 mean= 0.499994935 xmin= 0.664625910E-9 xmax= 0.999999940 emin= -30 emax= 0 minexponent= -125
gd real64 imax= 1073741824 mean= 0.499997418 xmin= 0.362035592E-9 xmax= 0.999999997 emin= -31 emax= 0 minexponent= -1021
div real32 imax= 1073741824 mean= 0.500011562 xmin= 0.00000000 xmax= 0.999999940 emin= -126 emax= 0 minexponent= -125
div real64 imax= 1073741824 mean= 0.500017987 xmin= 0.303509773E-9 xmax= 0.999999999 emin= -31 emax= 0 minexponent= -1021
Exponent Histrogram for imax = 1073741824
spacing(0.75_real32)= 0.596046448E-7 exponent= -23
spacing(0.75_real64)= 0.111022302E-15 exponent= -52
e expected gd32(e) div32(e) gd64(e) div64(e)
0 536870912.00 536853007 536878872 536858075 536904346
-1 268435456.00 268451632 268443589 268446286 268419497
-2 134217728.00 134223590 134215870 134228297 134212992
-3 67108864.00 67100323 67101456 67103234 67101977
-4 33554432.00 33552995 33553915 33552159 33548011
-5 16777216.00 16781246 16773734 16776807 16776347
-6 8388608.00 8385483 8389682 8390394 8390244
-7 4194304.00 4199714 4193517 4190955 4193787
-8 2097152.00 2097753 2094901 2096007 2098863
-9 1048576.00 1047284 1049170 1048710 1048268
-10 524288.00 523876 523326 525056 523871
-11 262144.00 262229 262408 262738 261787
-12 131072.00 131373 130836 131357 130791
-13 65536.00 65682 65204 65888 65777
-14 32768.00 32829 32929 33049 32380
-15 16384.00 16342 16181 16292 16421
-16 8192.00 8215 8165 8277 8196
-17 4096.00 4117 4008 4129 4124
-18 2048.00 2102 2078 1998 2107
-19 1024.00 1028 976 1050 1053
-20 512.00 511 497 532 519
-21 256.00 261 281 271 220
-22 128.00 105 116 135 122
-23 64.00 63 57 55 63
-24 32.00 30 0 33 26
-25 16.00 13 0 22 16
-26 8.00 12 0 11 10
-27 4.00 4 0 3 4
-28 2.00 2 0 2 3
-29 1.00 2 0 1 1
-30 0.50 1 0 0 0
-31 0.25 0 0 1 1
-126 0.00 0 56 0 0
I would appreciate any constructive comments and criticisms about the code and any corrections of errors.
This implementation is NOT safe for use within DO CONCURRENT, nor thread-safe. It modifies a shared state. I do not know of a random number generator which is thread-safe or valid to use within DO CONCURRENT. Specifically from 11.1.7.5 of the 2023 standard:
If it [a variable with shared locality] is defined or becomes undefined during any iteration, it shall not be referenced, defined, or become undefined during any other iteration.
Please check again. As described in more detail in the documentation, they are pure subroutines and no global data is modified, only the thread-local or image-local data is modified when run in parallel. However, it would be very good if someone with more OpenMP and coarray experience could verify that they work as intended.