Memory bandwidth test code for M1 and PC

CRquantum · April 8, 2022, 8:40am

Dear all,

I just wanted to test a code, to see if there are speed dfference between M1 and PC.
I have a code, I have to say, this definitely may not be the best code to test memory bandwidth, but I wanted to share the test code with you. You may test and report the result, especially if you have M1, M1 Pro, M1 Max, or even M1 Ultra.

The code is basically solving a simple 2-equation stochastic differential equation (SDE). Because it is stochastic differential equation, you can imagine you need to solve it many many times, in order to find the correct statistics. if ‘many many’ = 10^5, it basically means SDE will be 10^5 slower than ODE.

OK. The code is simplified from a sequential Monte Carlo code, so some modules are left as blank for simplicity. There are 8 files, and a Makefile.
All you need to do is just copy them inside one folder, them type

make

then type

./stoRK

to run the code. That is all.

The 8 files are below,
constants.f90 (755 Bytes)
fg.f90 (818 Bytes)
main.f90 (814 Bytes)
pf.f90 (31 Bytes)
ran.f90 (9.2 KB)
stats.f90 (3.6 KB)
stochastic_rk.f90 (4.7 KB)
tests.f90 (2.2 KB)

The Makefile is (there is no MPI in this code) below, just type make and that is all.

# This Makefile was generated by Rong Chen for gfortran on ubuntu
# sudo apt install gfortran mpich    https://www.youtube.com/watch?v=aRhYoAC-Ymc

MPI=false

ifeq ($(MPI),true)
FC = mpif90
MPIFILE=mympi
EXEC = stork_mpi
LINKER =

IDIR =
FFLAGS = -O3 -march=native -frecursive # -flto
#-fcheck=all -Og -g -fbacktrace -Wall -Wextra -Wno-tabs -Wno-unused-dummy-argument -Wno-unused-variable -Wno-unused-function -Wno-compare-reals -Wno-maybe-uninitialized -Wno-conversion -ffpe-trap=invalid,zero,overflow -finit-real=nan  -ffree-line-length-0# -Ofast -march=native -flto # -pg  #-no-pie 
F77FLAGS = $(FFLAGS) -std=legacy -fdefault-real-8 -fdefault-double-8 # gfortran only.
FFLAGS_heapstack = -frecursive # -fmax-stack-var-size=655360
LDFLAGS=
LIBS = -static-libgfortran
LINKER =

else
FC = gfortran
EXEC = stork
LINKER =

IDIR =
FFLAGS = -O3 -march=native -frecursive -flto
#-fcheck=all -Og -g -fbacktrace -Wall -Wextra -Wno-tabs -Wno-unused-dummy-argument -Wno-unused-variable -Wno-unused-function -Wno-compare-reals -Wno-maybe-uninitialized -Wno-conversion -ffpe-trap=invalid,zero,overflow -finit-real=nan  -ffree-line-length-0
#-g -Wall -Wtabs -Wextra -Warray-temporaries -Wconversion -fbacktrace -ffree-line-length-0 -fcheck=all -ffpe-trap=invalid,zero,overflow -finit-real=nan 
# -pg  #-no-pie # -fmax-stack-var-size=655360  
F77FLAGS = $(FFLAGS) -std=legacy -fdefault-real-8 -fdefault-double-8 # gfortran only.
FFLAGS_heapstack = -frecursive # -fmax-stack-var-size=655360
LDFLAGS = 
LIBS= -static-libgfortran
endif


.SUFFIXES:
.SUFFIXES: .o .f .f90

.f90.o:
	$(FC) $(FFLAGS) -c $<

%.o: %.mod

OBJECTS=\
main.o\
constants.o\
fg.o\
pf.o\
ran.o\
stats.o\
stochastic_rk.o\
tests.o



main: $(OBJECTS)
	$(FC) $(LDFLAGS) -o ./$(EXEC) $(OBJECTS) $(LIBS) 2>compiling_record.log

clean:
	rm -f $(EXEC) *\.mod *\.mod0 *\.smod *\.smod0 *\.log *\.o *~
#	@del /q /f $(EXEC) *.mod *.smod *.o $(EXEC) *~ > nul 2> nul
# not that in windows rm -f does not work, so use del instead.
# > nul 2> nul just to suppress some redundunt mesage.


main.o: constants.o ran.o fg.o pf.o stochastic_rk.o tests.o stats.o main.f90
	$(FC) $(FFLAGS) -c main.f90
	
constants.o: constants.f90
	$(FC) $(FFLAGS) -c constants.f90
	
fg.o: constants.o fg.f90
	$(FC) $(FFLAGS)  -c fg.f90

pf.o: constants.o stats.o ran.o stochastic_rk.o fg.o pf.f90
	$(FC) $(FFLAGS)  -c pf.f90

ran.o: ran.f90
	$(FC) $(FFLAGS) -c ran.f90
	
stats.o: constants.o stats.f90
	$(FC) $(FFLAGS) -c stats.f90	
	
stochastic_rk.o: constants.o ran.o fg.o stochastic_rk.f90
	$(FC) $(FFLAGS) -c stochastic_rk.f90

tests.o: constants.o ran.o fg.o stochastic_rk.o stats.o tests.f90
	$(FC) $(FFLAGS) -c tests.f90

My hardware are

Thinkpad P72 with Xeon-2186M + 64 GB ECC DDR4 2666 (with 2TB + 2TB + 8TB storage and Quadro P5200 GPU just to show off lol). Windows 10 pro workstation. Intel OneAPI 2022.0.3 + Visual Studio 2019.

The ifort compile flag in windows 10 is,

/nologo /debug:full /MP /O3 /QxHost /Qparallel /heap-arrays0 /Qopt-matmul /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc160.pdb" /traceback /libs:static /threads /Qmkl:cluster /c

link flag is

/OUT:"x64\Release\stochastic_RK.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\stochastic_RK.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"D:\Works CHLA\stochastic_RK\stochastic_RK\stochastic_RK\x64\Release\stochastic_RK.pdb" /SUBSYSTEM:CONSOLE /LARGEADDRESSAWARE /IMPLIB:"D:\Works CHLA\stochastic_RK\stochastic_RK\stochastic_RK\x64\Release\stochastic_RK.lib"

M1 Macbook Air with 16B RAM and 2TB storage. Latest OS, and the latest gfortran in brew, version 11.2.0_3.

The code has two parts,

PART 1. It generate a 800000000 element gaussian random number array, which will be used in PART 2. This part cost like 10GB memory or so. So you may need 16GB memory.
In this part, M1 chip does not have advantage over my Xeon-2186M, it is almost 3X slower than my Xeon. 24s vs 9s.
However, it is also possible that my ifort compiled with /qparallel flag and MKL, so that perhaps also cause the difference. Anyway.

PART 2. The second part is to repeat solving the same SDE for 5 times.
The first time M1 cost 3s, then (strangely) the next 4 loops, each cost 1.6s.
Now, my Xeon-2186M, all the 5 times, each cost like 3.5s.
So it seems M1 overall, in this part, is 2X faster than my Xeon.

Finally it show the total time the code cost.

Welcome to test, and if you have any suggestion with regard to speed up the code, please feel free to let me know as well.

Thank you very much indeed in advance!

PS.

The M1 result is

 start generating big Gaussian random number array, it make take 5 - 20 seconds ...
size of random array =            800000000
random number generating took n seconds, n =    24.14900    
 i is n out of 5, n =                     1
 
STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           3.005000     sec
step size :        1/ 1000
Np:                      100000
 
Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570    
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576    
 
Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052    
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233    
 ------------------------------------ 
 i is n out of 5, n =                     2
 
STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           1.689000     sec
step size :        1/ 1000
Np:                      100000
 
Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570    
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576    
 
Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052    
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233    
 ------------------------------------ 
 i is n out of 5, n =                     3
 
STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           1.599000     sec
step size :        1/ 1000
Np:                      100000
 
Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570    
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576    
 
Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052    
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233    
 ------------------------------------ 
 i is n out of 5, n =                     4
 
STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           1.610000     sec
step size :        1/ 1000
Np:                      100000
 
Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570    
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576    
 
Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052    
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233    
 ------------------------------------ 
 i is n out of 5, n =                     5
 
STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           1.606000     sec
step size :        1/ 1000
Np:                      100000
 
Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570    
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576    
 
Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052    
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233    
 ------------------------------------ 
total time cost =    33.76200      seconds
STOP Program end normally.

Xeon 2186 result is,

 start generating big Gaussian random number array, it make take 5 - 20 seconds
 ...
size of random array =            800000000
random number generating took n seconds, n =    9.563000
 i is n out of 5, n =                      1

STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           4.104000     sec
step size :        1/ 1000
Np:                      100000

Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576

Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233
 ------------------------------------
 i is n out of 5, n =                      2

STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           3.786000     sec
step size :        1/ 1000
Np:                      100000

Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576

Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233
 ------------------------------------
 i is n out of 5, n =                      3

STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           4.156000     sec
step size :        1/ 1000
Np:                      100000

Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576

Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233
 ------------------------------------
 i is n out of 5, n =                      4

STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           3.776000     sec
step size :        1/ 1000
Np:                      100000

Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576

Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233
 ------------------------------------
 i is n out of 5, n =                      5

STOCHASTIC_RK_TEST 03 ------------------------------------
Time cost =           3.529000     sec
step size :        1/ 1000
Np:                      100000

Simulated Mean at t=0.2:1.0:0.2 is:                16.374       13.407       10.976       8.9851       7.3570
Theoretical Mean at t=0.2:1.0:0.2 is:              16.375       13.406       10.976       8.9866       7.3576

Simulated Variance at t=0.2:1.0:0.2 is:           0.16495      0.27571      0.34949      0.39851      0.43052
Theoretical Variance at t=0.2:1.0:0.2 is:         0.16484      0.27534      0.34940      0.39905      0.43233
 ------------------------------------
total time cost =    29.53600      seconds
Program end normally.

The most time consuming part is the below subroutine, whichis basically the vector version of John Burkardt’s stochasticRK code with some optimizations, https://people.math.sc.edu/Burkardt/f_src/stochastic_rk/stochastic_rk.f90

  subroutine rk4_ti_fullvec_test ( x0, np, nstep, q, h, fi_gi_in, nd, x )
  ! https://stackoverflow.com/questions/69147944/is-there-room-to-further-optimize-the-stochastic-rk-fortran-90-code 
  ! https://stackoverflow.com/questions/32809769/how-to-pass-subroutine-names-as-arguments-in-fortran
  use random
  use fg
  implicit none 
  integer(kind = i8), intent(in) :: np
  integer(kind = i8), intent(in) :: nstep
  integer(kind = i8), intent(in) :: nd
  procedure(fi_gi_fullvec_03) :: fi_gi_in
  real(kind = r8), intent(in) :: q,h
  real(kind = r8), intent(in) :: x0(np,nd)
  real(kind = r8), intent(out) :: x(np,nd,0:nstep)
  real(kind = r8) :: ks(np,nd,4),ks_matmul(np,4,nd)
  real(kind = r8) :: xs(np,nd,4)
  real(kind = r8) :: warray(np,nd,4)
  integer(kind = i8) :: i,j,k,l,m,n 
  real(kind = r8) :: xstar(np,nd)
  real( kind = r8 ) :: f(np,nd), g(np,nd)
  real(kind = r8) :: sigma(4)  

  sigma=sqrt(qs*q/h)  
  x(:,:,0) = x0	
  do k = 1, nstep
    xstar = x(:,:,k-1)
    do j = 1,4
      do concurrent (l=1:nd)
      !do l = 1,nd 
        !write(6,*) 'k j l = ', k,j,l   
        xs(:,l,j) = x(:,l,k-1) + matmul(ks(:,l,:j-1),as(:j-1,j))           
        !write(6,*) 'success k j l = ', k,j,l
        
        !if (j>1) then
        !  xs(:,l,j) = x(:,l,k-1) + matmul(ks(:,l,:j-1),as(:j-1,j)) 
        !else
        !  xs(:,l,j) = x(:,l,k-1) 
        !endif
        

        !xs(:,l,j) = x(:,l,k-1) + matmul(ks_matmul(:,:j-1,l),as(:j-1,j))
        
        
        !do concurrent (n=1:np)
!!$OMP parallel do
!        do n=1,np
!          xs(n,l,j) = x(n,l,k-1) + dot_product(ks(n,l,:j-1),as(:j-1,j))  
!        enddo
!!$OMP end parallel do
        !do n = 1,np 
        !  warray(n,l,j) = rnor()*sigma(j)   !gaussian(2)*sqrt(qs(j)*qoh)                      
        !enddo
      enddo  

 
      call fi_gi_in( np, nd, xs(:,:,j), x0(1,:), f, g ) 
      !ks(:,:,j) = h * ( f + g*warray(:,:,j) )  
      
      ks(:,:,j) = h * ( f + g*normal(:,:,j,k)*sigma(j) ) 
      xstar = xstar + alphas(j)*ks(:,:,j)   
      
     
      !do concurrent (l=1:nd,n=1:np)
!      do l=1,nd
!!$OMP parallel do 
!        do n=1,np
!        ks(n,l,j) = h * ( f(n,l) + g(n,l)*normal(n,l,j,k)*sigma(j) )
!        xstar(n,l) = xstar(n,l) + alphas(j)*ks(n,l,j) 
!        ks_matmul(n,j,l) = ks(n,l,j)
!        enddo
!!$OMP end parallel do
!      enddo
    enddo    
    x(:,:,k) = xstar   
  enddo
  return
  end subroutine rk4_ti_fullvec_test

mecej4 · April 8, 2022, 12:55pm

Since you have placed Subroutine Timestamp and Function Wtime inside a module in file stats.f90, you should terminate their END statements with " Subroutine" and " Function", respectively.

I have run the program and reported the results in your previous thread .

JohnCampbell · April 9, 2022, 1:43pm

I have been looking at some of the code linked above.

I find some of the module declarations very verbose, which I find unncessary and could cause errors, but that is just a style difference.

There are a few declarations that look surprising. All are trying to provide standard conforming KIND definitions, but do not quite get there.

There is repeated reference (including constants.f90) to the real*8 kind, which is kind(1.0d0) as
integer, private, parameter :: r8=selected_real_kind(15,9)
The decimal exponent range is strangely given as 9. I suspect this is a persistent “error” but clearly demonstrates that values of precision and range are not “user friendly” values.

In function uni in ran.f90: what is the kind of the constant in the following line ?
ran(i) = half + 0.2328306e-9_dp * shr3()
I suspect this is a real4 constant with an integer8 exponent “9”

In function ran1 in ran.f90: there is a mix of integer kinds that are not as general as intended. Perhaps 1, 24 and 48 should be named I*8 parameter constants.
integer(kind=i8), parameter :: mask24 = ishft(1_8,24)-1
integer(kind=i8), parameter :: mask48 = ishft(1_8,48_8)-1_8

64-bit integer constants are difficult to code in Fortran, especially since F90 when the integer value 12345678901 should be treated as a default integer kind.

In other codes I have reviewed from experienced Fortran users, I have also found examples of
integer, parameter :: sp=selected_real_kind(7,37)

I make these comments, as I often find the use of KIND to be more error prone than is claimed.

CRquantum · April 9, 2022, 6:16pm

Thank you very much for your comments @JohnCampbell ! I appreciate that!

uhm, like, what module declarations are verbose and may cause errors? I wanted to improve from there. Thanks!

Eh, what is the best and portable way to truly declare real 8, integer 4 and integer 8?

half is defined in module constants.f90, as real(kind=r8) where, I basically define,

integer, public, parameter :: i4=selected_int_kind(9)
integer, public, parameter :: i8=selected_int_kind(15)
integer, public, parameter :: r8=selected_real_kind(15,9)

But perhaps there are better and more universal way to define i4, i8, and r8. Like using those data type like int32, int64, real64, etc, defined in iso_fortran_env or something?

random number.
In the ran.f90, I basically no longer use subroutine ran1 and ran2 anymore. I basically added ziggurat.f90 in my ran.f90. About ziggurat.f90, I can found at least two versions.
One is from @Beliavsky,
https://github.com/Beliavsky/Ziggurat/blob/main/ziggurat.f90
One is from,
https://github.com/sandain/ecosim/blob/main/src/fortran/ziggurat.f90
The former usese select kind, the latter uses iso_fortran_env.

Yeah, some pieces of code are initially taken from John Burkardt, and from ziggurat.f90, so I kept using their notation of double precision like dp so kind=dp or something. I define double precision as r8 so kind=r8. I probably have change the values of them all as selected_real_kind(15,9). But perhaps there are better ways to represent true portable double precision.

JohnCampbell · April 10, 2022, 3:56am

@CRquantum, my post was more a comment and in no way a criticism.

My comment was about how Fortran declarations are becoming verbose, with a number of usages that are default or not required. I have rarely used public, private, target, save or dimension in a module.
With a “style”, if it helps you to understand the code, for later review, you should continue to use it.
With 0.2328306e-9_dp, what does the _dp refer to ? Is it 0.2328306 or is it -9

With “r8=selected_real_kind(15,9)”, why is there a ,9 ?
It is always a challenge as to what precision and exponent you should apply to a calculation/algorithm.
Or is it just what precision and exponent you should apply to get a 4-byte, 8-byte, 10-byte or 16-byte real !
I really liked reading code that had “integer, parameter :: sp=selected_real_kind(7,37)”, which clearly shows the failure of the approach.
Over many years, for all my decisions as to what computation precision is required, the decision has always been based on how many bytes of storage are needed vs what real types are available.
With old FORTRAN code, it was always a challenge to find documentation of how many bits, bytes or characters were assumed in the “real” variable. IEEE has removed this issue.

With mask24 and mask48, there is a mixture of integer kind arguments in ISHFT and also kind types, especially for “ishft(1_8,24)” which some compilers may reject.

There remains the problem in Fortran as to how to easily define an 8-byte integer constant as a routine argument. This is necessary for most 64-bit Fortran compilers, where the default integer does not support the memory address.
Some use say 2_8 ** 32, others 2_i8 ** 32, although I typically define/name 8-byte integer parameters using calculations, such as:

integer*8, parameter :: two = 2
integer*8, parameter :: one_gbyte = two**30  ! or 1024*1024*1024 or 2**30
integer*8, parameter :: two_gbyte = two**31  ! but NOT 2*1024*1024*1024 or 2**31
integer*8, parameter :: four_gbyte = two**32 ! or  two*two_gbyte

Fortran compilers have transitioned through a lot of default integer kinds, bytes and especially bit sizes. I doubt I will see when integer*16 will become hardware supported !!

CRquantum · April 10, 2022, 7:18am

Thanks @JohnCampbell , no problem at all, feel free to say anything.

I see, the 0.2328306e-9_dp, is actually 0.23*10^-9, and I added _dp to make it dp type which is double precision I believe. But perhaps I should just use (0.23*10**(-9))_dp, or just 0.2328306d-9?
it is in Beliavsky as well,

Uhm, it is just a convention I guess. My PhD advisor’s code use this r8=selected_real_kind(15,9) to specify real 8, so I keep using it and without really think of it. The 9 just means its exponent range of at least 9. But perhaps it is not good? Eh, what do you suggest to use for real 8?
I know in a code Beliavsky used

integer,  parameter       ::  dp=selected_real_kind(12, 60)

Is there a one-size-fits-all P and R for SELECTED_REAL_KIND(P,R) for real 4 and real 8?

the “ishft(1_8,24)” stuff I should probably add _i8, such as “ishft(1_i8,24_i8)”.

Thanks again.

themos · April 10, 2022, 8:48am

It refers to the real literal constant that precedes it.

mecej4 · April 10, 2022, 10:09am

I was also curious as to why you chose 9 as the second argument. The returned value from the intrinsic function would have been the same if you had specified any higher value up to 307 for X86/X64 platforms.

themos · April 10, 2022, 1:31pm

Memory bandwidth used to be measured by the STREAM benchmark. Is there a reason that is not good enough anymore?

mecej4 · April 10, 2022, 2:06pm

Dr. Bandwidth (John McCalpin) is now at the University of Texas, and the new URL for the Stream benchmark is John McCalpin's blog » Blog Archive » STREAM version 5.10 released .

CRquantum · April 10, 2022, 3:03pm

Thank you very much @mecej4 , @JohnCampbell for pointing out that.

Eh, is there drawback of using selected_real_kind(15,9)?
Or using selected_real_kind(15) is better than selected_real_kind(15,9)?

I am not Fortran language lawyer, lol, but there should be a reason for SELECTED_REAL_KIND([P, R, RADIX]) having three arugments P, R, and RADIX. Otherwise perhaps just one argument P should be enough.

We could discuss here,

wclodius · April 11, 2022, 3:56am

It refers to the representation of 0.0000000002328306. Of course with only seven digits of precision in the constant, and the small exponent, the use of a double precision representation of that constant would often be unnecessary and cost a little bandwidth.

JohnCampbell · April 11, 2022, 10:58am

@wclodius, thanks for the explanation.
I have not seen kind used to define real precision including an exponent before.
I am more familiar with e-9 or d-9.
This usage of kind for _qp or _16 would be needed when defining a 16-byte real constant.
0.1d0 or 0.1_dp is an important case where digits of precision in the constant can be misleading.

@CRquantum, in selected_real_kind(15,9), the resulting KIND must satisfy both p=15 and r=9, so the result is as you required. It gives the KIND you required.
However, “r=9” is an unusual exponent request in the context of an 8-byte real, which provides up to r=307 and p=15.

Topic		Replies	Views
How much GFLOPS usually did you get from the CPU peak performance?	1	1203	April 4, 2022
OpenMP stochastic simulation	31	401	September 19, 2024
M1 Mac for code development Help	8	5524	February 26, 2022
Does the speed (bandwidth) of memory matter?	11	1629	April 11, 2022
Nvfortran comparison of do concurrent vs OpenMP code Help	24	788	September 9, 2024

Memory bandwidth test code for M1 and PC

Related topics