@PierU
I am getting a different relative performance between automatic and allocatable arrays on Windows 10 and Gfortran 14.2
My modified code is
module loc_tmp
integer*8 :: loc_auto, loc_alloc
end module loc_tmp
program ap
use iso_fortran_env
use omp_lib
use loc_tmp
implicit none
integer :: n , m, i, k, lu
real :: s, au(5)
real, allocatable :: al(:)
double precision :: tic, toc
write (*,*) compiler_version ()
write (*,*) compiler_options ()
!z open(newunit=lu,file="/dev/null")
open (newunit=lu,file="dev_null.txt")
do k = 0, 30
n = 2**k
m = 2**(30-k)
write(*,"(A,I2,A,I2)",advance='no') "2**", 30-k, " allocations of array size = 2**", k
s = 0.0
tic = my_get_wtime()
do i = 1, m
s = s + with_auto_array(n)
end do
toc = my_get_wtime()
write(*,"(A,F8.4,A)",advance='no') " Automatic: ", toc-tic, " sec."
write(lu,*) n, s, toc-tic, loc_auto
s = 0.0
tic = my_get_wtime()
do i = 1, m
s = s + with_alloc_array(n)
end do
toc = my_get_wtime()
write(*,"(A,F8.4,A)",advance='no') " Allocatable: ", toc-tic, " sec."
write(lu,*) n, s, toc-tic, loc_alloc
write(*,*)
end do
allocate ( al(5))
write (lu,*) loc(s), loc(au), loc(al)
contains
real function with_auto_array(n) result(r)
integer, intent(inout) :: n
real :: tmp(n)
tmp(:) = 1.0
loc_auto = loc(tmp)
r = tmp(n/2)
end function
real function with_alloc_array(n) result(r)
integer, intent(in) :: n
real, allocatable :: tmp(:)
allocate( tmp(n) )
tmp(:) = 1.0
loc_alloc = loc(tmp)
r = tmp(n/2)
end function
real*8 function my_get_wtime ()
integer*8 :: clock, rate
call system_clock ( clock, rate )
my_get_wtime = dble (clock) / dble (rate)
end function my_get_wtime
end
The reported results saved are :
GCC version 14.2.0
-cpp -iprefix C:/Program Files (x86)/gcc_eq/gcc_14.2.0/bin/../lib/gcc/x86_64-w64-mingw32/14.2.0/ -U_REENTRANT -mtune=generic -march=x86-64 -fno-underscoring -fdollar-ok
2**30 allocations of array size = 2** 0 Automatic: 33.9688 sec. Allocatable: 37.5021 sec.
2**29 allocations of array size = 2** 1 Automatic: 17.2286 sec. Allocatable: 18.8676 sec.
2**28 allocations of array size = 2** 2 Automatic: 8.8427 sec. Allocatable: 9.5493 sec.
2**27 allocations of array size = 2** 3 Automatic: 4.6037 sec. Allocatable: 4.8974 sec.
2**26 allocations of array size = 2** 4 Automatic: 2.4854 sec. Allocatable: 2.5678 sec.
2**25 allocations of array size = 2** 5 Automatic: 1.3920 sec. Allocatable: 1.5294 sec.
2**24 allocations of array size = 2** 6 Automatic: 0.8679 sec. Allocatable: 0.9873 sec.
2**23 allocations of array size = 2** 7 Automatic: 0.6288 sec. Allocatable: 0.7521 sec.
2**22 allocations of array size = 2** 8 Automatic: 0.5132 sec. Allocatable: 0.6007 sec.
2**21 allocations of array size = 2** 9 Automatic: 0.4020 sec. Allocatable: 0.5218 sec.
2**20 allocations of array size = 2**10 Automatic: 0.3779 sec. Allocatable: 0.4836 sec.
2**19 allocations of array size = 2**11 Automatic: 0.3513 sec. Allocatable: 0.4638 sec.
2**18 allocations of array size = 2**12 Automatic: 0.3502 sec. Allocatable: 0.4545 sec.
2**17 allocations of array size = 2**13 Automatic: 0.3418 sec. Allocatable: 0.4591 sec.
2**16 allocations of array size = 2**14 Automatic: 0.3487 sec. Allocatable: 0.4519 sec.
2**15 allocations of array size = 2**15 Automatic: 0.3481 sec. Allocatable: 0.4492 sec.
2**14 allocations of array size = 2**16 Automatic: 0.3439 sec. Allocatable: 0.4478 sec.
2**13 allocations of array size = 2**17 Automatic: 0.3455 sec. Allocatable: 0.4470 sec.
2**12 allocations of array size = 2**18 Automatic: 0.9597 sec. Allocatable: 1.0474 sec.
2**11 allocations of array size = 2**19 Automatic: 0.8920 sec. Allocatable: 0.9780 sec.
2**10 allocations of array size = 2**20 Automatic: 0.8650 sec. Allocatable: 0.9525 sec.
2** 9 allocations of array size = 2**21 Automatic: 0.8597 sec. Allocatable: 0.9464 sec.
2** 8 allocations of array size = 2**22 Automatic: 0.8680 sec. Allocatable: 0.9352 sec.
2** 7 allocations of array size = 2**23 Automatic: 0.8531 sec. Allocatable: 0.9442 sec.
2** 6 allocations of array size = 2**24 Automatic: 0.8267 sec. Allocatable: 0.9371 sec.
2** 5 allocations of array size = 2**25 Automatic: 0.8154 sec. Allocatable: 0.9656 sec.
2** 4 allocations of array size = 2**26 Automatic: 0.8510 sec. Allocatable: 0.9558 sec.
2** 3 allocations of array size = 2**27 Automatic: 0.8623 sec. Allocatable: 0.9479 sec.
2** 2 allocations of array size = 2**28 Automatic: 0.8210 sec. Allocatable: 0.9345 sec.
2** 1 allocations of array size = 2**29 Automatic: 0.8515 sec. Allocatable: 0.9458 sec.
2** 0 allocations of array size = 2**30 Automatic: 0.7930 sec. Allocatable: 0.9772 sec.
Note:
I have no idea what stack size I am getting, but the compiler options I used are being reported.
tic =omp_get_wtime() is an awful implementation on windows
2^30 allocations of array size = 2** 0 produces a lot of errors reporting r = tmp(n/2)
2^30 Automatic should be a 4-gbyte array on the stack ?
Memory address report suggests the automatic arrays are on the heap for my Wincows / gfortran 14.2
Using allocatable arrays is a more robust approach for large arrays.
I only use automatic arrays for “small” problems