Some time ago I prepared a comparison of loops vs array syntax in the context of a dynamic programming problem; see FortranVec/src/main.f90 in the aledinola/FortranVec repository on GitHub.
The loop-based code is in bellman_op; the code with array syntax is in bellman_op_vec and bellman_op_vec2. There was a discussion about it on this forum, see "Performance of vectorized code in ifort and ifx".
It turns out that this loop-based code
    v_max = large_negative
    ap_ind = 0
    ! Choose a' optimally by stepping through all possible values
    do ap_c=1,n_a
        aprime_val = ap_grid(ap_c)
        cons = R*a_val + z_val - aprime_val
        if (cons>0.0d0) then
            v_temp = f_util(cons) + beta*EV(ap_c,z_c)
            !v_temp = f_util(cons) + beta*sum(v(ap_c,:)*z_tran(z_c,:))
            if (v_temp>v_max) then
                v_max = v_temp
                ap_ind = ap_c
            end if
        endif
    enddo !end a'
is faster than this array-syntax version
    cons = R*a_val + z_val - ap_grid ! (n_ap,1)
    ! NOTE: where and merge are slower than forall
    ! NOTE: forall and do concurrent are equivalent with ifort
    !       but do concurrent is very slow with ifx!
    !where (cons>0.0d0)
    !    util = f_util(cons) ! (n_ap,1)
    !elsewhere
    !    util = large_negative
    !end where
    !util = merge(f_util(cons),large_negative,cons>0.0d0)
    ! v_temp = large_negative
    ! do concurrent (ap_c=1:n_a, cons(ap_c)>0.0d0)
    !     v_temp(ap_c) = f_util(cons(ap_c))+beta*EV(ap_c,z_c)
    ! enddo
    v_temp = large_negative
    forall (ap_c=1:n_a, cons(ap_c)>0.0d0)
        v_temp(ap_c) = f_util(cons(ap_c))+beta*EV(ap_c,z_c)
    end forall
    ap_ind = maxloc(v_temp,dim=1)
Please see my repo for more information. In the second block of code you can replace the forall with do concurrent, merge, or where if you don't like that forall is obsolescent.
Back then (in 2024) we realized that there were also interesting performance differences between ifort and ifx (thanks to @ivanpribec for measuring running times appropriately), with ifort being significantly faster. It would be interesting to rerun this test to see whether ifx has caught up.
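
For anyone who wants to try such a rerun without cloning the repo, a minimal self-contained sketch could look like the one below. It is not the code from FortranVec. The grid size, the number of repetitions, the parameter values, the use of log utility in place of f_util, and the one-dimensional stand-in for EV are all assumptions chosen only to make the example compile and run on its own. It times the loop version against the forall + maxloc version with system_clock and prints a checksum of the chosen indices so the two variants can be compared.

```fortran
program bellman_timing
    ! Hedged sketch: compares a loop-based argmax over a' with an
    ! array-syntax (forall + maxloc) version. All sizes and parameter
    ! values below are illustrative assumptions, not the repo's settings.
    use, intrinsic :: iso_fortran_env, only: dp => real64, int64
    implicit none
    integer, parameter  :: n_a = 2000, n_rep = 2000
    real(dp), parameter :: R = 1.03_dp, beta = 0.96_dp, z_val = 1.0_dp
    real(dp), parameter :: large_negative = -1.0e10_dp
    real(dp) :: ap_grid(n_a), EV(n_a), cons_vec(n_a), v_vec(n_a)
    real(dp) :: a_val, cons, v_temp, v_max
    integer  :: ap_c, rep, ap_ind, sum_loop, sum_vec
    integer(int64) :: t0, t1, t2, rate

    ! Hypothetical asset grid on [0.01, 50] and a concave stand-in for EV
    do ap_c = 1, n_a
        ap_grid(ap_c) = 0.01_dp + 49.99_dp*real(ap_c-1, dp)/real(n_a-1, dp)
    end do
    EV = log(1.0_dp + ap_grid)

    sum_loop = 0
    sum_vec  = 0
    call system_clock(t0, rate)

    ! Loop version: step through all a' and track the running maximum
    do rep = 1, n_rep
        a_val  = ap_grid(mod(rep-1, n_a) + 1)
        v_max  = large_negative
        ap_ind = 0
        do ap_c = 1, n_a
            cons = R*a_val + z_val - ap_grid(ap_c)
            if (cons > 0.0_dp) then
                v_temp = log(cons) + beta*EV(ap_c)
                if (v_temp > v_max) then
                    v_max  = v_temp
                    ap_ind = ap_c
                end if
            end if
        end do
        sum_loop = sum_loop + ap_ind
    end do
    call system_clock(t1)

    ! Array-syntax version: fill v_vec where consumption is feasible,
    ! then let maxloc pick the best a'
    do rep = 1, n_rep
        a_val    = ap_grid(mod(rep-1, n_a) + 1)
        cons_vec = R*a_val + z_val - ap_grid
        v_vec    = large_negative
        forall (ap_c = 1:n_a, cons_vec(ap_c) > 0.0_dp)
            v_vec(ap_c) = log(cons_vec(ap_c)) + beta*EV(ap_c)
        end forall
        ap_ind  = maxloc(v_vec, dim=1)
        sum_vec = sum_vec + ap_ind
    end do
    call system_clock(t2)

    print '(a,f10.4,a,i0)', 'loop:   ', real(t1-t0, dp)/real(rate, dp), &
        ' s, index checksum = ', sum_loop
    print '(a,f10.4,a,i0)', 'forall: ', real(t2-t1, dp)/real(rate, dp), &
        ' s, index checksum = ', sum_vec
end program bellman_timing
```

The two checksums should agree, since both variants return the first maximizer. Compiling the same file with ifort and with ifx at the same optimization level gives a rough first impression, but a single wall-clock measurement like this is noisy, so repeated runs (or the more careful timing setup from the earlier thread) are advisable before drawing conclusions.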