DO CONCURRENT: compiler flags to enable parallelization

Some time ago I prepared a comparison of loops vs array syntax in the context of a dynamic programming problem; see FortranVec/src/main.f90 at main · aledinola/FortranVec · GitHub

The loop-based code is in bellman_op, while the array-syntax code is in bellman_op_vec and bellman_op_vec2. There was a discussion about it on this forum: see Performance of vectorized code in ifort and ifx

It turns out that this code

v_max  = large_negative
ap_ind = 0
! Choose a' optimally by stepping through all possible values
do ap_c = 1, n_a
    aprime_val = ap_grid(ap_c)
    cons = R*a_val + z_val - aprime_val
    if (cons > 0.0d0) then
        v_temp = f_util(cons) + beta*EV(ap_c,z_c)
        !v_temp = f_util(cons) + beta*sum(v(ap_c,:)*z_tran(z_c,:))
        if (v_temp > v_max) then
            v_max  = v_temp
            ap_ind = ap_c
        end if
    end if
end do !end a'

is faster than this

cons = R*a_val + z_val - ap_grid ! (n_ap,1)
! NOTE: where and merge are slower than forall
! NOTE: forall and do concurrent are equivalent with ifort,
! but do concurrent is very slow with ifx!
!where (cons > 0.0d0)
!    util = f_util(cons)  ! (n_ap,1)
!elsewhere
!    util = large_negative
!end where
!util = merge(f_util(cons), large_negative, cons > 0.0d0)
! v_temp = large_negative
! do concurrent (ap_c = 1:n_a, cons(ap_c) > 0.0d0)
!    v_temp(ap_c) = f_util(cons(ap_c)) + beta*EV(ap_c,z_c)
! end do
v_temp = large_negative
forall (ap_c = 1:n_a, cons(ap_c) > 0.0d0)
    v_temp(ap_c) = f_util(cons(ap_c)) + beta*EV(ap_c,z_c)
end forall
ap_ind = maxloc(v_temp, dim=1)

Please see my repo for more information. (In the second block of code you can replace the forall with do concurrent, merge, or where if you don't like that forall is obsolescent; a sketch of the do concurrent variant is given below.)
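For reference, the do concurrent replacement (equivalent to the commented-out lines in the block above) would read:

v_temp = large_negative
do concurrent (ap_c = 1:n_a, cons(ap_c) > 0.0d0)
    v_temp(ap_c) = f_util(cons(ap_c)) + beta*EV(ap_c,z_c)
end do
ap_ind = maxloc(v_temp, dim=1)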

Back then (in 2024) we also found interesting performance differences between ifort and ifx (thanks to @ivanpribec for measuring the running times properly), with ifort being significantly faster. It would be interesting to rerun this test to see whether ifx has caught up.
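For anyone who wants to rerun the comparison, here is a minimal self-contained timing sketch; the grid size, parameter values, and log utility below are placeholder assumptions of mine, not the settings used in the repo. It isolates the inner a' search and times the explicit loop against the masked do concurrent plus maxloc version. Whether the do concurrent loop actually runs in parallel depends on the compiler and the flags it is given, so check the compiler documentation when interpreting the timings.

program time_maxsearch
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: n_a = 1000, n_rep = 100000
    real(dp), parameter :: beta = 0.96_dp, R = 1.04_dp   ! placeholder parameters
    real(dp), parameter :: large_negative = -1.0e10_dp
    real(dp) :: ap_grid(n_a), EV(n_a), cons(n_a), v_temp(n_a)
    real(dp) :: a_val, z_val, cons_s, v_max, v_try
    integer  :: ap_c, rep, ap_ind1, ap_ind2
    integer(int64) :: c0, c1, rate

    ! Placeholder data: evenly spaced asset grid and a random continuation value
    call random_number(EV)
    ap_grid = [(0.01_dp*real(ap_c, dp), ap_c = 1, n_a)]
    a_val = 5.0_dp
    z_val = 1.0_dp

    ! Version 1: explicit loop over a' with a running maximum
    call system_clock(c0, rate)
    do rep = 1, n_rep
        v_max   = large_negative
        ap_ind1 = 0
        do ap_c = 1, n_a
            cons_s = R*a_val + z_val - ap_grid(ap_c)
            if (cons_s > 0.0_dp) then
                v_try = f_util(cons_s) + beta*EV(ap_c)
                if (v_try > v_max) then
                    v_max   = v_try
                    ap_ind1 = ap_c
                end if
            end if
        end do
    end do
    call system_clock(c1)
    print '(a,f8.3,a)', 'explicit loop:        ', real(c1 - c0, dp)/real(rate, dp), ' s'

    ! Version 2: array syntax + masked do concurrent + maxloc
    call system_clock(c0)
    do rep = 1, n_rep
        cons   = R*a_val + z_val - ap_grid
        v_temp = large_negative
        do concurrent (ap_c = 1:n_a, cons(ap_c) > 0.0_dp)
            v_temp(ap_c) = f_util(cons(ap_c)) + beta*EV(ap_c)
        end do
        ap_ind2 = maxloc(v_temp, dim=1)
    end do
    call system_clock(c1)
    print '(a,f8.3,a)', 'do concurrent+maxloc: ', real(c1 - c0, dp)/real(rate, dp), ' s'

    if (ap_ind1 /= ap_ind2) print *, 'WARNING: the two versions disagree on the argmax'

contains

    pure real(dp) function f_util(c) result(u)
        ! Placeholder log utility (the repo may use a different functional form)
        real(dp), intent(in) :: c
        u = log(c)
    end function f_util

end program time_maxsearch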