OpenMP Parallel Loop

inode MUST NOT be private

gfortran will handle that correctly (it generally uses the heap instead of the stack for arrays above a given size). Otherwise you can indeed protect the jWork update with an atomic directive, or allocate your own private copy of jWork in each thread and perform a manual reduction at the end.

    ! Option 1: protect the shared update with an atomic directive
    !$OMP PARALLEL DO PRIVATE(kount,nod,jnode,nf2,idof)
    do jelem=1,nelz   ! nelz > 1 million
       kount = count(lnods(jelem,1:lnomax) == inode)
       if (kount == 0) cycle
       do nod=1,lnomax
          jnode = lnods(jelem,nod)
          nf2 = (jnode-1)*ndfmax
          do idof = 1, ndfmax
             !$OMP ATOMIC UPDATE
             jWork(nf2+idof) = jWork(nf2+idof) + 1
          end do
       end do
    end do
    !$OMP END PARALLEL DO
    ! Option 2: per-thread copy of jWork, with a manual reduction at the end
    !$OMP PARALLEL PRIVATE(kount,nod,jnode,nf2,idof,jWork_local)
    allocate( jWork_local(size(jWork)), source=0 )
    !$OMP DO
    do jelem=1,nelz   ! nelz > 1 million
       kount = count(lnods(jelem,1:lnomax) == inode)
       if (kount == 0) cycle
       do nod=1,lnomax
          jnode = lnods(jelem,nod)
          nf2 = (jnode-1)*ndfmax
          do idof = 1, ndfmax
             jWork_local(nf2+idof) = jWork_local(nf2+idof) + 1
          end do
       end do
    end do
    !$OMP END DO
    !$OMP CRITICAL
    jWork(:) = jWork(:) + jWork_local(:)
    !$OMP END CRITICAL
    deallocate( jWork_local )
    !$OMP END PARALLEL

For long vectors, there might be a significant difference between these two tests. The count() intrinsic must scan the entire vector, while the any() intrinsic can exit as soon as it finds the first matching value.
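A minimal sketch of the two variants, reusing the names from the code above:

    ! count() examines all lnomax entries of the row before testing:
    kount = count(lnods(jelem,1:lnomax) == inode)
    if (kount == 0) cycle
    ! any() may stop at the first entry equal to inode:
    if (.not. any(lnods(jelem,1:lnomax) == inode)) cycle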


You need to investigate how to change the stack size for each thread. I use gfortran on Windows and set the stack size to 512 MBytes, which then sets each thread's stack size to 512 MBytes. (The default thread stack size can vary between compilers and operating systems.)

My build script is:

:: trace file and compiler
set tce=load_gf.tce
set load=gfortran

:: 536870912 bytes = 512 MBytes main stack, which also becomes each OpenMP thread's stack size
set stack_options=-Wl,-stack,536870912,-Map=program.map

del program.map
del program.exe

now >>%tce%

%load% @load_gf.txt -fopenmp -fstack-arrays %stack_options% >>%tce%  2>&1

dir program.* /od >>%tce%

notepad %tce%

You can very easily confirm this by using “write ( * , * ) loc(private_variable), loc(shared_variable)” in each thread.
A large stack only affects the virtual memory space, not the physical memory space, so on a 64-bit OS this approach should be the default!
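A minimal, self-contained sketch of that check (the variable names are just placeholders; loc() is a common extension supported by gfortran and ifort):

    program check_addresses
       use omp_lib
       implicit none
       integer :: shared_var, private_var
       shared_var = 0
       !$OMP PARALLEL PRIVATE(private_var)
       private_var = omp_get_thread_num()
       ! each thread prints a different address for its private copy
       ! (on its own thread stack) and the same address for the shared variable
       write (*,*) omp_get_thread_num(), loc(private_var), loc(shared_var)
       !$OMP END PARALLEL
    end program check_addresses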

I also expect that reducing such a large “jwork” array would be better avoided.

Reduction of jwork is not a good option for me.

You may be right, but please explain why it is not a good option, given the hints you have been given for avoiding the stack overflow issues.

By the way, we still don’t know whether jnode can take the same value in different iterations of the jelem loop. If it cannot, then no reduction or atomic directive is needed at all.
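A small illustrative sketch of why repeated jnode values matter (the connectivity here is hypothetical, chosen so that every iteration updates the same jWork entries):

    program race_demo
       implicit none
       integer, parameter :: nelz = 100000, lnomax = 2, ndfmax = 1
       integer, allocatable :: lnods(:,:), jWork(:)
       integer :: jelem, nod, jnode, nf2, idof
       allocate( lnods(nelz,lnomax), jWork(2*ndfmax) )
       lnods(:,1) = 1      ! every element references nodes 1 and 2,
       lnods(:,2) = 2      ! so all iterations touch the same jWork entries
       jWork = 0
       !$OMP PARALLEL DO PRIVATE(nod,jnode,nf2,idof)
       do jelem = 1, nelz
          do nod = 1, lnomax
             jnode = lnods(jelem,nod)
             nf2 = (jnode-1)*ndfmax
             do idof = 1, ndfmax
                ! unprotected shared update: a data race when jnode repeats,
                ! so the final counts are usually smaller than nelz
                jWork(nf2+idof) = jWork(nf2+idof) + 1
             end do
          end do
       end do
       !$OMP END PARALLEL DO
       write (*,*) 'expected', nelz, '  got', jWork
    end program race_demo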

I am using Intel oneAPI (2021) Fortran and Visual Studio 2017.
I haven’t figured out the stack setting part yet.

As said before, you can avoid the stack issues with a manual allocation and reduction. I am not saying that the reduction is the best option (probably the atomic option is better), just that it is a valid one.

And still:

jnode can have the same value in different iterations of the jelem loop.