OpenMP code implementation in gfortran and intel

I have the following code


program openmp_test
    implicit none
    
    !=============
    
    integer ( kind = 4 ), parameter :: Nx = 100
    integer ( kind = 4 ), parameter :: Ny = 100
    real ( kind = 8 )               :: dx = 0.03
    real ( kind = 8 )               :: dy = 0.03
    
    !=============
    
    integer (kind = 4 ) :: nsteps = 1000
    integer (kind = 4 ) :: nprint = 100
    integer (kind = 4 ) :: tsteps 
    real ( kind = 8 )   :: dtime  = 1.0e-4
    
    !===============
    
    real ( kind = 8 )   :: tau   = 0.0003
    real ( kind = 8 )   :: epsilonb = 0.01
    real ( kind = 8 )   :: kappa = 1.8
    real ( kind = 8 )   :: delta = 0.02
    real ( kind = 8 )   :: aniso = 6.0
    real ( kind = 8 )   :: alpha = 0.9
    real ( kind = 8 )   :: gama  = 10.0
    real ( kind = 8 )   :: teq   = 1.0
    real ( kind = 8 )   :: theta0= 0.2 
    real ( kind = 8 )   :: seed  = 5.0
    
    real ( kind = 8 )   :: pix   = 4.0*atan(1.0)
    
    !===============
    
    real ( kind = 8 ) , dimension( Nx, Ny ) :: phi, tempr
    real ( kind = 8 ) , dimension( Nx, Ny ) :: lap_phi, lap_tempr
    real ( kind = 8 ) , dimension( Nx, Ny ) :: phidx, phidy
    real ( kind = 8 ) , dimension( Nx, Ny ) :: epsil, epsilon_deriv
    real ( kind = 8 )                       :: phi_old, term1, term2
    real ( kind = 8 )                       :: theta, m
    integer ( kind = 4 )                    :: i, j, ip, im, jp, jm
    
    
    
    !============================================================
    
    
    
    
    phi = 0.0
    tempr = 0.0
    
    do i = 1, Nx
        do j = 1, Ny
            if ( (i - Nx/2.0)*(i - Nx/2.0) + (j - Ny/2.0)*(j - Ny/2.0)&
            & < seed ) then
                phi(i,j) = 1.0
            end if
        end do
    end do
    
    
    
    
    !============================================================
    
    
    
    time_loop: do tsteps = 1, nsteps
        
        
        !$omp parallel do private(i,j,ip,im,jp,jm)
        
        do j = 1, Ny
            do i =1, Nx                
                
                jp = j + 1
                jm = j - 1
                
                ip = i + 1
                im = i - 1
                
                if ( im == 0 ) im = Nx
                if ( ip == ( Nx + 1) ) ip = 1
                if ( jm == 0 ) jm = Ny
                if ( jp == ( Ny + 1) ) jp = 1
                
                !=====
                
                lap_phi(i,j) = ( phi(ip,j) + phi(im,j) + phi(i,jm) + phi(i,jp)&
                & - 4.0*phi(i,j)) / ( dx*dy )
                lap_tempr(i,j) = ( tempr(ip,j) + tempr(im,j) + tempr(i,jm) + &
                & tempr(i,jp) - 4.0*tempr(i,j)) / ( dx*dy )
                
                !======
                
                phidx(i,j) = ( phi(ip,j) - phi(im,j) ) / dx
                phidy(i,j) = ( phi(i,jp) - phi(i,jm) ) / dy
                
                !======
                
                theta  = atan2( phidy(i,j),phidx(i,j) )
                
                !======
                
                epsil(i,j) = epsilonb*( 1.0 + delta*cos(aniso*&
                & ( theta - theta0 ) ) )
                epsilon_deriv(i,j) = -epsilonb*aniso*delta*sin&
                & ( aniso*( theta - theta0 ) )
                
            end do
        end do
        
        !$omp end parallel do
        
        
        !$omp parallel do private(i,j,ip,im,jp,jm)
        
        do j = 1, Ny
            do i =1, Nx 
                
                jp = j + 1
                jm = j - 1
                
                ip = i + 1
                im = i - 1
                
                if ( im == 0 ) im = Nx
                if ( ip == ( Nx + 1) ) ip = 1
                if ( jm == 0 ) jm = Ny
                if ( jp == ( Ny + 1) ) jp = 1
                
                phi_old = phi(i,j)
                
                !========
                
                term1 = ( epsil(i,jp)*epsilon_deriv(i,jp)*phidx(i,jp)&
                & - epsil(i,jm)*epsilon_deriv(i,jm)*phidx(i,jm) ) / dy
                term2 = -( epsil(ip,j)*epsilon_deriv(ip,j)*phidy(ip,j)&
                & - epsil(im,j)*epsilon_deriv(im,j)*phidy(im,j) ) / dx
                
                !========
                
                m = alpha/pix*atan( gama*( teq - tempr(i,j) ) )
                
                !========
                
                phi(i,j) = phi(i,j) + ( dtime/tau )*( term1 + term2 +&
                & epsil(i,j)**2*lap_phi(i,j) ) + &
                & phi_old*( 1.0 - phi_old )*( phi_old -0.5 + m )
                tempr(i,j) = tempr(i,j) + dtime*lap_tempr(i,j) &
                & + kappa*( phi(i,j) - phi_old )
                
            end do
        end do
        !$omp end parallel do
        
        
        ! print steps
        
        if ( mod( tsteps, nprint ) .eq. 0 ) print *, 'Done steps  =  ', tsteps
        
        
        
    end do time_loop
    
    
    !============================================================================
    ! output file
    
    open ( 1, file = "phi.dat" )
    do i = 1, Nx
        write( 1, 10 ) ( phi(i,j),j = 1, Ny )
    end do
    10 FORMAT(1000000F10.6)
    close( 1 )
    
    

end program openmp_test

Intel ifx compiler

If I run with intel compiler with the command

ifx main.f90 /Qopenmp /F5000000
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.0.0 Build 20231017
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:main.exe
-subsystem:console
-stack:5000000
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
main.obj

>main
 Done steps  =           100
 Done steps  =           200
 Done steps  =           300
 Done steps  =           400
 Done steps  =           500
 Done steps  =           600
 Done steps  =           700
 Done steps  =           800
 Done steps  =           900
 Done steps  =          1000

I get the right output which shows numerical values

gfortran

But if I run with gfortran compiler the code runs fine but the output results are different

The commands are

>gfortran -frecursive main.f90 -fopenmp -o main

>main
 Done steps  =           100
 Done steps  =           200
 Done steps  =           300
 Done steps  =           400
 Done steps  =           500
 Done steps  =           600
 Done steps  =           700
 Done steps  =           800
 Done steps  =           900
 Done steps  =          1000
>gfortran -fmax-stack-var-size=5000000 main.f90 -fopenmp -o main
f951.exe: Warning: Flag '-fmax-stack-var-size=5000000' overwrites '-frecursive' implied by '-fopenmp'

>main
 Done steps  =           100
 Done steps  =           200
 Done steps  =           300
 Done steps  =           400
 Done steps  =           500
 Done steps  =           600
 Done steps  =           700
 Done steps  =           800
 Done steps  =           900
 Done steps  =          1000

The output file (in both cases of gfortran) has values :

       NaN       NaN      

So my questions are:

What are the right gfortran commands to run this code?
Or there are some gfortran specific OpenMP implementation details that are missing in the code?

My computer details are

GNU Fortran (GCC) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Processor 11th Gen Intel(R) Core™ i5-11500 @ 2.70GHz 2.71 GHz
Installed RAM 8.00 GB (7.83 GB usable)
System type 64-bit operating system, x64-based processor

1 Like

Your temporary variables such as theta and phi_old should be declared as private in the !$OMP section. This may be the source of the problem.

3 Likes

Got it! That works fine with private declaration. But how come the intel compiler produces the same result without private declaration? This is my main concern. Why the results of OpenMP implementation are different across different compilers? Does that mean OpenMP is not portable for different compilers?

1 Like

OmpMP codes with a race condition (because some variables are not private while they should be) give totally unpredictable and non-repeatable results (between compilers, between runs with the same compiler,…).

Have you checked that the executable produced by the Intel compiler actually runs with more than one thread? And are they really the same results?

2 Likes

The OP’s code has a lot of declarations like

real ( kind = 8 )               :: dx = 0.03

but with most compilers that makes 0.03 single precision and dx double precision. As a result, printing dx with format * gives

 2.9999999329447746E-002

when the compiler is gfortran, g95 or AOCC flang, but

 2.999999932944775E-002

if it’s ifx or ifort.
If double precision accuracy is wanted and one still assumes that that is kind 8, then 0.03 could be replaced by 0.03_8

1 Like

Indeed, I noticed LFortran and GFortran to differ (in serial) due to this single/double precision corruption: OpenMP example · Issue #4388 · lfortran/lfortran · GitHub.

I also checked openmp with GFortran and it gives different results to serial GFortran, so I think the code is not quite correct. (LFortran can’t compile this example with OpenMP yet, but we’ll fix it soon.)

P.S. I am forwarding LFortran’s style suggestion. :wink:

1 Like

This is mostly a matter of personal preference for coding style, but most programmers recommend to avoid using literal constants for kind values. In this case, the programmer could define

integer, parameter :: wp = selected_real_kind(14)

and then specify the real variables as

real(wp) :: dx = 0.03_wp

and so on. There are several other ways to define the wp parameter, the above is just one example. The important practical factor is that if the precision ever needs to be changed, that change just requires modifying that one line of code, not dozens of declarations and literal constants scattered throughout the program.

2 Likes

I did some modification:

  1. made the code single precision.
  2. added system clock for timing.
  3. graphics validation with dislin.
  4. bigger domain size and time steps.
  5. did not declare those variables to be private e.g. theta

The code is

program openmp_test
    implicit none
    
    !=============
    
    integer ( kind = 4 ), parameter :: Nx = 700
    integer ( kind = 4 ), parameter :: Ny = 700
    real ( kind = 4 )               :: dx = 0.03
    real ( kind = 4 )               :: dy = 0.03
    
    !=============
    
    integer (kind = 4 ) :: nsteps = 7000
    integer (kind = 4 ) :: nprint = 1000
    integer (kind = 4 ) :: tsteps 
    real ( kind = 4 )   :: dtime  = 1.0e-4
    real ( kind = 8 )   :: computed_time 
    integer (kind = 8)  :: rate, stop_count, start_count
    
    !===============
    
    real ( kind = 4 )   :: tau   = 0.0003
    real ( kind = 4 )   :: epsilonb = 0.01
    real ( kind = 4 )   :: kappa = 1.8
    real ( kind = 4 )   :: delta = 0.02
    real ( kind = 4 )   :: aniso = 6.0
    real ( kind = 4 )   :: alpha = 0.9
    real ( kind = 4 )   :: gama  = 10.0
    real ( kind = 4 )   :: teq   = 1.0
    real ( kind = 4 )   :: theta0= 0.2 
    real ( kind = 4 )   :: seed  = 5.0
    
    real ( kind = 4 )   :: pix   = 4.0*atan(1.0)
    
    !===============
    
    real ( kind = 4 ) , dimension( Nx, Ny ) :: phi, tempr
    real ( kind = 4 ) , dimension( Nx, Ny ) :: lap_phi, lap_tempr
    real ( kind = 4 ) , dimension( Nx, Ny ) :: phidx, phidy
    real ( kind = 4 ) , dimension( Nx, Ny ) :: epsil, epsilon_deriv
    real ( kind = 4 )                       :: phi_old, term1, term2
    real ( kind = 4 )                       :: theta, m
    integer ( kind = 4 )                    :: i, j, ip, im, jp, jm
    
    
    
    
    !============================================================
    
    
    
    
    
    phi = 0.0
    tempr = 0.0
    
    do i = 1, Nx
        do j = 1, Ny
            if ( (i - Nx/2.0)*(i - Nx/2.0) + (j - Ny/2.0)*(j - Ny/2.0)&
            & < seed ) then
                phi(i,j) = 1.0
            end if
        end do
    end do
    
    
    
    
    
    ! ======
    ! Start timing the code
    ! ========
    call system_clock (count=start_count , count_rate=rate)
    
    
    
    
    
    !============================================================
    
    
    
    
    time_loop: do tsteps = 1, nsteps
        
        
        !$omp parallel do private(i,j,ip,im,jp,jm)
        
        do j = 1, Ny
            do i =1, Nx                
                
                jp = j + 1
                jm = j - 1
                ip = i + 1
                im = i - 1
                
                if ( im == 0 ) im = Nx
                if ( ip == ( Nx + 1) ) ip = 1
                if ( jm == 0 ) jm = Ny
                if ( jp == ( Ny + 1) ) jp = 1
                
                !=====
                
                lap_phi(i,j) = ( phi(ip,j) + phi(im,j) + phi(i,jm) + phi(i,jp)&
                & - 4.0*phi(i,j)) / ( dx*dy )
                lap_tempr(i,j) = ( tempr(ip,j) + tempr(im,j) + tempr(i,jm) + &
                & tempr(i,jp) - 4.0*tempr(i,j)) / ( dx*dy )
                
                !======
                
                phidx(i,j) = ( phi(ip,j) - phi(im,j) ) / dx
                phidy(i,j) = ( phi(i,jp) - phi(i,jm) ) / dy
                
                !======
                
                theta  = atan2( phidy(i,j),phidx(i,j) )
                
                !======
                
                epsil(i,j) = epsilonb*( 1.0 + delta*cos(aniso*&
                & ( theta - theta0 ) ) )
                epsilon_deriv(i,j) = -epsilonb*aniso*delta*sin&
                & ( aniso*( theta - theta0 ) )
                
            end do
        end do
        
        !$omp end parallel do
        
        
        !$omp parallel do private(i,j,ip,im,jp,jm)
        
        do j = 1, Ny
            do i =1, Nx 
                
                jp = j + 1
                jm = j - 1
                ip = i + 1
                im = i - 1
                
                if ( im == 0 ) im = Nx
                if ( ip == ( Nx + 1) ) ip = 1
                if ( jm == 0 ) jm = Ny
                if ( jp == ( Ny + 1) ) jp = 1
                
                phi_old = phi(i,j)
                
                !========
                
                term1 = ( epsil(i,jp)*epsilon_deriv(i,jp)*phidx(i,jp)&
                & - epsil(i,jm)*epsilon_deriv(i,jm)*phidx(i,jm) ) / dy
                term2 = -( epsil(ip,j)*epsilon_deriv(ip,j)*phidy(ip,j)&
                & - epsil(im,j)*epsilon_deriv(im,j)*phidy(im,j) ) / dx
                
                !========
                
                m = alpha/pix*atan( gama*( teq - tempr(i,j) ) )
                
                !========
                
                phi(i,j) = phi(i,j) + ( dtime/tau )*( term1 + term2 +&
                & epsil(i,j)**2*lap_phi(i,j) ) + &
                & phi_old*( 1.0 - phi_old )*( phi_old -0.5 + m )
                tempr(i,j) = tempr(i,j) + dtime*lap_tempr(i,j) &
                & + kappa*( phi(i,j) - phi_old )
                
            end do
        end do
        !$omp end parallel do
        
        
        ! print steps
        
        if ( mod( tsteps, nprint ) == 0 ) print *, 'Done steps  =  ', tsteps
        
        
        
    end do time_loop
    
    
    
    !====
    ! end timing the code
    !====
    call system_clock (count=stop_count)
    
    
    computed_time = real(max(stop_count - start_count , 1_8 )) /real(rate)    
    
    
    
    
    
    
    !__________________________________________________________________________________
    !                                  Output
    ! 
    
    
    
    
    
    ! ====
    ! Print time of computation
    ! ====
    
    print*, ""
    print*, ""
    print*, "-------------------------------------------------"
    print*, ' The code took     = ', computed_time,' seconds'
    
    
    
    
    
    ! ====
    ! Quick dislin plot for graphics validation
    ! ====
    
    call qplclr (phi,Nx,Ny)
    
    
    
    
    
    
    !============================================================================
    ! Save file for graphics validation with gnuplot
    
    
    
    open ( 1, file = "phi.dat" )
    do i = 1, Nx
        write( 1, 10 ) ( phi(i,j),j = 1, Ny )
    end do
    10 FORMAT(1000000F10.6)
    close( 1 )
    
    

end program openmp_test

Intel Compiler test

i did perform the tests

  1. without openmp flag

  1. with openmp flag

Output

  1. The output is same in both cases.

dislin output

gnuplot output
gnuplot_output

So it seems like the code was run in parallel with intel compiler but did not catch any race condition with OpenMP.

1 Like

Why would you choose to do that ?
I would strongly recommend to use DEFAULT (NONE) and explicitly define SHARED or PRIVATE for all variables in the OMP region. If nothing else, it gives you the opportunity to review the status of all variables used in the OMP region.

The other interesting feature of this thread is the difference between ifort and gfortran in the way they allocate shared vs private status where the variable’s status is not explicitly declared.
My understanding of the posts is there has been some difference identified between the two compilers. (I don’t think that the OpenMP specification defines this behaviour ?)
Given Ifort’s provision for auto parallel, I would expect a more appropriate result with Ifort. (but not guaranteed to be correct !!)

It is important to recognise that in some cases, either the SHARED or PRIVATE attribute can be valid and produce significantly different results. For this reason, you should check each variable’s status to get the result you require. I don’t know of a report of what status is allocated to un-allocated variables.

I always use IMPLICIT NONE and DEFAULT (NONE)

1 Like

To see the difference between compiler behavior (Intel vs gfortran) for the variables (e.g. theta) that are supposed to be private using OpenMP.

For intel the output is shown in the figures.

For gfortran the output is

NaN

The output would be the same (as in figures) if the variables are declared private. This is true for both compilers.

That is interesting to know. So which compiler flag or procedure in the code I should use to check the status of the SHARED AND PRIVATE attribute of the variable? I am not familiar with that so far !

Please double check or triple check your tests… I have compiled and run your code with ifort 21 on Linux, and with OpenMP I do not have at all the same results with theta, phi_old and m declared private or not. As expected…

I implemented !$omp parallel do default ( none ) using Gfortran and all appears to work without NaN.

The modified code is:

! test of openmp example
!  V0 is original program
!  v1 has basic changes:
!     consistent real*8 constants
!     clean out unnecessary continuation
!     use !$omp  parallel do default ( none ) > declate variables as shared / private
!     report compiler and options

program openmp_test
    use iso_fortran_env
    implicit none

    
    !=============
    
    integer ( kind = 4 ), parameter :: Nx = 100
    integer ( kind = 4 ), parameter :: Ny = 100
    real ( kind = 8 )               :: dx = 0.03d0
    real ( kind = 8 )               :: dy = 0.03d0
    real ( kind = 8 ),    parameter :: one  = 1.0
    real ( kind = 8 ),    parameter :: two  = 2.0
    real ( kind = 8 ),    parameter :: four = 4.0
    real ( kind = 8 ),    parameter :: half = 0.5
    
    !=============
    
    integer (kind = 4 ) :: nsteps = 1000
    integer (kind = 4 ) :: nprint = 100
    integer (kind = 4 ) :: tsteps 
    real ( kind = 8 )   :: dtime  = 1.0d-4
    
    !===============
    
    real ( kind = 8 )   :: tau   = 0.0003d0
    real ( kind = 8 )   :: epsilonb = 0.01d0
    real ( kind = 8 )   :: kappa = 1.8d0
    real ( kind = 8 )   :: delta = 0.02d0
    real ( kind = 8 )   :: aniso = 6.0d0
    real ( kind = 8 )   :: alpha = 0.9d0
    real ( kind = 8 )   :: gama  = 10.0d0
    real ( kind = 8 )   :: teq   = 1.0d0
    real ( kind = 8 )   :: theta0= 0.2d0
    real ( kind = 8 )   :: seed  = 5.0d0
    
    real ( kind = 8 )   :: pix   = four*atan(one)
    
    !===============
    
    real ( kind = 8 ) , dimension( Nx, Ny ) :: phi, tempr
    real ( kind = 8 ) , dimension( Nx, Ny ) :: lap_phi, lap_tempr
    real ( kind = 8 ) , dimension( Nx, Ny ) :: phidx, phidy
    real ( kind = 8 ) , dimension( Nx, Ny ) :: epsil, epsilon_deriv
    real ( kind = 8 )                       :: phi_old, term1, term2
    real ( kind = 8 )                       :: theta, m
    integer ( kind = 4 )                    :: i, j, ip, im, jp, jm
    
    
    
    !============================================================
    
    write (*,*) 'Vern : ',compiler_version ()
    write (*,*) 'Opts : ',compiler_options ()

    
    call timer ( 'start' )
    
    phi = 0.0
    tempr = 0.0
    
    do i = 1, Nx
        do j = 1, Ny
            if ( (i - Nx/two)*(i - Nx/two) + (j - Ny/two)*(j - Ny/two) < seed ) then
                phi(i,j) = one
            end if
        end do
    end do
    
    call timer ( 'initialise phi' )
    
    
    
    !============================================================
    
    
    
    time_loop: do tsteps = 1, nsteps
        
        
        !$omp  parallel do default ( none )     &
        !$omp&             private ( i,j, ip,im, jp,jm, theta )   &
        !$omp&             shared  ( dx, dy, lap_phi, lap_tempr, phi, tempr, phidx, phidy, epsil, epsilon_deriv,  &
        !$omp&                       theta0, aniso, delta, epsilonb )
        
        do j = 1, Ny
            do i =1, Nx                
                
                jp = j + 1
                jm = j - 1
                
                ip = i + 1
                im = i - 1
                
                if ( im == 0 ) im = Nx
                if ( ip == ( Nx + 1) ) ip = 1
                if ( jm == 0 ) jm = Ny
                if ( jp == ( Ny + 1) ) jp = 1
                
                !=====
                
                lap_phi(i,j)   = ( phi(ip,j) + phi(im,j) + phi(i,jm) + phi(i,jp) - four*phi(i,j)) / ( dx*dy )
                lap_tempr(i,j) = ( tempr(ip,j) + tempr(im,j) + tempr(i,jm) + tempr(i,jp) - four*tempr(i,j)) / ( dx*dy )
                
                !======
                
                phidx(i,j) = ( phi(ip,j) - phi(im,j) ) / dx
                phidy(i,j) = ( phi(i,jp) - phi(i,jm) ) / dy
                
                !======
                
                theta  = atan2( phidy(i,j),phidx(i,j) )
                
                !======
                
                epsil(i,j)         =  epsilonb *( one +  delta * cos ( aniso*( theta - theta0 ) ) )
                epsilon_deriv(i,j) = -epsilonb * aniso * delta * sin ( aniso*( theta - theta0 ) )
                
            end do
        end do
        
        !$omp end parallel do
        
        
        !$omp  parallel do default ( none )     &
        !$omp&             private ( i,j,ip,im,jp,jm, phi_old, term1, term2, m )  &
        !$omp&             shared  ( dx, dy, phi, tempr, epsil, epsilon_deriv, phidx, phidy, lap_phi, lap_tempr,  &
        !$omp&                       alpha, pix, teq, gama, dtime, tau, kappa )
        
        do j = 1, Ny
            do i =1, Nx 
                
                jp = j + 1
                jm = j - 1
                
                ip = i + 1
                im = i - 1
                
                if ( im == 0 ) im = Nx
                if ( ip == ( Nx + 1) ) ip = 1
                if ( jm == 0 ) jm = Ny
                if ( jp == ( Ny + 1) ) jp = 1
                
                phi_old = phi(i,j)
                
                !========
                
                term1 =  ( epsil(i,jp)*epsilon_deriv(i,jp)*phidx(i,jp) - epsil(i,jm)*epsilon_deriv(i,jm)*phidx(i,jm) ) / dy
                term2 = -( epsil(ip,j)*epsilon_deriv(ip,j)*phidy(ip,j) - epsil(im,j)*epsilon_deriv(im,j)*phidy(im,j) ) / dx
                
                !========
                
                m = alpha/pix * atan ( gama*( teq - tempr(i,j) ) )
                
                !========
                
                phi(i,j) = phi(i,j) + ( dtime/tau ) * ( term1 + term2 + epsil(i,j)**2 * lap_phi(i,j) ) &
                                    +  phi_old * ( one - phi_old )*( phi_old - half + m )
                tempr(i,j) = tempr(i,j) + dtime*lap_tempr(i,j) + kappa * ( phi(i,j) - phi_old )
                
            end do
        end do
        !$omp end parallel do
        
        
        ! print steps
        
        if ( mod ( tsteps, nprint ) .eq. 0 ) print *, 'Done steps  =  ', tsteps
        
        
        
    end do time_loop
    
    call timer ( 'time_loop' )
    
    !============================================================================
    ! output file
    
    open ( 11, file = "phi.dat" )
    do i = 1, Nx
        write( 11, 10 ) ( phi(i,j), j = 1, Ny )
    end do
    10 FORMAT (1000000F10.6)
    close( 11 )
    
    write (*,*) 'max phi =', maxval (phi)
    call timer ( 'report phi' )
    

end program openmp_test

  subroutine timer ( desc )
   character desc*(*)
   integer*8 :: clock, rate, last_clock = -1
   real*8    :: sec

   if ( last_clock < 0 ) then
     call system_clock ( last_clock, rate )
     sec = 0
     write (*,11) sec, desc
   else
     call system_clock ( clock, rate )
     sec = dble ( clock - last_clock ) / dble (rate)
     write (*,11) sec, desc
  11 format ( f10.3,2x,a )
     last_clock = clock
   end if
  end subroutine timer

The run log is

 Vern : GCC version 12.3.0
 Opts : -march=znver3 ... -O2 -fimplicit-none -fallow-argument-mismatch -ffast-math -fopenmp
     0.000  start
     0.000  initialise phi
 Done steps  =           100
 Done steps  =           200
 Done steps  =           300
 Done steps  =           400
 Done steps  =           500
 Done steps  =           600
 Done steps  =           700
 Done steps  =           800
 Done steps  =           900
 Done steps  =          1000
     0.206  time_loop
 max phi =   1.0026640553542336     
     0.004  report phi

The changes for default ( none) were not difficult changes !

1 Like

I think the OP is aware of that. But he is puzzled by the fact that with the Intel compiler the results are the same whether the variables theta, phi_old and m are private or not.

1 Like

What does “private or not” mean ?
Clearly ifort correctly sets variables theta, phi_old and m to private, while Gfortran may not.

However, the important issue is these variables should be explicitly set to private.

Does anyone know what the OpenMP specification says about what the default shared/private status of “undeclared” variables should be ?
My understandimg is this outcome is not defined, so Gfortran or Ifort can do what they like.

But, we should not rely on what they may do.

1 Like

As a side note: In the past I have seen that an auxiliary variable with forgotten private attribute (and explicit default(shared)) still behaved like a private variable due to optimisation. The temporary value was kept in registers and was not saved to memory, so no actual race condition happened. Once optimisation level was reduced or runtime-checks are turned on, failures appeared. The best approach for anything which exceed a few lines and variables is to move the body of a parallel omp section or do loop into a subroutine. Then the argument list makes it very clear, which variables are shared (or rather accessed at all). Moreover the private list is short and mistakes are easily avoided.

3 Likes

OpenMP 5.2 spec, section 5.1.1:
image

But there are some exceptions:
image

2 Likes

This can be an explanation, indeed…

1 Like

Speaking as a user, I always like to explicitly know what is shared and what is private, so that I can visually ensure the correctness of the parallel loop. Is that not the case for you @Shahid as well?

Of course, out of curiosity it is interesting to investigate how different compilers handle the case when variables are not explicitly marked private/shared, but it doesn’t seem that relevant to how I would like to use it as a user.

1 Like

As a user as well :), I’m quite fine with the default behavior in OpenMP parallel constructs (i.e. variables are shared unless specified otherwise). This is the natural way (at least my way) of thinking when reading the code, an entity “x” that was existing before the parallel region remains unique in the parallel region.

2 Likes