OpenMP code implementation in gfortran and Intel

One more way to ensure correctness would be to move these variables into an internal scope:

Yeah, that’s true! Visual validation is how I like to check the correctness of the code.

1 Like

Yes, this is also a good way to check the portability of the code.

2 Likes

Exactly.

1 Like

Just had another round of tests. I always get the same results with ifx on Windows 10.

1 Like

I just installed the latest Intel Fortran Compiler (version 2024.2.0) and ran the tests.

1 Like

You could also have the loop body as a subroutine, with local dynamic variables.
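A minimal sketch of that idea, assuming a 2-D stencil-style update and the variable names mentioned elsewhere in this thread (the arithmetic below is a placeholder, not the code under discussion):

```fortran
! Sketch only: the stencil arithmetic is invented. The point is that
! ip, im, jp, jm, phi_old, term1, term2, theta and m are local to the
! subroutine, so every thread calling it gets its own copies on its
! own stack, with no PRIVATE clause needed for them.
subroutine update_point(phi, phi_new, i, j, nx, ny)
  implicit none
  integer, intent(in) :: i, j, nx, ny
  real, intent(in)    :: phi(nx, ny)
  real, intent(inout) :: phi_new(nx, ny)
  integer :: ip, im, jp, jm
  real :: phi_old, term1, term2, theta, m
  ip = min(i + 1, nx); im = max(i - 1, 1)
  jp = min(j + 1, ny); jm = max(j - 1, 1)
  phi_old = phi(i, j)
  term1 = phi(ip, j) + phi(im, j)   ! placeholder arithmetic
  term2 = phi(i, jp) + phi(i, jm)
  theta = 0.25 * (term1 + term2)
  m = theta - phi_old
  phi_new(i, j) = phi_old + 0.1 * m
end subroutine update_point
```

The parallel loop then needs no PRIVATE clause beyond the loop indices, at the price of a call per grid point (which the compiler may or may not inline).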

Surely the explicit use of “private(ip, im, jp, jm, phi_old, term1, term2, theta, m)” provides the clearest outcome and best documents the code.

Why be so obscure?!

(“use explicit private” could be a reply to many posts in this thread)
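For readers skimming the thread, here is a compact, self-contained sketch of the explicit-PRIVATE approach. Only the clause mirrors the suggestion above; the array names, bounds, and stencil arithmetic are invented for illustration:

```fortran
program explicit_private
  implicit none
  integer, parameter :: nx = 64, ny = 64   ! assumed grid size
  real :: phi(nx, ny), phi_new(nx, ny)
  real :: phi_old, term1, term2, theta, m
  integer :: i, j, ip, im, jp, jm

  call random_number(phi)

  ! Every temporary written inside the loop is listed PRIVATE, so each
  ! thread works on its own copy; phi and phi_new remain shared.
  ! (j, as the parallelized loop variable, is private automatically.)
  !$omp parallel do private(i, ip, im, jp, jm, phi_old, term1, term2, theta, m)
  do j = 1, ny
    do i = 1, nx
      ip = min(i + 1, nx); im = max(i - 1, 1)
      jp = min(j + 1, ny); jm = max(j - 1, 1)
      phi_old = phi(i, j)
      term1 = phi(ip, j) + phi(im, j)   ! placeholder arithmetic,
      term2 = phi(i, jp) + phi(i, jm)   ! not the code from this thread
      theta = 0.25 * (term1 + term2)
      m     = theta - phi_old
      phi_new(i, j) = phi_old + 0.1 * m
    end do
  end do
  !$omp end parallel do

  print *, sum(phi_new)
end program explicit_private
```

Beyond correctness, the clause is self-documenting: a reader can see at a glance which variables are per-thread scratch space.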

1 Like

Thanks for this reference.

So it appears that, in the absence of a clause specifying the attribute, the default is shared.

From previous comments, this appears to be what Gfortran does with theta, m and phi_old.
Ifort, however, with its auto-parallelization experience, appears to identify that these should be private and so produces a better (though non-conforming?) result.
My experience with OpenMP suggests that relying on this default behaviour is not a good approach, which is probably why I did not recall that the default is shared.

I also prefer the OpenMP shared() clause.

I was hoping the block-scoped version would offer readers of this thread a new angle on what private and shared actually mean and why they are required for correct execution.

In C, with its { } scopes and the practice of declaring variables on the spot, private() is less used.
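On the Fortran side, the block-scoped version corresponds to the F2008 BLOCK construct. A sketch with placeholder arithmetic and assumed array names:

```fortran
program block_scoped
  implicit none
  integer, parameter :: nx = 64, ny = 64   ! assumed grid size
  real :: phi(nx, ny), phi_new(nx, ny)
  integer :: i, j
  call random_number(phi)
  !$omp parallel do
  do j = 1, ny
    do i = 1, nx
      block
        ! Declared inside the BLOCK, these are fresh objects in every
        ! iteration, hence automatically private to each thread; no
        ! PRIVATE clause is needed for them.
        integer :: ip, im, jp, jm
        real :: phi_old, theta
        ip = min(i + 1, nx); im = max(i - 1, 1)
        jp = min(j + 1, ny); jm = max(j - 1, 1)
        phi_old = phi(i, j)
        theta = 0.25 * (phi(ip, j) + phi(im, j) &
                      + phi(i, jp) + phi(i, jm))   ! placeholder arithmetic
        phi_new(i, j) = phi_old + 0.1 * (theta - phi_old)
      end block
    end do
  end do
  !$omp end parallel do
  print *, sum(phi_new)
end program block_scoped
```

This makes the sharing semantics visible in the declarations themselves, much like C's block scopes.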

I think the easiest way is to declare these variables as arrays.

theta(i,j)
m(i,j)
phi_old(i,j)
term1(i,j)
term2(i,j)

then using only

!$omp parallel do private(i,j,ip,im,jp,jm)

can make it run on both compilers without any problem (I just gave it a try).

Not only will this hurt performance (traversing arrays has a cost in terms of cache misses and first touch), but the right (and still easy) way is to declare as PRIVATE the variables that need to be private. OpenMP is well designed; just use the features it offers.

It all depends on whether the code is written with OpenMP in mind from the start, or whether an existing code is being OpenMP-ized :slight_smile: … In the latter case, just declaring some variables private doesn’t require changing the serial code, so you don’t have to retest it. In the former case, the block-scoped approach is indeed cleaner.

2 Likes

@PierU is absolutely right here.

One of the critical parameters when it comes to optimizing code performance is the code balance,

B_C = \frac{\text{data traffic (bytes)}}{\text{arithmetic operations (flops)}}

The inverse 1/B_C is also known as the arithmetic or computational intensity. Some authors even call it a computational force.

Changing the local loop variables into full 2-d arrays will artificially increase the code balance for no good reason, pushing the code (further) down into the bandwidth-limited regime of the roofline performance model.

3 Likes

@PierU, @ivanpribec and others are giving you good advice!

In our code, we always use !$OMP PARALLEL DEFAULT(SHARED) and then go through the parallel section with a fine-tooth comb, declaring every variable that should be exclusive to each thread in a PRIVATE clause. This is necessary because the compiler would have a hard time inferring our intent for each variable.

Interestingly, your variables i and j are private by default because they are loop variables. See here.

6 Likes

Placing the block structure inside the loops implies that the stack allocation and deallocation are performed on every pass through the innermost loop. For OpenMP parallelization, you really want that overhead to occur only once per thread, independently of the number of loop iterations. Of course, the compiler can recognize this and optimize the allocation steps, but that puts the programmer in the position of specifying an inefficient algorithm and then relying on compiler optimization to repair it. It is always better, in principle, for the programmer to state his intentions directly and clearly (to the compiler and also to human readers of the code).

To follow up on this “puzzled”, OpenMP diagnostics could be much improved.

There is a problem with using OpenMP: the compilers I have used (Ifort and Gfortran) do not provide good diagnostics when interpreting !$OMP directives.

The worst case is when you mistype !$OMP or don’t correctly mark the continuation lines. In these cases, the only symptom may be that there is no performance improvement, which is also what a memory-bandwidth bottleneck looks like.

A helpful report could include the shared/private/firstprivate status of all variables or arrays referenced in the !$OMP region, although these can be hidden in called routines.

I combine IMPLICIT NONE with DEFAULT(NONE) so that any variable I have not explicitly listed in a shared/private clause is reported.
Note that for variables or arrays that are not redefined in the OMP region, assuming shared is a safe outcome, as only those that are redefined may need the private attribute. Taking this into account could simplify the DEFAULT(NONE) response.
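Applied to the loop discussed in this thread, that combination would look like the directive fragment below (the variable list is taken from earlier posts; array names are assumed):

```fortran
! With DEFAULT(NONE), any variable whose sharing attribute is not
! listed is rejected at compile time, so a forgotten PRIVATE becomes
! a diagnostic instead of a silent data race.
!$omp parallel do default(none) shared(phi, phi_new, nx, ny) &
!$omp&   private(i, j, ip, im, jp, jm, phi_old, term1, term2, theta, m)
```

Note the `!$omp&` continuation: mistyping it is exactly the kind of silent failure described above, but DEFAULT(NONE) at least catches missing attribute clauses.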

Is this an area that compilers should address, or am I missing this compiler feature ?

Very interesting! I have one question, though: the two loops over i and j seem independent, so I would add collapse(2) to improve performance. Without collapse, only the outer loop is parallelized, I think.

Does this make sense?
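A sketch of what that would look like, as a directive fragment with the loop bounds and body assumed from earlier posts:

```fortran
! collapse(2) fuses the perfectly nested j and i loops into a single
! iteration space of nx*ny iterations, which is then distributed over
! the threads, instead of distributing only the ny outer iterations.
! Both i and j become collapsed loop variables and are private
! automatically; the scalar temporaries still need the PRIVATE clause.
!$omp parallel do collapse(2) private(ip, im, jp, jm, phi_old, term1, term2, theta, m)
do j = 1, ny
  do i = 1, nx
    ! ... same loop body as before ...
  end do
end do
!$omp end parallel do
```

Whether it actually helps depends on the loop sizes: with ny much larger than the thread count and a cheap body, the outer loop alone may already balance well, and collapse mainly pays off when ny is small.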