Indeed, streaming in the classic version of LBM is implemented the way you've shown. Here's an excerpt from a separate subroutine of mine that does the same operation as yours:
!$omp workshare
fdst(:,:,0) = fsrc(:,:,0)          ! index 0 does not stream; it is copied as-is
!$omp end workshare

!$omp do schedule(static)
do x = 1, nx
   xp1 = mod(x, nx) + 1            ! periodic neighbour at x+1 (not used in this excerpt)
   xm1 = mod(nx + x - 2, nx) + 1   ! periodic neighbour at x-1
   do y = 1, ny
      fdst(y,x,1) = fsrc(y,xm1,1)  ! "pull" direction 1 from the x-1 neighbour
   end do
end do
!$omp end do
The only difference is that I’m using a non-local read (a “pull” pattern), while yours has a non-local write (a “push” pattern); I wonder whether this makes any measurable speed difference. I have also unrolled the lattice directions, so I’m working on one “spatial” slice of PDFs at a time. After streaming I swap the arrays using integer pointers, along the lines of the sketch below.
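For what it’s worth, here is a minimal self-contained sketch of what I mean by swapping via integer pointers; the array names, shapes, and the dummy stream routine are made up purely for this example:

program swap_demo
   implicit none
   ! Sizes chosen only for the example.
   integer, parameter :: nx = 8, ny = 8, nq = 9, nsteps = 10
   real, allocatable  :: f(:,:,:,:)      ! (y, x, direction, buffer)
   integer :: isrc, idst, itmp, step

   allocate(f(ny, nx, 0:nq-1, 2))
   f = 0.0

   isrc = 1        ! integer "pointers" selecting source/destination buffers
   idst = 2

   do step = 1, nsteps
      ! stream from the isrc buffer into the idst buffer
      call stream(f(:,:,:,isrc), f(:,:,:,idst))
      ! swap the roles of the two buffers instead of copying any data
      itmp = isrc
      isrc = idst
      idst = itmp
   end do

contains

   subroutine stream(fsrc, fdst)
      real, intent(in)  :: fsrc(:,:,0:)
      real, intent(out) :: fdst(:,:,0:)
      ! stand-in for the actual streaming step shown above
      fdst = fsrc
   end subroutine stream

end program swap_demo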
The subroutine in the original post is for a variation of LBM called DUGKS (Discrete Unified Gas Kinetic Scheme). The main idea is that, since the collision in LBM is discretized by a Crank-Nicolson scheme, the advective part should also be evaluated at the temporal midpoint t + \Delta t/2. However, this results in a method with quite different numerical properties. The biggest change is that the Courant number can be varied, which gives a bit more freedom in the choice of discretization parameters. On the other hand, the method requires more storage and more FLOPs, and hence achieves lower effective bandwidth (there is more work per time step). I’m interested to evaluate whether it pays off somehow.
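Roughly, the idea can be sketched as follows (a schematic only, assuming a BGK collision operator \Omega = -(f - f^{eq})/\tau and a finite-volume cell of volume |V|, with F^{n+1/2} denoting the advective flux through the cell faces at the half step):

f^{n+1} - f^{n} + \frac{\Delta t}{|V|} F^{n+1/2} = \frac{\Delta t}{2} \left( \Omega^{n+1} + \Omega^{n} \right)

The trapezoidal (Crank-Nicolson) collision term on the right is made explicit by tracking an auxiliary distribution \tilde{f} = f - \frac{\Delta t}{2} \Omega, while the flux has to be reconstructed at t + \Delta t/2, which is where the additional storage and work per step come in.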