How to synchronize and order writing data using OpenMP?

abhro · February 7, 2026, 6:15pm

I have an “embarrassingly parallel” program which has been parallelized with OpenMP. The program spawns a bunch of threads to do independent computations, then writes the results to a file. An approximate reproduction is:

program
implicit none
  integer :: i, n
  integer :: outfile
  real(kind=8) :: datavar
! ... set-up code ...

!$omp parallel do
  do i = 1, n
    call expensive_computation_with_index(i, datavar, n, ...)

!$omp critical
    write(outfile, *) "i = ", i, " result = ", datavar
!$omp end critical
  end do
!$omp end parallel do

! ... clean-up code ...

end program

After each thread finishes its work, I want the thread to wait until all previous indices have finished writing. That is, I want the output file to look like

i = 1   result = ...
i = 2   result = ...
i = 3   result = ...

where each thread is essentially processed in order of the loop. I tried to put in a !$omp barrier clause before the write(), but that doesn’t work with !$omp parallel do. What’s the course of action here, that each thread synchronizes the file writing?

jkd2022 · February 7, 2026, 7:25pm

You can use the ORDERED construct.

However, this tends to be less efficient. A better method is to store the results in an array and then output the array sequentially after the parallel loop is complete.

PierU · February 7, 2026, 10:18pm

If expensive_computation_with_index() does indeed a lot of work, then the solution with ORDERED is ok, and the performance hit has chances to be very minor.

ivanpribec · February 7, 2026, 10:55pm

I asked Google Gemini about the “low-level way” of doing this, and let’s just say it is complicated:

! :warning:
! For illustration purposes only, use ORDERED construct if possible instead
! :warning:

  integer :: i, n, next_serial, tid, nthr
  integer :: outfile
  real(kind=8) :: datavar
! ... set-up code ...

  ! Initialize the "ticket" counter
  next_serial = 1 

  ! For Serial execution
  me = 0
  nthr = 1

  !$omp parallel private(i, tid) shared(next_serial, n, nthr)
  !$ me = omp_get_thread_num()
  !$ nthr = omp_get_num_threads()

   ! Thread-strided loop
  do i = me+1, n, nthr

     ! 1. Parallel: Do the heavy lifting here
     call expensive_computation_with_index(i)

     ! 2. Wait: Spin until it is this index's turn
     ! We use flush to ensure we are reading the latest value from memory
     do
        !$omp flush(next_serial)
        if (next_serial == i) exit
     end do

     ! 3. Serial: Write the output
     write(*, *) "i = ", i, " written by thread ", tid

     ! 4. Update: Hand off the "turn" to the next index
     ! We can simply increment because only one thread is allowed 
     ! past the spinlock at a time for this specific 'i'
     !$omp atomic update
     next_serial = next_serial + 1
     
     ! 5. Broadcast: Ensure other threads see the new value immediately
     !$omp flush(next_serial)
     
  end do
  !$omp end parallel

JohnCampbell · February 8, 2026, 3:25am

Why not save the results to
real(kind=8), allocatable :: datavar_list(:)
This will eliminate any inefficient delays to expensive_computation_with_index
If the results are more extensive, data_list can be an allocated derived type, as a shared data structure.
At the end of the !$OMP region, this can be quickly written to file or further analysed.

jorgeg · February 8, 2026, 10:46am

In order with what other people have said, you might be better off doing the computation, storing the data and then dumping everything serially. If your data is gigantic you might want to look into doing parallel I/O using HDF5 or something similar.

abhro · February 8, 2026, 2:09pm

I think the best thing to do here would be the ordered construct. datavar isn’t actually a scalar (or even just one variable), and it would be too much work to figure out a new data format (later parts of the analysis pipeline depend on the current format), and since each call to the expensive subroutine is expected to roughly take the same amount of time, the ordered write is probably a relatively small hit to be worried too much about. Also the number of threads is smaller than the number of loop runs, and we’re trying to get each thread to output as soon as it’s done, with the caveat of being in order. Thanks, everyone for all your help!

Topic		Replies	Views
Trying to make a program using OpenMP give same output as non parallel counterpart Help	2	500	May 9, 2022
OpenMP and thread-safety of I/Os (write, read...)	40	2709	March 19, 2026
OpenMP: efficiency in reduction in large array	16	1543	December 13, 2022
OpenMP Parallel Loop Help	27	1251	August 4, 2023
Why the performance is poorer after using OpenMP? Help	20	5921	June 2, 2022

How to synchronize and order writing data using OpenMP?

Related topics