@gareth Strangely, when I use buffers, the time per iteration increases (from about 2 ms to about 8 ms).
The additional time is probably due to having to assign the values to the buffer first. You can track the changes here: Comparing main...buffers · obdwinston/Parallel-Fortran · GitHub
Before using the buffer (recap):
```fortran
is = tiled_indices_start
ie = tiled_indices_end
nt = time_step_iterations
allocate(U(n_cells)[*])  ! U field (unstructured mesh), allocatable coarray
...
do n = 1, nt
  ...
  do i = is, ie
    ...                  ! operations with U
    U(i)[1] = U(i)       ! gathering back to image 1, one cell at a time
    sync all             ! synchronizes all images once per cell
  end do
  ...
end do
```
Elapsed time per iteration (cpu_time): 1.9869999999999610E-003
Elapsed time per iteration (system_clock): 2.0000000000000000E-003
Estimated time remaining (h:m:s): 0 0 47
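Because `cpu_time` reports processor time rather than elapsed wall-clock time, and the `system_clock` figure above is exactly 2 ms, a finer wall-clock timer may make the two versions easier to compare. Below is a minimal sketch, assuming a compiler that reports a higher `count_rate` for 64-bit `system_clock` arguments (gfortran does, but the exact rate is processor dependent). The module name `wall_timer` and the function `wall_seconds` are hypothetical.

```fortran
module wall_timer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: wall_seconds
contains
  ! Returns wall-clock time in seconds, read from system_clock with
  ! 64-bit integer arguments. Returns -1 if no clock is available.
  function wall_seconds() result(t)
    real(real64)   :: t
    integer(int64) :: ticks, rate
    call system_clock(count=ticks, count_rate=rate)
    if (rate <= 0_int64) then
      t = -1.0_real64
    else
      t = real(ticks, real64) / real(rate, real64)
    end if
  end function wall_seconds
end module wall_timer
```

Calling `wall_seconds()` once before and once after the `do n = 1, nt` loop and dividing the difference by `nt` gives the elapsed time per iteration. Each image measures its own time, so printing from image 1 only keeps the output readable.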
After using the buffer:
```fortran
is = tiled_indices_start
ie = tiled_indices_end
nt = time_step_iterations
allocate(U(n_cells)[*])  ! U field (unstructured mesh), allocatable coarray
allocate(B(is:ie))       ! local buffer for this image's tile of U
...
do n = 1, nt
  ...
  do i = is, ie
    ...                  ! operations with U buffer
    B(i) = ...           ! assign result to U buffer
  end do
  U(is:ie)[1] = B(is:ie) ! gathering back to image 1 as one section
  sync all               ! synchronizes all images once per time step
  ...
end do
```
Elapsed time per iteration (cpu_time): 8.0100000000129512E-003
Elapsed time per iteration (system_clock): 8.0000000000000002E-003
Estimated time remaining (h:m:s): 0 0 0
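If the update of cell `i` reads only cell `i` (the in-place `U(i)` update in the first version appears to allow this), the separate buffer could be dropped while still sending one contiguous section per image per time step, with a single `sync all` per step. A minimal sketch follows; the module `gather_sketch`, the routines `tile_bounds` and `step_and_gather`, the tiling rule, and the update formula are all hypothetical placeholders. It needs a coarray-capable build, for example gfortran with `-fcoarray=single` for one image or OpenCoarrays for several. If the real operations read neighbouring cells, updating in place would mix old and new values, and a buffer (or a second array) is still required.

```fortran
module gather_sketch
  implicit none
  private
  public :: tile_bounds, step_and_gather
contains
  ! Hypothetical contiguous tiling of cells 1..n_cells over n_img images;
  ! the first mod(n_cells, n_img) images get one extra cell.
  pure subroutine tile_bounds(n_cells, n_img, img, is, ie)
    integer, intent(in)  :: n_cells, n_img, img
    integer, intent(out) :: is, ie
    integer :: base, extra
    base  = n_cells / n_img
    extra = mod(n_cells, n_img)
    is = (img - 1) * base + min(img - 1, extra) + 1
    ie = is + base - 1
    if (img <= extra) ie = ie + 1
  end subroutine tile_bounds

  ! One time step: update the local tile in place, then push the whole
  ! tile to image 1 in a single coindexed assignment.
  subroutine step_and_gather(U, is, ie)
    real, intent(inout) :: U(:)[*]
    integer, intent(in) :: is, ie
    integer :: i
    do i = is, ie
      ! placeholder pointwise update: reads only cell i
      U(i) = 0.5 * U(i) + 1.0
    end do
    U(is:ie)[1] = U(is:ie)   ! one contiguous section per image
    sync all                 ! once per time step, not once per cell
  end subroutine step_and_gather
end module gather_sketch
```

Writing the section in one statement gives the coarray runtime the chance to issue a single transfer per image instead of one per cell, although how it is actually carried out is implementation dependent.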