c(i,j,thread) = c(i,j,thread) + 0.052*lap_c(i,j,thread)
from
lap_c(i,j,thread) = … + c(im,j,thread) + …
but c(im,j,thread) was updated for the previous itteration (i-1) of do i = 1,nc
plus c(:,jm,thread) from the previous itteration of do j = 1,nc.
It may appear acceptable but the updated values are not all based on estimates from the previous step.
It is necessary to carry forward the unchanged single value of “c(i,j,thread)” plus the unchanged row c(:,jm,thread) from the previous itteration, with appropriate consideration of edge conditions.
C could be declared as allocate ( c(0:nc,0:nc,threads) ), and so utilise c(:,0,threads) to store the previous vector C(:,jm1,threads) and C(0,j,threads) for C(im1,j,threads), at the appropriate time in the itteration, or use a local real(kind=8) :: c_im and c_jm(nc).
The following code suggests a possible solution
real(kind=wp) :: c_jm(nc), c_jn(nc), lap_c
c_jm(:) = c(:,NC,thread)
do j = 1,nc
jp = j+1
if (j == Nc) jp = 1
c_jn(:) = c(:,j,thread)
im = nc
do i = 1,nc
ip = i+1
if (i == Nc) ip = 1
!old lap_c(i,j,thread) = c(ip,j,thread) + c(im,j,thread) + c(i,jm,thread) + c(i,jp,thread) - 4.0*c(i,j,thread)
lap_c = c_jn(ip) + c_jn(im) + c_jm(i) + c(i,jp,thread) - 4.0*c_jn(i)
c(i,j,thread) = c(i,j,thread) + 0.052*lap_c
im = i
end do
c_jm(:) = c_jn(:)
end do
In the OpenMP case, it is the edges that need to be updated between the threads at each step.