I am writing a program and using OpenCoarrays to parallelize it. I would like to do computations on all images and then collect the results in an array that is allocated only on image 1.
However, the data transfer is not behaving as I expected. Here is a small reproducing example:
program cotest
  real, dimension(:,:), codimension[:], allocatable :: arr
  integer :: idx1, idx2, thisstart, thisend

  if (this_image() .eq. 1) then
    allocate(arr(4,2)[*])
  else
    allocate(arr(1,1)[*])
  end if

  thisstart = 2*(this_image() - 1) + 1 ! = 1 for image 1; = 2*1+1 = 3 for image 2
  thisend = thisstart + 1              ! = 2 for image 1; = 4 for image 2
  print *, thisstart, thisend

  do idx1=thisstart, thisend
    print *, "idx1:", idx1
    arr(idx1, :)[1] = idx1 + this_image()
    !do idx2=1,2
    !  arr(idx1, idx2)[1] = idx1*idx2
    !end do
  end do
  sync all
  call execute_command_line('') ! Forces things to print in order; flushes stdout
  ! sync all not blocking...?

  if (this_image() .eq. 1) then
    print *, this_image(), "is about to start the print loop."
    do idx1=1, 4
      print *,"for idx1", idx1, "arr is ", arr(idx1, :)[1]
    end do
  end if
end program cotest
Using the same compilation/run parameters, I get the following output on two images:
1 2
idx1: 1
idx1: 2
3 4
idx1: 3
idx1: 4
1 is about to start the print loop.
for idx1 1 arr is 2.00000000 2.00000000
for idx1 2 arr is 3.00000000 3.00000000
for idx1 3 arr is 5.00000000 0.00000000
for idx1 4 arr is 6.00000000 0.00000000
As you can see, the rows written by image 1 are fine, but for the rows written by image 2 only the first column is set; the second column stays at 0. Also, if I try to assign a 2-element array to arr(idx1, :), the program crashes with a memory error.
I suspect the coarray library assumes that if the array's allocated shape on image 2 is (1,1), then the same must be true of its counterpart on image 1. If so, what I want to do seems fruitless.
EDIT: It appears I am indeed not allowed to allocate different amounts of memory on different images: an allocatable coarray has to be allocated with the same bounds on every image. The question below is still valid; I just don't know how to accomplish it now.
Is what I want to do possible, i.e. can I allocate the memory for an object on one image only, instead of on all images, and then transfer the intermediate values computed on the other images into that central array on image 1? If so, how?
EDIT2: I know that if I were using OpenMP I could simply create an array in shared memory that all threads can access. I think the Intel compilers have a shared-memory version of coarrays, but I am not sure whether the example above would need different syntax for that.
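For the question of keeping the full array on image 1 only, here is a minimal sketch of one approach the Fortran 2008 standard permits: make the coarray a scalar of a derived type with an allocatable component. The coarray itself then has the same shape on every image, while the component may be allocated with different sizes, or not at all, on each image. The names `holder`, `central`, and `cotest_component` are made up for illustration, and sizing the array as `2*num_images()` rows is an assumption that generalizes the 4x2 array above. I have not verified how well a given OpenCoarrays or compiler version supports remote access to allocatable components, so treat this as something to try rather than a confirmed fix.

```fortran
program cotest_component
  implicit none

  ! Hypothetical wrapper type: the coarray is a scalar of this type,
  ! so only the allocatable component needs per-image memory.
  type :: holder
    real, allocatable :: arr(:,:)
  end type holder

  type(holder) :: central[*]   ! same (scalar) shape on every image
  integer :: idx1, thisstart, thisend

  ! Only image 1 allocates the full array. Allocating a component does
  ! not synchronize the images, so an explicit sync all follows.
  if (this_image() == 1) then
    allocate(central%arr(2*num_images(), 2))
    central%arr = 0.0
  end if
  sync all

  thisstart = 2*(this_image() - 1) + 1
  thisend   = thisstart + 1

  ! Each image writes its two rows into the component stored on image 1.
  do idx1 = thisstart, thisend
    central[1]%arr(idx1, :) = real(idx1 + this_image())
  end do
  sync all

  ! Image 1 reads its own component locally, without an image selector.
  if (this_image() == 1) then
    do idx1 = 1, size(central%arr, 1)
      print *, "for idx1", idx1, "arr is ", central%arr(idx1, :)
    end do
  end if
end program cotest_component
```

The first `sync all` matters here: without it, another image could try to write into `central[1]%arr` before image 1 has allocated it. The other images never allocate their own `arr` component, so they hold no storage for the central array.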