I’m relatively new to Fortran and to this community. I’m trying to write some code that performs some calculations on a multidimensional array using MPI, and for the life of me I can’t get MPI_Gather to work. I’m sure it’s something very obvious that I’m missing here.
My code is:
program test
   USE mpi
   USE netcdf
   IMPLICIT NONE

   INTEGER, ALLOCATABLE, DIMENSION(:,:,:) :: array, global_array
   INTEGER :: j_start, j_end, i_start, i_end, J, I, t
   ! MPI
   INTEGER :: mpirank, mpisize, mpierr

   ! Initialize MPI
   call MPI_INIT(mpierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, mpisize, mpierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, mpirank, mpierr)

   ALLOCATE(array(4,4,2))
   ALLOCATE(global_array(4,4,2))

   j_start = 1
   j_end = 4
   i_start = 1 + (mpirank*4/mpisize)
   i_end = (mpirank + 1)*4/mpisize

   DO t=1,2
      DO J=j_start,j_end
         DO I=i_start,i_end
            array(J,I,t) = J+I+t
         END DO
      END DO ! J
   END DO ! t

   IF (mpirank==0) WRITE(*,*) 'before MPI_Gather...'
   WRITE(*,*) mpirank, array(1,:,1)

   CALL MPI_Barrier(MPI_COMM_WORLD, mpierr)

   CALL MPI_Gather(array(:,i_start:i_end,:), 4*2*2, MPI_INTEGER, &
                   global_array(:,i_start:i_end,:), 4*2*2, MPI_INTEGER, &
                   0, MPI_COMM_WORLD, mpierr)

   IF (mpirank == 0) THEN
      WRITE(*,*) 'AFTER MPI_Gather...'
      DO J=1,4
         WRITE(*,*) (global_array(J,I,t), I=1,4,1)
      END DO
   END IF

   DEALLOCATE(array)
   DEALLOCATE(global_array)

   ! Finalise MPI
   CALL MPI_FINALIZE(mpierr)

end program test
And I get these results, which are obviously wrong:
Note that the receive ("gather") array on the rank 0 process (your global_array) needs to be big enough to hold the total of all the small arrays (your array) sent from every process.
In your case, your global_array has the same size as your array (both have 4*4*2 = 32 elements). If you are not careful about this, the gather will probably not work as you expect.
Since you have i_start and i_end on each process, perhaps you are right and each process is only meant to send a section. But the 4*4*2 size still looks wrong to me, because it does not look like you want to gather all 4*4*2 elements of array from each process into global_array on the rank 0 process, right?
If you do want to gather all 4*4*2 elements of array from each process into global_array on the rank 0 process, then global_array needs to have 4*4*2*(total number of processes) elements.
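To make that sizing concrete, here is a minimal, self-contained sketch (not your program; the program name and the fill pattern are just for illustration) that gathers a whole 4x4x2 array from every rank into a receive buffer with one extra rank dimension:

   program gather3d
      use mpi
      implicit none
      integer :: mpirank, mpisize, mpierr
      integer, allocatable :: array(:,:,:), global_array(:,:,:,:)

      call MPI_INIT(mpierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, mpisize, mpierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, mpirank, mpierr)

      allocate(array(4,4,2))
      array = mpirank                        ! each rank fills its whole local array with its rank id

      ! the receive buffer holds one 4x4x2 block per rank, hence the extra dimension
      allocate(global_array(4,4,2,mpisize))

      call MPI_Gather(array, 4*4*2, MPI_INTEGER, &
                      global_array, 4*4*2, MPI_INTEGER, &
                      0, MPI_COMM_WORLD, mpierr)

      if (mpirank == 0) write(*,*) global_array(1,1,1,:)   ! first element of each rank's block

      call MPI_FINALIZE(mpierr)
   end program gather3d

With 2 processes, rank 0 ends up with rank 0's block in global_array(:,:,:,1) and rank 1's block in global_array(:,:,:,2).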
I personally suggest that you begin with mpi_gather for a 1D array first: gather the 1D arrays from all the processes into one 1D array on the rank 0 process. Once you figure out how the 1D mpi_gather works, you can move on to gathering 2D (and 3D) arrays.
The sketch below gathers the r array on each process into the rgather array on the rank 0 process. Note that the size of the rgather array needs to be size(r) * (number of processes).
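(The example as originally posted is not reproduced here; the following is a minimal sketch of the same idea, assuming double-precision arrays named r and rgather.)

   program gather1d
      use mpi
      implicit none
      integer, parameter :: n = 5
      integer :: mpirank, mpisize, mpierr
      real(kind(1.0d0)) :: r(n)
      real(kind(1.0d0)), allocatable :: rgather(:)

      call MPI_INIT(mpierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, mpisize, mpierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, mpirank, mpierr)

      r = real(mpirank, kind(1.0d0))     ! each rank fills r with its own rank id
      allocate(rgather(n*mpisize))       ! receive buffer: size(r) * number of processes

      call MPI_Gather(r, n, MPI_DOUBLE_PRECISION, &
                      rgather, n, MPI_DOUBLE_PRECISION, &
                      0, MPI_COMM_WORLD, mpierr)

      if (mpirank == 0) write(*,*) rgather   ! n copies of 0.0, then n copies of 1.0, ...

      call MPI_FINALIZE(mpierr)
   end program gather1d

MPI_Gather places the contributions in rank order, so rgather(1:n) comes from rank 0, rgather(n+1:2*n) from rank 1, and so on.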
You can also define a wrapper for mpi_gather by overloading, as the module below shows.
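(The original module is not reproduced here; this is a minimal sketch of the idea, with only a double-precision 1D specific procedure. More specifics for other ranks and types can be added to the generic interface.)

   module gather_wrapper
      use mpi
      implicit none
      private
      public :: gather

      interface gather                       ! generic name; add more specifics for other ranks/types
         module procedure gather_real64_1d
      end interface gather

   contains

      subroutine gather_real64_1d(sendbuf, recvbuf, root, comm)
         real(kind(1.0d0)), intent(in)    :: sendbuf(:)
         real(kind(1.0d0)), intent(inout) :: recvbuf(:)   ! size(sendbuf)*nprocs, only used on the root
         integer, intent(in) :: root, comm
         integer :: ierr
         call MPI_Gather(sendbuf, size(sendbuf), MPI_DOUBLE_PRECISION, &
                         recvbuf, size(sendbuf), MPI_DOUBLE_PRECISION, &
                         root, comm, ierr)
      end subroutine gather_real64_1d

   end module gather_wrapper

With such a module, the 1D example above reduces to call gather(r, rgather, 0, MPI_COMM_WORLD).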
Hi @CRquantum, first of all, happy birthday! And thank you so much for your detailed examples! I grasp the concepts of MPI_Gather and MPI_Reduce a bit better now…
I will play around with this to see if I can get it to work; I think I’ll have to significantly restructure my code (the one I posted was just an example to understand the concept).