Parallelize this subroutine with Open MPI

MPI in my opinion, works best if you can clearly or easily distribute almost equal amount of jobs on each of the cores involved. The communications between each cores should be the fewer the better.

If your job is to solve, like a system of 10^4 ODEs, then perhaps you might want to use parallelized solver like Sundials. If this is what you want, I guess the best option is perhaps to just use Sundials instead as suggested by @nicholaswogan, @ivanpribec etal,

On the other hand, if you want to solve like 3-ODE system for 10^4 times, then say you have 100 cores, you can distribute 100 of such 3-ODE systems in each of the cores.
In general, I think you may want to isolate the part you want to parallelize

do j = 1, n
  XXXXXXXXX # you want to parallelize this part
enddo

into an independent subroutine first, call it subroutine AAA.

do j = 1, n
   call  AAA(j,...)  # AAA needs to be completely and absolutely an independent subroutine. 
enddo

Then in MPI it would be more or less like below I guess,

do i = 0, nproc()-1    # say 100 cores, so from rank0 to rank99
  if (myrank()==i) then
    jstart_i = ...
    jend_i = ...
    do j =jstart_i, jend_i # distribute corresponding jobs from jstart_i to jend_i on rank i 
      call AAA(j,...)  
    enddo
   # when finished exit the do loop
  endif
enddo

Or perhaps you do not need the loop at all. Just directly let each rank do the subroutine AAA with the jobs they are assigned.

1 Like