In my opinion, MPI works best when you can easily distribute a nearly equal amount of work to each of the cores involved, and the less communication between the cores, the better.
If your job is to solve one big system, say 10^4 coupled ODEs, then you probably want a parallelized solver, and I guess the best option is to just use Sundials, as suggested by @nicholaswogan, @ivanpribec et al.
On the other hand, if you want to solve something like a 3-ODE system 10^4 times independently, then, say you have 100 cores, you can assign 100 of these 3-ODE systems to each core.
In general, I think you may want to isolate the part you want to parallelize,
do j = 1, n
   XXXXXXXXX   ! you want to parallelize this part
enddo
into an independent subroutine first; call it subroutine AAA. The loop then becomes:
do j = 1, n
   call AAA(j, ...)   ! AAA needs to be a completely independent subroutine
enddo
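By independent I mean that everything AAA touches should go in and out through its argument list, with no module-level or saved state, so different iterations (and later, different ranks) cannot interfere with each other. A hypothetical shape, just as a sketch (the argument names here are made up, not from your code):

subroutine AAA(j, y0, tspan, yout)
   implicit none
   integer, intent(in)  :: j          ! which of the n systems to solve
   real(8), intent(in)  :: y0(3)      ! initial condition of the 3-ODE system
   real(8), intent(in)  :: tspan(2)   ! start and end time of integration
   real(8), intent(out) :: yout(3)    ! solution at tspan(2)
   ! ... integrate system j here with any serial ODE solver ...
   yout = y0   ! placeholder so the sketch compiles
end subroutine AAA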
Then in MPI it would look more or less like this, I guess:
do i = 0, nproc()-1              ! say 100 cores, so from rank 0 to rank 99
   if (myrank() == i) then
      jstart_i = ...
      jend_i   = ...
      do j = jstart_i, jend_i    ! distribute the jobs from jstart_i to jend_i to rank i
         call AAA(j, ...)
      enddo
      exit                       ! this rank's share is finished, so leave the loop
   endif
enddo
Or perhaps you do not need the loop at all: just let each rank compute its own job range directly from its rank and call subroutine AAA on the jobs it is assigned, as in the sketch below.
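For concreteness, here is a minimal self-contained sketch of that no-loop version. It assumes mod(n, nproc) == 0 so the work divides evenly, the block bounds are just one common choice, and AAA is reduced to a stub:

program distribute_odes
   use mpi
   implicit none
   integer :: ierr, myrank, nproc, j, jstart, jend, nper
   integer, parameter :: n = 10000   ! e.g. 10^4 independent 3-ODE systems

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

   nper   = n/nproc                  ! jobs per rank, assuming mod(n,nproc) == 0
   jstart = myrank*nper + 1          ! first job of this rank
   jend   = (myrank + 1)*nper        ! last job of this rank

   do j = jstart, jend
      call AAA(j)                    ! each call is fully independent
   enddo

   call MPI_Finalize(ierr)

contains

   subroutine AAA(j)                 ! stub standing in for the real work
      integer, intent(in) :: j
      ! ... solve ODE system j here ...
   end subroutine AAA

end program distribute_odes

You would launch it with something like mpirun -np 100 ./a.out, one rank per core; since the ranks never need to communicate after startup, the speedup should be close to ideal.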