Parallel Fortran Coarrays Longer CPU Time Than Serial Fortran

Interesting that it’s even slower! Possible issues are:

  • While you’ve introduced buffers, in some places the previous communications are still there (such as line 64 or 93 of mod_solve.f90). Try to remove these: some cases could be eliminated by doing the addition on the receiving image, while others could be replaced with a single contiguous communication outside of the loop.
  • The buffers aren’t leading to contiguous communication. For example, line 96 of mod_solve.f90 communicates a non-contiguous block (remember that Fortran stores arrays in column-major order, so x(i:j,:) is non-contiguous unless i:j spans the entire first dimension). Depending on how smart your compiler is, it might split this up into lots of smaller communications.
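
To illustrate the contiguity point, here is a minimal sketch, not taken from mod_solve.f90. The names x, row, recv and the sizes nx, ny are made up for the example. It packs a strided row into a local contiguous buffer and then sends it with a single put:

```fortran
program contiguous_put_sketch
  implicit none
  integer, parameter :: nx = 8, ny = 6   ! hypothetical local array size
  real :: x(nx, ny)
  real :: row(ny)                        ! local, contiguous staging buffer
  real, allocatable :: recv(:)[:]        ! hypothetical receive coarray
  integer :: me, dest

  me   = this_image()
  dest = merge(1, me + 1, me == num_images())  ! next image, wrapping around

  allocate(recv(ny)[*])                  ! coarray allocation synchronises all images
  x = real(me)

  ! Column-major storage: x(2:3,:) jumps through memory, x(:,2:3) does not.
  print *, 'image', me, 'x(2:3,:) contiguous?', is_contiguous(x(2:3, :))
  print *, 'image', me, 'x(:,2:3) contiguous?', is_contiguous(x(:, 2:3))

  row = x(2, :)          ! gather the strided row locally (cheap, no communication)
  recv(:)[dest] = row    ! one contiguous put to the destination image
  sync all               ! make the put visible before recv is read

  print *, 'image', me, 'received value', recv(1)
end program contiguous_put_sketch
```

With gfortran this should build with -fcoarray=single for a quick one-image check, or through OpenCoarrays for a real multi-image run. Whether the packed version is actually faster depends on the compiler and the coarray runtime, so it is worth timing both.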

Fixing these things may improve the speed (or not, it’s just a guess).

Regardless, you’re unlikely to get good performance while all the communication passes through image 1 alone.
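
As a rough sketch of the alternative, assuming a 1D decomposition (which may not match your actual layout), each image can exchange boundary values directly with its neighbours instead of routing everything through image 1. The array u and its size n are hypothetical:

```fortran
program neighbour_exchange_sketch
  implicit none
  integer, parameter :: n = 4            ! hypothetical number of interior points per image
  real :: u(0:n+1)[*]                    ! interior points 1..n plus one halo cell each side
  integer :: me, np

  me = this_image()
  np = num_images()

  u = 0.0
  u(1:n) = real(me)                      ! fill the interior with this image's number
  sync all                               ! everyone is initialised before any put

  ! Each image writes its boundary values into its neighbours' halo cells.
  if (me > 1)  u(n+1)[me-1] = u(1)       ! my left edge -> left neighbour's right halo
  if (me < np) u(0)[me+1]   = u(n)       ! my right edge -> right neighbour's left halo
  sync all                               ! puts are complete before the halos are read

  print *, 'image', me, 'halos:', u(0), u(n+1)
end program neighbour_exchange_sketch
```

Here sync all keeps the sketch simple. In a real code, sync images with just the neighbouring image numbers would avoid a global barrier on every exchange.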