Parallel Fortran Coarrays Longer CPU Time Than Serial Fortran

It’s been a few years since I tried using co-arrays on a typical commodity AMD/Intel 8 or 16 core processor. I gave up because of the dismal performance (though I will admit to being somewhat jaded from using co-arrays on Cray HPC systems, which have the hardware necessary to support PGAS-type programs with Cray’s compilers). I never investigated this, but I think both MPICH and Open MPI can be built to default to shared memory instead of the TCP/IP stack on multi-core shared-memory nodes. The MPI implementations on most large HPC systems I’ve used appeared to do shared-memory communication within a node and only used the switch/interconnect to reach off-node processes. It’s been a very long time since I built either MPICH or Open MPI from scratch, and I wonder whether the current default is to build for shared memory on multi-core processors. I think ifort at one time allowed you to specify that you wanted shared memory for co-arrays, but don’t quote me on that. Again, most of my co-array experience is on Crays, which have the hardware to make co-arrays competitive with pure MPI. I just don’t know if that’s the case on your average workstation PC.

Edit

Also, I would think you would only need -fallow-argument-mismatch if you are using the Fortran MPI include file (mpif.h) instead of the Fortran 2008 module (use mpi_f08). Unfortunately, because the standards committee appears to have no interest in defining a portable module format, the mpi_f08 module is compiler specific, meaning you will have to build a separate version of MPI for each compiler if you want to use the mpi_f08 module.
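
For anyone who wants to try this, here is a minimal sketch of a program written against the Fortran 2008 MPI module (spelled mpi_f08 in the MPI standard) rather than the include file. The program name and the array x are just placeholders I made up, and I am assuming your MPI installation was built with the same compiler you are using, so that its mpi_f08 module file is readable. With the module, every call has an explicit interface, so in principle the compiler should not need -fallow-argument-mismatch for code like this.

```fortran
program hello_f08
  ! Fortran 2008 bindings: typed handles and explicit interfaces
  use mpi_f08
  implicit none
  integer :: rank, nprocs
  real :: x(4)          ! placeholder buffer, purely illustrative

  ! The ierror argument is optional in the mpi_f08 bindings
  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs)

  ! Each rank fills x with its own rank number, then rank 0 broadcasts its copy
  x = real(rank)
  call MPI_Bcast(x, size(x), MPI_REAL, 0, MPI_COMM_WORLD)

  if (rank == 0) print *, 'number of ranks:', nprocs
  call MPI_Finalize()
end program hello_f08
```

Building it with the MPI compiler wrapper (mpifort in both MPICH and Open MPI) should pick up the matching module. If the wrapper was built for a different compiler, I would expect the use mpi_f08 line to fail at compile time, which is the portability problem described above.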