The current implementations of Coarray Fortran are caught in a vicious circle. People hesitate to use Coarrays because of the performance issues mentioned, and compiler developers do not seem to prioritize Coarray performance enhancements because people hesitate to use them.
I have implemented the same algorithms using Coarrays and MPI, and the Coarray Fortran code looks concise and beautiful, nearly perfect. But implementations whose performance is comparable to MPI also matter. Otherwise, usage will remain limited to educational parallel computing activities.
It is easy to criticize something to which I have contributed nothing. I appreciate the efforts of Damian Rouson and Sourcery Institute (for their impressive OpenCoarrays library, which had the best Coarray performance in my tests) and the Intel compiler team for their full implementation of Fortran 2018 Coarrays. It’s a feat. But some improvements, both in performance and in auxiliary features, appear essential before Coarrays see wider use in production code.
P.S. I mentioned OpenCoarrays and Intel ifort because these are the two Coarray implementations that I have used and tested frequently so far. The NAG compiler’s implementation looks quite promising, especially if it offers flexible interoperation with MPI/OpenMP.
I look forward to testing it soon.