Scientists should never dismiss technologies or topics before they can properly understand and correctly explain them.
From the introduction to coarrays in Modern Fortran Explained, we have already learned about the distinction between work distribution and data distribution as the foundation of Coarray Fortran.
Work distribution in Coarray Fortran adopts the SPMD programming model. Since Fortran 2018, the language allows the programmer to create multiple, even hierarchical (nested), SPMD environments through teams. The SPMD model is a universal programming model for creating parallelism on many types of devices, including CPUs, GPUs, and FPGAs. The DPC++ book puts it this way:
“One of the greatest strengths of a SPMD programming model is that it allows the same “program” to be mapped to multiple levels and types of parallelism, without any explicit direction from us. Instances of the same program could be pipelined, packed together and executed with SIMD instructions, distributed across multiple hardware threads, or a mix of all three.”
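To make the nested SPMD idea concrete, here is a minimal sketch using Fortran 2018 teams. The program name, the variable names, and the split by odd/even image index are assumptions chosen only for illustration, and the sketch needs a compiler with team support.

```fortran
program spmd_teams
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
  type(team_type) :: half          ! hypothetical name for the new team variable
  integer :: me, n, color

  ! Every image executes this same program (SPMD); only the image index differs.
  me = this_image()
  n  = num_images()

  ! Split the images into two teams by odd/even index (an arbitrary choice).
  color = 1 + mod(me - 1, 2)
  form team (color, half)

  change team (half)
    ! Inside the construct, this_image() and num_images() refer to the new team,
    ! so the same code now runs as a nested SPMD environment.
    print '(*(g0))', 'initial image ', me, ' of ', n, ' is image ', &
          this_image(), ' of ', num_images(), ' in team ', team_number()
  end team
end program spmd_teams
```

The order of the printed lines is not defined, since the images run independently.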
Data distribution in Coarray Fortran can be expressed through (1) coarrays (since Fortran 2008) or through (2) collective subroutines (i.e. not necessarily using any coarrays, since Fortran 2018). Thus, we can also do coarray programming without even a single coarray declared in our codes. This should help to provide codes for a variety of devices in the future, including GPUs. My personal focus is currently on kernels that use coarrays, but I have also made a start on another kernel type that uses collective subroutines.
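The two ways of expressing data distribution can be compared in a small sketch that computes the same sum twice: once through a coarray with remote reads, and once through the collective subroutine co_sum with no coarray involved. The program and variable names are assumptions made for this illustration.

```fortran
program sum_two_ways
  implicit none
  integer :: partial[*]   ! (1) a coarray: one copy per image, remotely addressable
  integer :: total, i
  integer :: local        ! (2) an ordinary variable, not a coarray

  ! (1) Coarray version: image 1 gathers every image's value with remote reads.
  partial = this_image()
  sync all                ! order all assignments before the remote reads below
  if (this_image() == 1) then
    total = 0
    do i = 1, num_images()
      total = total + partial[i]
    end do
    print '(a,i0)', 'coarray sum:    ', total
  end if

  ! (2) Collective version: the reduction is done by co_sum itself.
  local = this_image()
  call co_sum(local, result_image=1)   ! the result is defined on image 1 only
  if (this_image() == 1) print '(a,i0)', 'collective sum: ', local
end program sum_two_ways
```

With n images, both lines report n(n+1)/2. The second variant leaves the communication pattern entirely to the compiler and runtime, which is part of what makes it attractive for other device types.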
Another important topic with coarrays is symmetric memory, which we already have with the Intel compilers.
When using coarrays for data distribution, I use coreRMA functionality not only to bulletproof-check the (customized) synchronization but also to routinely check the (non-atomic) data transfer through the (theoretical) network in my programming. This may sound complicated, but the code size of the individual parts can be very small (e.g. a simple basic version of a coarray-based channel implementation with non-blocking synchronization). Here, Fortran can make complicated things simple, and such codes are very easy to maintain. The main topics on top of this are non-blocking synchronization, parallel loops, pairwise independent forward progress for (not only spatial) kernels, device-wide synchronization, hierarchical parallelism, and certainly others as well. In some aspects of these topics, my Fortran programming (preparation) may already be further along than the DPC++ book currently is. As we are just entering the exascale era, with new types of devices as well, this is certainly not the right time to dismiss (Coarray) Fortran.
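As an illustration of how small a coarray-based channel with non-blocking synchronization can be, here is a minimal sketch built on the Fortran 2008 atomic subroutines. It is not my actual implementation: the names payload, ready and polls, the value 42, and the choice of images 1 and 2 are all assumptions made for the example.

```fortran
program channel_sketch
  use, intrinsic :: iso_fortran_env, only: atomic_int_kind
  implicit none
  integer :: payload[*]                  ! non-atomic data carried by the channel
  integer(atomic_int_kind) :: ready[*]   ! atomic flag: 1 means "data has arrived"
  integer(atomic_int_kind) :: flag
  integer :: polls

  payload = 0
  ready   = 0
  sync all                               ! both variables initialized everywhere

  if (num_images() >= 2) then
    if (this_image() == 1) then
      ! Sender: put the data on image 2, then raise the flag there.
      payload[2] = 42
      sync memory                        ! order the put before the flag update
      call atomic_define(ready[2], 1_atomic_int_kind)
    else if (this_image() == 2) then
      ! Receiver: poll the local flag instead of blocking in sync images.
      polls = 0
      do
        call atomic_ref(flag, ready)
        if (flag == 1) exit
        polls = polls + 1                ! placeholder for useful local work
      end do
      sync memory                        ! order the flag check before the read
      print '(a,i0,a,i0,a)', 'received ', payload, ' after ', polls, ' polls'
    end if
  end if
end program channel_sketch
```

The receiver never waits inside a synchronization statement, so it is free to do other work between polls. The two sync memory statements are what order the non-atomic transfer relative to the atomic flag, and this ordering is exactly the kind of thing worth checking rigorously.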