Learning coarrays, collective subroutines and other parallel features of Modern Fortran

If a collective subroutine solves your problem, always choose it over writing your own algorithm using coarrays. For me, this is part of a general preference for intrinsic procedures over custom algorithms, both for clarity and potentially for performance. The clarity is so important to me that I would even accept a small performance penalty if necessary. As my book co-author Jim Xia says, “Let the compiler do its job.” In the specific case of collective subroutines, Jim’s advice is especially important. Doctoral dissertation chapters, and possibly even whole dissertations, have been written on optimizing the sorts of parallel algorithms that collective subroutines embody.

I have considerable experience in writing my own versions of collective subroutines. I first dove into parallel Fortran in 2012, at which point the Intel and Cray compilers supported coarrays and the WG5 standards committee was working on a draft of “TS 18508 Additional Parallel Features in Fortran,” which defined the collective subroutines. I started writing subroutines that emulated the collective subroutines to make it easier to migrate to Fortran 2018 once the compilers started supporting it. I can tell you from that experience that the collective subroutines provided by the language generally outperform even reasonably sophisticated user-defined coarray algorithms that accomplish the same things.

Writing efficient collective communication requires accounting for a range of factors that go well beyond what I would have the stomach to attempt to get right myself: network topology, bandwidth, latency, message size, etc. Moreover, the standard does not require the synchronizations that most naive developers would employ to get the communication right. This is really important because any form of synchronization implies waiting, and waiting hurts performance.
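To make the comparison concrete, here is a minimal sketch of my own (not code from the emulation library mentioned above) that computes the same sum two ways: once with the intrinsic `co_sum` and once with a deliberately naive coarray gather on image 1. The program and variable names are made up for illustration. Note that the hand-written version needs two `sync all` statements to be correct, while the `co_sum` call needs no explicit synchronization in user code.

```fortran
program collective_vs_handrolled
  ! Illustrative sketch: sum one integer per image with co_sum and with
  ! a naive hand-written coarray reduction, then check that both agree.
  implicit none
  integer :: contribution[*]   ! coarray: each image holds its own value
  integer :: manual_total[*]   ! coarray: holds the hand-rolled result
  integer :: intrinsic_total, expected, i, n

  n = num_images()
  contribution = this_image()

  ! Intrinsic collective: every image ends up with the global sum.
  intrinsic_total = contribution
  call co_sum(intrinsic_total)

  ! Naive alternative: image 1 pulls every contribution, one at a time.
  sync all                     ! all contributions must be defined first
  if (this_image() == 1) then
    manual_total = 0
    do i = 1, n
      manual_total = manual_total + contribution[i]
    end do
  end if
  sync all                     ! image 1 must finish before others read
  manual_total = manual_total[1]

  ! With contribution = this_image(), the sum is 1 + 2 + ... + n.
  expected = n*(n + 1)/2
  if (intrinsic_total /= expected .or. manual_total /= expected) then
    error stop "reduction mismatch"
  end if
  if (this_image() == 1) print *, "sum over", n, "images:", intrinsic_total
end program collective_vs_handrolled
```

Assuming gfortran, this should build in single-image mode with `gfortran -fcoarray=single`, or run on multiple images with the OpenCoarrays `caf` and `cafrun` wrappers; other compilers use their own flags. The serialized gather on image 1 is exactly the kind of pattern a compiler's runtime is free to replace with something better suited to the network, such as a tree-based reduction.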

I have a deep backlog of publications that I need to get out the door soon and plan to submit over the next several months. Hopefully one such publication will put some data behind the above statements. I have some of the data in slides that I’ll post if I can find a moment to dig them up.
