Fortran and MPI

In any case, I wouldn’t bet on binary compatibility between different versions of a given compiler. For example, I avoid using a module compiled with version n in sources compiled with version m /= n.

I use something similar in my main code. I try to keep the computational logic free of bare MPI calls and use generic wrappers that look similar to the coarray collective calls. Where point-to-point communication is needed, e.g. for halo-region exchange, it lives in separate modules and still goes through several layers of wrappers that make the calls easier.
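
To illustrate the idea (a minimal sketch with hypothetical names, not my actual code), a wrapper module can expose a generic that mirrors the co_sum intrinsic, so only this one module touches MPI directly:

```fortran
! Minimal sketch of the wrapper idea: computational code calls a generic that
! mirrors the co_sum intrinsic, and only this module uses MPI directly.
module par_collectives
  use mpi_f08
  implicit none
  private
  public :: par_co_sum

  interface par_co_sum              ! extend with further ranks/kinds as needed
    module procedure co_sum_r8_scalar, co_sum_r8_1d
  end interface
contains
  subroutine co_sum_r8_scalar(x)
    real(8), intent(inout) :: x
    call MPI_Allreduce(MPI_IN_PLACE, x, 1, MPI_REAL8, MPI_SUM, MPI_COMM_WORLD)
  end subroutine
  subroutine co_sum_r8_1d(x)
    real(8), intent(inout) :: x(:)
    call MPI_Allreduce(MPI_IN_PLACE, x, size(x), MPI_REAL8, MPI_SUM, MPI_COMM_WORLD)
  end subroutine
end module par_collectives
```

In the physics code a reduction then reads `call par_co_sum(local_energy)`, which looks the same whether the backend is MPI, coarrays, or something else.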


First, CUDA was only introduced in 2007, by which time coarrays were well on their way to standardization in Fortran 2008, so there was no competition between coarrays and GPU support at the time. As we know, DO CONCURRENT was also introduced in Fortran 2008, but has been available as a GPU programming model since 2020. More recent efforts to add features to the Fortran 2008 coarray model have not been in competition with anything GPU-related.

Second, while there is no intellectual competition between the two, as a practical matter, the committee is quite small and it is difficult for the relevant subcommittee to pursue too many features at once. Even if they (or we, since I am part of said subcommittee) could, there is also concern about adding too many features to Fortran, which already challenges implementers. For example, the BITS proposal was dropped from Fortran 2008 because the committee believed that it was already too much change for one release of the standard.

I weakly hold the opinion that coarrays were a mistake, only because that functionality is so straightforward to implement as a library, and because the committee might have been able to make implementing coarrays as a library even more straightforward through other language changes, which would have been easier for compilers to adopt.

We can look to UPC++, which reimplements UPC as a C++ library, thanks to the differences between C and C++. I haven’t tried to implement PGAS as a native Fortran library with minimal syntax, but I have more than 15 years of experience using Global Arrays in this context, and I have not seen any compelling reason to use coarrays instead, and many practical reasons not to.

If someone wants to implement coarrays as a library, consider that one can use MPI-3 RMA and C_F_POINTER to get the equivalent of an allocatable coarray (example). I made no attempt there to compress the syntax, but I imagine there is a way to create special types and use user-defined operators and/or type-bound procedures to create something that is close enough to coarrays to solve real problems.
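
As a rough illustration of the approach (a minimal sketch, not that example; the names and the fence-based synchronization are just illustrative choices), each rank exposes an MPI window, maps it onto a Fortran pointer, and fetches another rank’s data with MPI_Get:

```fortran
! Sketch of emulating an allocatable coarray with MPI-3 RMA: each rank exposes
! a window of n reals, and any rank can read a remote rank's data with MPI_Get.
program rma_coarray_sketch
  use mpi_f08
  use iso_c_binding, only: c_ptr, c_f_pointer
  implicit none
  integer, parameter :: n = 1000
  type(MPI_Win)      :: win
  type(c_ptr)        :: baseptr
  real(8), pointer   :: local(:)        ! plays the role of x(:)[*] on this image
  real(8), allocatable :: remote(:)
  integer :: me, np, tgt, disp_unit
  integer(MPI_ADDRESS_KIND) :: winsize

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, me)
  call MPI_Comm_size(MPI_COMM_WORLD, np)

  disp_unit = storage_size(1.0_8) / 8
  winsize   = int(n, MPI_ADDRESS_KIND) * disp_unit

  ! Allocate window memory and map it onto a Fortran array pointer.
  call MPI_Win_allocate(winsize, disp_unit, MPI_INFO_NULL, MPI_COMM_WORLD, baseptr, win)
  call c_f_pointer(baseptr, local, [n])

  local = real(me, 8)                   ! fill "my image's" data
  call MPI_Win_fence(0, win)

  ! Roughly the equivalent of remote(:) = x(:)[tgt+1] in coarray syntax.
  tgt = mod(me + 1, np)
  allocate(remote(n))
  call MPI_Get(remote, n, MPI_REAL8, tgt, 0_MPI_ADDRESS_KIND, n, MPI_REAL8, win)
  call MPI_Win_fence(0, win)

  if (me == 0) print *, 'rank 0 fetched data from rank', tgt, ', first value =', remote(1)

  call MPI_Win_free(win)
  call MPI_Finalize()
end program rma_coarray_sketch
```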

The current plan is to not define a new module but to make it so that an MPI_F08 implementation like VAPAA solves the problem we want to solve.

Jeff, I’m not advocating that compiler developers ditch their current formats. What I would like to see is a second option: a transportable module in a format all compilers can read, generated by some compiler option (something like --enable iso_standard_modules). I find it hard to believe that the compiler development community can’t find some common ground on a format of some kind. There are a lot of options if people would just take the time to explore them. A markup-language-like representation or compiling to some intermediate internal representation come to mind. I’m sorry, but the “can’t break backwards compatibility” mantra is beginning to sound to me like a “dog ate my homework” excuse for not being willing to take the time to think outside the box. As long as the people who are tasked with defining what Fortran is remain more focused on the past than the future (and, to be frank, on retaining whatever competitive advantage they have by forcing lock-in to their particular compiler), Fortran is doomed to extinction.


You seem to think this is a technical problem. Backwards compatibility is not a technical problem for the implementers. The issue is that users/customers don’t like it. You don’t have to convince 10 implementation teams. You need to convince thousands of Fortran users, including the ones at the nuclear weapons labs and the commercial engineering firms, that they need to recompile 100% of their code when the change happens, for no observable benefit to them, since their code already works and they have higher priorities than module ABI standardization.

There are famous examples showing why breaking backwards-compatibility causes problems, even when the goal is supposedly to make things better. See e.g. https://bugzilla.redhat.com/show_bug.cgi?id=638477.

A new arXiv preprint by some people at Intel and Argonne is

Generating Bindings in MPICH
by Hui Zhou, Ken Raffenetti, Wesley Bland, Yanfei Guo

The MPI Forum has recently adopted a Python scripting engine for generating the API text in the standard document. As a by-product, it made available reliable and rich descriptions of all MPI functions that are suited for scripting tools. Using these extracted API information, we developed a Python code generation toolbox to generate the language binding layers in MPICH. The toolbox replaces nearly 70,000 lines of manually maintained C and Fortran 2008 binding code with around 5,000 lines of Python scripts plus some simple configuration. In addition to completely eliminating code duplication in the binding layer and avoiding bugs from manual code copying, the code generation also minimizes the effort for API extension and code instrumentation. This is demonstrated in our implementation of MPI-4 large count functions and the prototyping of a next generation MPI profiling interface, QMPI.


And how is adding a new feature that didn’t exist before, and that users are free to use (or not use) at their own choosing, breaking backwards compatibility? As long as the compilers support their old format in addition to a new transportable one, there is very little chance of “breaking” existing code. It’s all about giving users a choice beyond “I have to support four different compilers” and wasting unnecessary time and money.

I’m not sure about other compilers, but GNU Fortran can be directed to use an arbitrary library for its coarray implementation. OpenCoarrays using MPI just happens to be the only widely used implementation. Simply Fortran for Windows, as a counter-example, does not use OpenCoarrays; instead, it ships with a custom library that implements the GNU Fortran runtime library’s coarray API (that blog post is a bit old, as the library now uses named shared memory). The whole system makes heavy use of a database shared between processes and a mish-mash of Windows synchronization and memory-mapping API calls to implement coarrays. It is admittedly not particularly fast when there are numerous intermittent transfers between images, but it can be performant if transfers are optimized.
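
For readers unfamiliar with the mechanism: the coarray source itself is runtime-agnostic, and the backend is chosen at build time. A minimal sketch (the link line shown in the comments is only indicative and depends on how the coarray library is installed):

```fortran
! A tiny coarray program to illustrate that the source does not depend on the
! coarray runtime. With GNU Fortran it can be built, for example, as
!   gfortran -fcoarray=single hello_caf.f90              (single image, no library)
!   gfortran -fcoarray=lib    hello_caf.f90 -lcaf_mpi    (OpenCoarrays' MPI backend)
! and a vendor library implementing the same runtime API (as Simply Fortran
! does on Windows) can be linked instead.
program hello_caf
  implicit none
  integer :: me[*]                 ! a scalar integer coarray
  me = this_image()
  sync all
  if (this_image() == 1) print *, 'running on', num_images(), 'images'
end program hello_caf
```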

The above paragraph reads like a plug, which I guess it is, but someone can just implement a functioning coarray library that doesn’t use MPI. It’s not impossible.

Scientists should never dismiss technologies or topics before they can properly understand and correctly explain them.

From Modern Fortran Explained, in its introduction to coarrays, we have already learned about the distinction between work distribution and data distribution as a foundation of Coarray Fortran.

Work distribution in Coarray Fortran adopts the SPMD programming model. Since Fortran 2018, the language allows the programmer to create multiple, even hierarchical (nested) SPMD environments through teams (see the sketch after the quote below). The SPMD model is a universal programming model for creating parallelism for many types of devices, including CPUs, GPUs, FPGAs, etc. The DPC++ book puts it this way:

“One of the greatest strengths of a SPMD programming model is that it allows the same “program” to be mapped to multiple levels and types of parallelism, without any explicit direction from us. Instances of the same program could be pipelined, packed together and executed with SIMD instructions, distributed across multiple hardware threads, or a mix of all three.”
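
As a minimal sketch of the hierarchical SPMD capability mentioned above (assuming a compiler with Fortran 2018 teams support), the initial set of images can be split into sub-teams that each form their own, smaller SPMD environment:

```fortran
! The initial set of images is split into two sub-teams; inside CHANGE TEAM
! each image sees a self-contained SPMD environment of its own sub-team.
program nested_spmd
  use iso_fortran_env, only: team_type
  implicit none
  type(team_type) :: subteam
  integer :: color

  color = mod(this_image() - 1, 2) + 1        ! split images into 2 groups
  form team (color, subteam)

  change team (subteam)
    ! Inside the construct, this_image(), num_images() and SYNC ALL
    ! refer only to the images of this sub-team.
    if (this_image() == 1) print *, 'sub-team', color, 'has', num_images(), 'images'
    sync all
  end team
end program nested_spmd
```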

Data distribution in Coarray Fortran can be expressed through (1) coarrays (since Fortran 2008) or through (2) collective subroutines (i.e. not necessarily using any coarrays, since Fortran 2018). Thus, we can also do coarray programming without even a single coarray declared in our codes (a short sketch follows below). This should help to provide codes for a variety of devices in the future, including GPUs. My personal focus is currently on kernels utilizing coarrays, but I have also made a start on another kernel type that uses the collective subroutines.
Another important topic with coarrays is symmetric memory, which we already have with the Intel compilers.
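
Here is what the collectives-only style can look like in its simplest form (a minimal sketch; real kernels would of course do more than sum a per-image value):

```fortran
! "Coarray programming" without a single coarray: the Fortran 2018 collective
! subroutines operate on ordinary (non-coarray) variables across all images.
program collectives_only
  implicit none
  real :: total
  total = real(this_image())          ! some per-image contribution
  call co_sum(total)                  ! afterwards every image holds the sum
  if (this_image() == 1) print *, 'sum over', num_images(), 'images =', total
end program collectives_only
```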

When using coarrays for the data distribution, I am currently using coreRMA functionality not only to bulletproof-check the (customized) synchronization but also to regularly check the (non-atomic) data transfer through the (theoretical) network in my programming. This may sound complicated, but the code size of the individual parts can be very small (e.g. a simple, basic version of a coarray-based channel implementation with non-blocking synchronization; see the sketch below). Here, Fortran can make complicated things simple, and such codes are very easy to maintain. Further main topics are non-blocking synchronization, parallel loops, pairwise independent forward progress for (not only spatial) kernels, device-wide synchronization, hierarchical parallelism, and certainly others as well. In some of these respects my Fortran programming (preparation) may already be further along than the DPC++ book is yet. As we are just entering the exascale era, also with new types of devices, this is certainly not the right time to dismiss (Coarray) Fortran.
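
To give a flavour of how small such a part can be, here is a highly simplified sketch of a coarray-based channel with a non-blocking receive side (this is not my actual implementation; it just uses Fortran 2018 events to make the point):

```fortran
! Image 1 puts data on image 2 and posts an event; image 2 polls the event
! with EVENT_QUERY (non-blocking) and could do other work while waiting.
program channel_sketch
  use iso_fortran_env, only: event_type
  implicit none
  type(event_type) :: ready[*]
  real :: payload[*]
  integer :: cnt

  if (num_images() < 2) error stop 'run with at least 2 images'

  if (this_image() == 1) then
    payload[2] = 3.14                  ! put the data on image 2 ...
    event post (ready[2])              ! ... then signal that it is there
  else if (this_image() == 2) then
    do
      call event_query(ready, cnt)     ! non-blocking check of the local event
      if (cnt > 0) exit
      ! ... other useful work could be done here instead of spinning ...
    end do
    event wait (ready)                 ! consume the event; orders the segments
    print *, 'image 2 received', payload
  end if
end program channel_sketch
```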


Well, since I haven’t tried to run anything on a Windows system other than Office products since 1993, I’ll plead ignorance about anything targeted at Windows. At one time (with ifort; not sure about ifx) you could supposedly replace Intel’s MPI with another distribution (at least I remember something on an Intel web site that implied you could). I tried it with Open MPI but it didn’t work. I know there was an experimental implementation of OpenCoarrays based on OpenSHMEM, but I don’t think it got very far. For small (< 64 core) workstations, using shared memory directly instead of MPI (which may or may not have been compiled to use shared memory instead of TCP/IP on a multi-core processor) makes more sense. I’ll add Simply Fortran to my list of compilers supporting co-arrays. I also just found out that Fujitsu’s compiler supports them as well, but I’m not sure what its transport layer is based on.

You certainly can use the Intel compiler with other MPI libraries and it is, in fact, quite commonly done on supercomputers and HPC clusters.

Yes, I’ve done that for MPI-only runs for a couple of decades. The question, though, is whether a non-Intel MPI implementation can be used in place of Intel’s MPI as the transport layer for co-arrays. Again, I read a comment from an Intel person, either here or on the Intel Fortran forum, that implied you could. My one attempt at using Open MPI instead of Intel’s MPI for co-arrays failed.

Edit: You still have to build whatever MPI distribution you use with the Intel compilers on a given system if you want to use the mpi_f08 module instead of mpif.h in Fortran applications compiled with ifort or ifx.