Fortran and MPI

For Fortran to stay relevant, it needs continued representation on the committees that develop parallel programming standards such as MPI. At the 2017 MPI Symposium, Rolf Rabenseifner gave a presentation, “From MPI-1.1 to MPI-3.1, publishing and teaching, with a special focus on MPI-3 shared memory and the Fortran nightmare”. Here are slides 14-17:

Fortran, a nightmare ?!?
• Only a few MPI Forum members speak Fortran – the few ones had a hard job to get MPI and Fortran consistent
• Major problems: compiler optimizations may lead to wrong MPI execution
  – with all MPI_Wait/Test routines
  – with using MPI_BOTTOM together with derived datatypes
  – with absolute addresses
  – calling nonblocking routines with strided data arrays that are not simple contiguous
• Already in MPI-2.0 (1997!) the inconsistency problem was known – but more than some text about a user-written “dd” dummy routine as a work-around was not going through the Forum!

Fortran, a nightmare – solved in MPI-3.0 (15 years later) ?!?
• For MPI-3.0 we received full service from the Fortran standardization body via “Fortran Technical Specification TS 29113”
  – enabling the new Fortran module mpi_f08, which is the first one fully consistent with the Fortran standard
  – major solution: Fortran extended the ASYNCHRONOUS keyword to any asynchronous use-case, including MPI nonblocking operations and MPI_BOTTOM
• In MPI-3.0 we did the backend wrong – my apologies
  – a whole section in an errata, MPI-3.1
  – this really slowed down the implementations
  – still some MPI implementations claim to be MPI-3.1 compliant although they provide neither compile-time argument checking nor name-based argument lists with the mpi module

Teaching complete advanced MPI-3.1
• Important: users can take advantage of all the work in the MPI Forum, and of the implementations of all the new MPI features in many MPI libraries
• My MPI-3.1 course is based on the MPI-1.1 course from EPCC – they did a great job!
25 Years of MPI
• Nonblocking collectives
• The New Fortran Module mpi_f08
• Groups & Communicators, Environment Management
  o MPI_Comm_split, intra- & inter-communicators
  o re-numbering on a cluster, collective communication on inter-communicators, info object, naming & attribute caching, implementation information
• Virtual topologies
  o including neighborhood communication + MPI_BOTTOM
• One-sided Communication
• Shared Memory One-sided Communication
  o including hybrid MPI and MPI-3 shared memory programming
  o MPI memory models and synchronization rules
• Derived datatypes
  o including advanced features, alignment, resizing
• Parallel File I/O
• MPI and Threads, e.g., hybrid MPI and OpenMP
• Probe, Persistent Requests, Cancel
• Process Creation and Management
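
For readers who have not seen the new module: here is a minimal sketch (mine, not from the slides) of a nonblocking ring exchange in the mpi_f08 style the second slide describes, with ASYNCHRONOUS buffers guarding against the code-motion problems listed on the first slide.

program ring
  use mpi_f08                     ! MPI-3.0 module: typed handles, keyword args, argument checking
  implicit none
  ! ASYNCHRONOUS tells the compiler the buffers may change between the
  ! nonblocking calls and MPI_Waitall, so it must not move code across them
  real, asynchronous :: sendbuf(1000), recvbuf(1000)
  type(MPI_Request)  :: req(2)
  integer :: rank, nprocs

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs)

  sendbuf = real(rank)
  call MPI_Irecv(recvbuf, size(recvbuf), MPI_REAL, mod(rank - 1 + nprocs, nprocs), 0, &
                 MPI_COMM_WORLD, req(1))
  call MPI_Isend(sendbuf, size(sendbuf), MPI_REAL, mod(rank + 1, nprocs), 0, &
                 MPI_COMM_WORLD, req(2))
  call MPI_Waitall(2, req, MPI_STATUSES_IGNORE)

  call MPI_Finalize()
end program ring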

Rabenseifner wrote a 720-page tutorial, “Introduction to the Message Passing Interface (MPI)” (2023), and has given a course on “Parallel programming with MPI/OpenMP”.

5 Likes

Interesting, but I thought that with Fortran coarrays MPI would become less relevant for high performance computing

To the best of my knowledge, IBM (you know, the folks that invented Fortran) still hasn't added co-arrays to their compiler (maybe they have, but they hadn't the last time I looked a couple of years ago). They are still missing from the Nvidia and AMD LLVM compilers. Co-arrays are great on massive HPC systems with tens of thousands of cores that also have the underlying hardware to support them (see Cray). My attempts to use them on, say, an 8-core workstation have been a major waste of time. Even when they run, their performance is extremely poor. Other than NAG, which uses its own shared-memory implementation, all other compilers (except for Cray) use MPI as the transport layer, so the only advantage co-arrays have is a slightly easier-to-use syntax. Up to MPI-3, co-arrays would sometimes have a performance advantage on large problems because they are based mostly on one-sided communication. That advantage disappeared once MPI's one-sided puts and gets (introduced in MPI-2 and overhauled in MPI-3) matured.
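
To make the "slightly easier syntax" point concrete, here is a rough sketch (mine, written as an illustration; the names are made up) of a coarray put next to its MPI one-sided equivalent.

! Coarray version: the communication is just an assignment to a coindexed object:
!
!   real :: x(100)[*]
!   if (this_image() == 1 .and. num_images() > 1) x(:)[2] = x(:)
!
! MPI one-sided version of the same put (window setup and fences are explicit):
program rma_put_sketch
  use mpi_f08
  implicit none
  integer, parameter :: n = 100
  real :: y(n)
  type(MPI_Win) :: win
  integer :: me, np, disp_unit
  integer(kind=MPI_ADDRESS_KIND) :: winsize

  call MPI_Init()
  call MPI_Comm_rank(MPI_COMM_WORLD, me)
  call MPI_Comm_size(MPI_COMM_WORLD, np)

  y = real(me)
  disp_unit = storage_size(y) / 8                  ! bytes per element
  winsize   = int(n, MPI_ADDRESS_KIND) * disp_unit
  call MPI_Win_create(y, winsize, disp_unit, MPI_INFO_NULL, MPI_COMM_WORLD, win)

  call MPI_Win_fence(0, win)
  if (me == 0 .and. np > 1) then
     ! write rank 0's buffer into rank 1's window at displacement 0
     call MPI_Put(y, n, MPI_REAL, 1, 0_MPI_ADDRESS_KIND, n, MPI_REAL, win)
  end if
  call MPI_Win_fence(0, win)

  call MPI_Win_free(win)
  call MPI_Finalize()
end program rma_put_sketch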

As the OP notes, the MPI folks have apparently finally gotten their act together with respect to Modern Fortran. Unfortunately, there is still one major (in my opinion) problem that's not the MPI folks' fault. Because of the Standards folks' apparent refusal to consider a standard, transportable module format that can be USEd and linked by all compilers, you are forced to build a separate version of MPI for each compiler if you want to use the mpi_f08 module instead of the old mpif.h header files. I doubt that is going to be fixed in my lifetime.

The big irony is that now folks are migrating to GPUs, so offloading with OpenMP and hybrid MPI/OpenMP/OpenACC programming, etc., might be more important to learn than co-arrays.

2 Likes

There are some plans to introduce a standard ABI for MPI:

This of course doesn't help with the Fortran module part, but those are usually part of the Fortran compiler environment anyway. In Section 7 the authors mention the idea of an mpi_f08_abi module, which would be implemented on top of the standard-ABI version of MPI. If I understand correctly, this way you could switch MPI implementations just by loading different shared libraries into the environment.

1 Like

This looks like a step in the right direction, but it still requires you to build different versions with different compilers (which is what I really want to avoid) to generate the "different shared libraries", or is there something here I'm missing (which unfortunately happens a lot more these days than it used to)?

Well sure, the effort of building the MPI libraries is always there. But an application that was built on top of the standard ABI could leverage any conforming MPI library without recompilation.

There are some interesting things happening in this area:

If I understand correctly, the parallel runtime could allow coarrays to connect with different communication layers, for example GASNet, MPI, and SHMEM. It's also worth noting, when it comes to GPU computing, that all vendors have their own SHMEM libraries supporting GPU memory transfers,

so I'm optimistic we might see this one day. The second part of the question is whether the GPU computing models in Fortran (CUDA, OpenACC, OpenMP) permit the use of coarray-indexed variables.

Similar to CUDA-aware MPI, Intel now also has a GPU-aware MPI implementation that optimizes transfers between GPU buffers when using OpenMP and SYCL:

The trick, shown in the image taken from that webpage, is to perform the MPI call inside the #pragma omp target data region and use the use_device_ptr() clause.
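
In Fortran with OpenMP the same trick would look roughly like the following. This is my sketch, not Intel's code; use_device_addr is the Fortran counterpart of use_device_ptr, and halo_exchange, dest_rank and src_rank are made-up names.

subroutine halo_exchange(values, num_values, dest_rank, src_rank)
  use mpi_f08
  implicit none
  integer, intent(in) :: num_values, dest_rank, src_rank
  real, intent(inout) :: values(num_values)

  ! allocate/copy the buffer on the device for the duration of the exchange
  !$omp target data map(tofrom: values)
  ! inside this inner region, "values" refers to the device address
  !$omp target data use_device_addr(values)
  ! a GPU-aware MPI moves the data directly between GPU buffers
  call MPI_Sendrecv_replace(values, num_values, MPI_REAL, dest_rank, 0, &
                            src_rank, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE)
  !$omp end target data
  !$omp end target data
end subroutine halo_exchange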

I suppose something similar could be done in Fortran with coarrays,

rank = this_image()
!$omp target data map(to: rank, values(1:num_values), num_values) use_device_ptr(values)
!$omp target parallel do is_device_ptr(values)
do i = 1, num_values
   values(i) = values(i) + rank + 1
end do
!$omp end target parallel do
! coindexed put, straight out of the device buffer:
values(1:num_values)[dest_rank] = values(1:num_values)
!$omp end target data

but this is forbidden by the current OpenMP 5.2 standard:

A reference to a coarray that is encountered on a non-host device must not be coindexed or appear as an actual argument to a procedure where the corresponding dummy argument is a coarray.


I guess the biggest problem with co-arrays is the lack of programmers interested in using them (which is just another consequence of the lack of Fortran programmers in general). This perpetuates a chicken-and-egg problem: vendors don't invest in supporting co-arrays because programmers don't use them, and programmers don't bother using them because they aren't supported that well.

3 Likes

Wow, this is all very interesting and encouraging. I remember that there is work underway to replace OpenCoarrays, so hopefully that will improve co-array performance on systems smaller than Cray exascale-class HPC machines. Don't get me wrong, I think co-arrays are a great idea. But like you say, until there is better support for them, getting programmers to adopt them is going to be tough. Sadly, I think that unless something happens soon, co-arrays will be just another good-to-great idea that took so long to get reliably implemented in a critical mass of compilers that folks will have moved on to more recent programming models that can use more modern hardware. People forget that co-arrays, in one form or another, are around 30 years old. I fear that maybe their time has come and gone.

Fortran co-arrays are the one thing in Fortran that I think was a major mistake.

The time and manpower that went into creating coarrays would've been far better utilized in improving GPU support for Fortran.

1 Like

I think this is a false choice. These are orthogonal programming models, and both can be pursued independently.

1 Like

Co-arrays go further back than you may think (History of coarrays and SPMD parallelism in Fortran | Proceedings of the ACM on Programming Languages). The first co-array implementations date back to 1996. We could mark the birth of GPGPU computing with the release of CUDA in 2007; it became "mainstream" a decade later.

I'm not sure what Fortran was doing in the 2008-2018 period, but it looks like the whole concept of heterogeneous programming took the Fortran world by surprise. I'm not talking just about CPUs and GPUs, but also about fast cores and slow cores, digital signal processors, tensor cores, and even crazier ideas being developed today.

1 Like

Off topic: if I wanted to learn how to use MPI-3 in Modern Fortran, what sources would you suggest?
Thanks

This is a problem I have all the time: if I have to build a plug-in for a closed MPI application that has a C interface and runs on Windows, the only way to link Fortran against its MPI library is via the mpif.h bindings, with no USE of an MPI module whatsoever.
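
For anyone who hasn't had the pleasure, that style looks roughly like this (a sketch with a made-up plugin_init routine): no module, no compile-time argument checking, integer handles and an explicit ierror everywhere.

subroutine plugin_init()
  implicit none
  include 'mpif.h'                 ! legacy bindings: just constants and external routines
  integer :: ierr, rank
  logical :: initialized

  ! the host application has typically already called MPI_Init through its C interface
  call MPI_Initialized(initialized, ierr)
  if (.not. initialized) call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
end subroutine plugin_init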

Here is a related thread. I found Dr. Fortran's comments extremely helpful, and I will quote one of them here:

The lesson I took from the ultimate failure of HPF is that trying to “bake in” to a language features tied closely to current hardware architectures is likely to end up being a wasted effort that becomes irrelevant over a short timeframe.

From an end-user perspective, if do concurrent plus coarrays can be well supported by different compilers, we can write a single piece of code and run it on different hardware. This is essentially the ideal state for scientific computing, and as far as I know, Fortran is one of the languages (I could probably cross out "one of") that excels in this regard.
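
A trivial sketch of that single-source style (my own example, not from any particular codebase): local parallelism that a compiler may map to threads or a GPU, plus a coarray collective for the distributed part.

program single_source
  implicit none
  integer, parameter :: n = 1000000
  real :: a(n), total
  integer :: i, me

  me = this_image()
  ! each image fills its own chunk; the compiler may run this loop on
  ! multiple cores or offload it to a GPU
  do concurrent (i = 1:n)
     a(i) = sqrt(real((me - 1) * n + i))
  end do

  ! reduce the per-image partial sums across all images
  total = sum(a)
  call co_sum(total)
  if (me == 1) print *, 'global sum =', total
end program single_source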

2 Likes

MPI has ~99% adoption in HPC because:

  1. been around since the 1990s
  2. support for all languages, not just one
  3. performance transparency (an "=" that generates communication turns out to be harder to reason about than an explicit function call, because it is impossible to know what the compiler does with "=")
  4. multiple high-quality implementations for all platforms, even non-HPC ones
  5. designed to support a library ecosystem (the hidden state of coarray teams does not support library composition the way communicators do; see the sketch after this list)
  6. support for essentially all standard data movement patterns, including send-recv, one-sided and collectives
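
On point 5, a minimal sketch (with a made-up library name, not any real library's code): every well-behaved MPI library dups the communicator it is given, so its traffic can never collide with the application's or with another library's.

module mylib_comm
  use mpi_f08
  implicit none
  type(MPI_Comm) :: lib_comm       ! private communication context for this library
contains
  subroutine mylib_init(user_comm)
    type(MPI_Comm), intent(in) :: user_comm
    ! same process group, but an isolated context: tags and messages cannot clash
    call MPI_Comm_dup(user_comm, lib_comm)
  end subroutine mylib_init
  subroutine mylib_finalize()
    call MPI_Comm_free(lib_comm)
  end subroutine mylib_finalize
end module mylib_comm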

Fortran coarrays, while elegant, were not designed with the same broad goals. Like UPC and Chapel, they have a small group of passionate users, but they will never achieve broad adoption, if only because of the lack of support for multiple languages.

The good news is that both Intel and GCC use MPI-3 RMA as the back-end for coarrays, and both of these plus Cray’s are interoperable with MPI. The bad news is that Cray Fortran is the only situation where coarrays are capable of outperforming MPI, so within the group of HPC users who care about portable performance, MPI wins.

MPI Fortran support is dramatically improved by the mpi_f08 module. I am working on other improvements to MPI Fortran support. It’s also possible to make MPI a lot easier to use in Fortran with a wrapper like GitHub - jeffhammond/havaita: MPI Fortran type inference that automatically resolves all the unnecessary arguments.

3 Likes

Jeff, I applaud the work you are doing with vapaa, havaita, etc. It's something that's long overdue. However, as far as I can see, you are still going to have to compile a separate version of mpi_f08 for every compiler you wish to use. Your work will hopefully eliminate the need to build the entire MPI code base for every compiler, but due to the incompatibility of the internal module formats across compilers, you are still stuck with building multiple copies of mpi_f08 even if it's now a standalone module. Maybe your work will finally be the impetus (with hopefully a lot of nudging by the MPI distros) to stand up a standard, transportable module format, and so eliminate what I see as a major hurdle to a lot of libraries (not just MPI) being adopted and seamlessly integrated into Fortran projects. I'm a realist, though, and as I stated above, I doubt that will happen in my lifetime. Note that my experience with distributed-memory parallelism goes back to PVM running on networks of workstations (the wrong way to do HPC). I always considered PVM in some ways better than MPI, mostly because of its simplicity and ease of use. However, keep up the good work, and maybe we can all have the kind of support for Fortran in MPI (and other projects) that it has always deserved.

2 Likes

This question may be naive, but why is having to compile mpi_f08 for every compiler you wish to use such a big problem? It is a slight nuisance that the .mod files produced by different compilers are incompatible. By default they are stored in the same directory as your source code, so if you are moving from one compiler to another you need to recompile everything. But compilers do allow you to specify the directories where .mod files are to be created, and you could store the mpi_f08.mod file for each compiler in a separate directory.

Because multiple copies would not be necessary, which makes linking etc. a little easier. This might not be a big deal for someone with one or two compilers on a PC, but it's a very big deal from a software-support standpoint on large HPC systems where you might have, say, four compilers to choose from. You have to hire people (and pay them what they are worth) to keep the software up to date and ensure it works on their particular system. People who just want to use a pre-built version of MPI, and not mess with building and maintaining their own versions, shouldn't be forced to add a lot of boilerplate to their build systems just to make sure they are accessing a version of MPI compatible with their compiler. Note that this really only applies if you want to use mpi_f08 (and get all the advantages that a module gives you) instead of the mpif.h include files. I also see the possibility of a standard module format supporting things like encryption, so that modules could be freely distributed while protecting IP.

Since, in a former incarnation, part of my job was maintaining software on a large Cray system, I'm probably more aware than most of the overall cost and manpower needed to keep mission-critical software libraries like MPI up to date.

A standard module format would amount to having a standard name-mangling scheme, array-descriptor layout, and calling conventions. I'm not a compiler guy, but in the System V ABI forum Evandro Menezes (presumably an AMD employee) wrote:

For what it’s worth, in my experience working with several FORTRAN compiler vendors to capture a minimal x86-64 ABI before, they actually did not want anything more than what is already in [the ABI]. They wanted the freedom to do all kinds of tricks in order to achieve performance, some which were proprietary, like how arrays were laid out in memory as well as references, when an ABI could be stifling to them. In the end, since FORTRAN is a much simpler language, source-code compatibility is pretty much a given and therefore binary compatibility is not so important as it is for C and C++.

C++20 introduced modules, which are in some ways comparable to Fortran's, and the major compilers (GNU, Clang, and MSVC) use incompatible binary module formats. Not to mention that C++ modules look even more complicated than the Fortran ones and must interact with templates and the existing preprocessor- and header-based language facilities.

Maybe this is a symptom of one of the principles that Butler Lampson mentioned in his Turing Award lecture, “Principles for Computer System Design”:

More non-determinism is better; it allows more implementations.

I’m not defending this position, just stating this as an observation.

Also Python has multiple incompatible implementations (CPython, PyPy, IronPython, MicroPython, and others). On one hand this means the community is fragmented, but at the same time it means many people are interested, and some healthy competition can drive improvements in quality.

Rob Pike, creator of Go, also mentioned in a recent lecture titled “What We Got Right, What We Got Wrong” that having a language specification was key to Go’s success, as multiple implementations appeared relatively quickly (cmd/compile, gccgo, gollvm, gopherJS, tinygo…). (Go was created in 2009.)

2 Likes

The goal of Vapaa is to compile so fast that every app can just integrate Vapaa directly and therefore become a Fortran app that depends only on an MPI C library. And once we have a standard MPI ABI, that means Fortran apps can be built with only one MPI target.

Right now, Vapaa compiles in less than a second. Once I’m done, I’ll flatten it down to 2-3 source files, so that it goes even faster (minimizing file system access probably helps more than build parallelism on most HPC systems).

I’m busy finishing the MPI ABI standardization project but my goal is to finish both that and Vapaa this year.

As for Fortran module standardization, that’s never going to happen. It has been discussed but it’s a nonstarter. It requires breaking backwards compatibility in every compiler and getting everyone to agree on a Fortran ABI, which basically means everyone uses the same runtime library (note that C ABI consistency isn’t in the standard - it’s a consequence of Unix history and glibc dominance; try running Ubuntu binaries on Alpine some time…).

3 Likes

That indeed makes life much easier. Having proper wrappers helps to avoid wrong size or type specifications, as it can all be figured out by the compiler itself. We have been using the mpifx library in various projects for more than 10 years now. I definitely do not want to see any “naked” MPI calls in my code any more. :smile:
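
For readers who haven't used such wrappers, the idea looks roughly like this (a hypothetical module of my own, not mpifx's or havaita's actual API): a generic interface lets the compiler pick the MPI datatype and count from the actual argument.

module send_wrappers
  use mpi_f08
  implicit none
  interface send_to                ! one generic name, resolved by argument type
     module procedure send_real, send_int
  end interface send_to
contains
  subroutine send_real(buf, dest, tag, comm)
    real, intent(in)           :: buf(:)
    integer, intent(in)        :: dest, tag
    type(MPI_Comm), intent(in) :: comm
    call MPI_Send(buf, size(buf), MPI_REAL, dest, tag, comm)
  end subroutine send_real
  subroutine send_int(buf, dest, tag, comm)
    integer, intent(in)        :: buf(:)
    integer, intent(in)        :: dest, tag
    type(MPI_Comm), intent(in) :: comm
    call MPI_Send(buf, size(buf), MPI_INTEGER, dest, tag, comm)
  end subroutine send_int
end module send_wrappers

A call is then just "call send_to(a, dest, tag, MPI_COMM_WORLD)", with no counts or datatype handles to get wrong.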

3 Likes