Running a coarray program on multiple nodes

I’ve browsed several tutorials and articles found on Fortran Wiki and elsewhere, and I ended up with the confusion described in question B3 of this topic.

Some articles (e.g. Shterenlikht, A. and L. Cebamanos (2018). Cellular automata beyond 100k cores: MPI vs Fortran coarrays) show that a coarray Fortran program can be run on multiple parallel nodes, i.e. separate machines interconnected by a fast network.

Yet all the tutorials I have found on coarrays, as well as my own naive, simple tests using gfortran and OpenCoarrays, ended up with executables suited to a shared-memory environment, i.e. multiple threads running on a single, albeit multi-core, machine.

Could somebody point me to a simple cheat sheet/tutorial on how to do it on multiple nodes, without digging into hundreds of pages of MPI manuals etc.? Am I right in assuming that coarrays in modern Fortran should, in principle, allow that without the need to add MPI function calls to the code?

There is no expectation that you need to add MPI calls to your code to run a Fortran program with coarrays.

However, you need to read your compiler’s documentation to find out which execution models it supports and how to set up the execution environment, as those things are not specified in the Fortran standard.

Some compilers (such as NAG) only support the shared memory model. Others support both shared and distributed models, depending on a compiler switch. Some others may only support the distributed model.

Another question to ask is “can I mix MPI calls with coarray Fortran?” The intention of the Standard, I believe, is to allow this but not to require it.

This is more or less my understanding of the topic. I doubt, however, that any compiler manual would include a practical guide to running such a distributed-model executable on multiple nodes (apart from naming the switch that generates such an executable). So what I am looking for is a 1-2 page tutorial covering both the switch usage (for any available compiler) and the mechanism of actually running the program on multiple nodes. Old men are lazy :)

Here is a link to how to do this on the system I use: Quickstart for users - ARCHER2 User Documentation. I create a script and then submit the job to a batch queue. I would imagine most systems offer something similar.

If your compiler supports distributed execution and you’ve enabled that feature when compiling, you should just need to run the executable with mpirun and specify the nodes you’d like to use, either in a hosts file or directly on the command line. Many supercomputing environments have batch/job-queuing systems that take care of the mpirun part for you, but you’ll have to read their documentation for more info on that.
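As a rough sketch of what the hosts-file route can look like with gfortran + OpenCoarrays on top of Open MPI: the host names `node01`/`node02`, the slot counts, and the executable name `./hello` are made up for illustration, and the `--hostfile` spelling is Open MPI's (MPICH's Hydra launcher uses `-f` and a different host-file format). By default the script only prints the launch command; set `DRY_RUN=0` on a real cluster.

```shell
#!/bin/sh
# Sketch only: launch an OpenCoarrays-built executable on two nodes via Open MPI.
# Assumptions (not from this thread): hosts node01/node02 with 4 cores each,
# passwordless ssh between them, and ./hello present at the same path on both.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = actually execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mkdir -p coarray_demo

# Open MPI host-file format: one host per line, "slots" = images to place there.
cat > coarray_demo/hosts <<'EOF'
node01 slots=4
node02 slots=4
EOF

# 8 images in total, spread over the hosts listed above.
run mpirun -np 8 --hostfile coarray_demo/hosts ./hello
```

The Fortran source itself needs no MPI calls for this; the launcher is only the mechanism that starts one copy of the program per image.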

Running a coarray program across multiple nodes does not, and should not, have anything to do with MPI. Normally a system intended to be used that way has a “workload manager” running. The most popular one is SLURM. Just

srun -n64 ./a.out

to run with 64 images. The Slurm software takes care of scheduling the images on the nodes.

(BTW, on a system like that, you would run a 64-rank MPI job with exactly the same command.)
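For batch use, the same `srun` line usually sits inside a job script. The sketch below writes one for 64 images, assuming 2 nodes with 32 cores each; the job name, time limit, and file name are placeholders, and most sites also require partition or account lines that are left out here. Whether `srun` can start the images directly depends on how the coarray runtime on the system was built.

```shell
#!/bin/sh
# Sketch only: write a Slurm batch script for a 64-image coarray run.
# Assumptions: 2 nodes x 32 cores; partition/account options are site-specific
# and deliberately omitted.

mkdir -p coarray_demo

cat > coarray_demo/coarray_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=coarray_test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=00:10:00

# srun takes the task count from the allocation: 2 x 32 = 64 images.
srun ./a.out
EOF

echo "Submit with: sbatch coarray_demo/coarray_job.sh"
```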


@msz59 ,

Have you consulted the Intel Fortran documentation and their reference and how-to instructions with IFORT?
https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top.html

Well, I now have. The manual has 2700 pages only. And on page 205 one can find such an example:

 The following command runs a coarray program on shared memory using n images:
/Qcoarray /Qcoarray-num-images:n          ! Windows systems 
-coarray -coarray-num-images=n            ! Linux systems

which continues with MPI options.
This is obviously misleading, if not plain wrong. These are compiler options, not runtime options, so this is not the way to run a coarray program. Not to mention that the given example is far from being a command, as it gives only the options.

So that was my reason for looking for a few-page tutorial written by someone who knows what they are writing about, instead of consulting a 2700-page manual written by someone who probably does not even know what coarrays are.

Thanks to all who replied, I will try to follow your hints and - maybe - succeed in running a coarray program on multiple nodes.

Hi @msz59,
I’ve never run coarray programs in a distributed (multi-node) setting, so I’m not familiar with the details, but if I remember correctly (<-- this is the point… XD), I saw a nice tutorial for that type of installation for Ubuntu clusters. For some reason, simple Googling does not show that page for me, so I cannot get the URL for now. I guess the usage is essentially similar to just using MPI over multiple nodes: compile the Fortran code with suitable options and link the libraries, specify host files (?), and use mpiexec etc. to run the binary.

(For a single node + shared memory, I used just homebrew installation for Mac, which worked out of the box with the “cafrun” command. But I am also interested in trying multiple nodes / clusters.)
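For reference, a single-node smoke test with the OpenCoarrays wrappers might look like the sketch below. It writes a minimal coarray program and prints the `caf`/`cafrun` commands; the file names and the image count of 4 are arbitrary choices, and `DRY_RUN=0` actually runs the commands if OpenCoarrays is installed.

```shell
#!/bin/sh
# Sketch only: single-node test with OpenCoarrays' caf (compile) and
# cafrun (launch) wrappers. File names and image count are illustrative.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = actually execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mkdir -p coarray_demo

# Minimal coarray program: no MPI calls appear in the source.
cat > coarray_demo/hello.f90 <<'EOF'
program hello
  implicit none
  ! Every image runs this same program and reports its own index.
  write (*, '(a,i0,a,i0)') 'Hello from image ', this_image(), ' of ', num_images()
end program hello
EOF

run caf coarray_demo/hello.f90 -o coarray_demo/hello   # compile and link
run cafrun -n 4 coarray_demo/hello                     # start 4 images
```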

A few pages that Google search gave me are like these:

https://gcc.gnu.org/wiki/CoarrayLib

Thanks, @septc. Of your links, the most promising is the last one, “Intel Essential Guide…”.
NB: it gives information inconsistent with the general ifort manual mentioned by @FortranFan, which confirms my impression that the manual, at least in its coarray section, was written by someone incompetent.
Compare:
General Manual:

The following command runs a coarray program on distributed memory using n images:
-coarray=distributed -coarray-num-images=n ! Linux systems

Essential Guide to Distributed Memory Coarray Fortran:

-coarray-num-images=N compiler option is ignored for -coarray=distributed. This option is only used by shared memory Coarray Fortran applications.
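So for the distributed case the image count has to come from somewhere else. As far as I can tell from the Essential Guide, the intended route is an MPI configuration file passed with `-coarray-config-file`; the sketch below follows that reading. The host names, the file names `hostsfile`, `cafconfig.txt`, and `hello_caf`, and the count of 8 images are my own placeholders, and the source file `hello_caf.f90` is assumed to exist already. The script only prints the compile and run commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: Intel Fortran distributed-memory coarrays via a config file.
# Assumptions: hosts node01/node02; hello_caf.f90 is supplied by the user;
# all file names are illustrative.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = actually execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mkdir -p coarray_demo

# Machine file for the Intel MPI launcher: one host name per line.
cat > coarray_demo/hostsfile <<'EOF'
node01
node02
EOF

# The config file holds the launcher arguments, including the image
# count (-n) and the executable to start.
cat > coarray_demo/cafconfig.txt <<'EOF'
-genvall -machinefile ./hostsfile -n 8 ./hello_caf
EOF

(
    cd coarray_demo || exit 1
    # Compile: no -coarray-num-images here, since it is ignored in distributed mode.
    run ifort -coarray=distributed -coarray-config-file=./cafconfig.txt \
        -o hello_caf hello_caf.f90
    # Run: start the executable directly; the runtime reads the config file
    # and launches the images on the listed hosts.
    run ./hello_caf
)
```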

Fyi…

https://docs.scinet.utoronto.ca/index.php/Co-array_Fortran_on_Niagara