Coarrays: control over the number of images

I am experimenting with coarrays for an actual application, rather than just getting a feel for what you can do with them, and I am facing a particular question regarding the number of images:

For some algorithms the number of images does not matter much: the work is simply divided over however many images are available. In my case, however, I want each image to take care of a single subregion of my complete model region, and these subregions are determined outside my particular program.

If I start my program using all defaults, the number of images may be 24, whereas I only need 3.
I can control the number via environment variables or via compile-time options, but what do I do with the extra images, for instance when I forget to specify that my program needs only 3 images for the case at hand and I still get 24? Should I stop the remaining 21, or make sure they occupy very little memory and do nothing except wait?

What is a “good” way to handle this situation?

A naive suggestion could be to abort the program:

if (num_images() /= 3) error stop "abort: this run requires exactly 3 images."

This would force the calling process (or the person who invoked it) to adjust the number of images beforehand.

Another idea would be to group your images into teams and have the other team do something else in the meantime. I don’t know whether stopping one team would “return” its images to the pool of available system resources, or whether the members of that team would wait at an (implicit) barrier at the end program/stop statement until the rest of the images are done.

I thought of that naïve suggestion, but it assumes that the user starts the program themselves (interactively) and knows how to set the number of images. It is definitely a possibility, and a simple one.

I had not thought about teams, though. It is worthwhile to explore that possibility!

Using environment variables you can in principle fine-tune the process pinning: https://www.intel.com/content/www/us/en/developer/articles/technical/distributed-memory-coarray-fortran-with-the-intel-fortran-compiler-for-linux-essential.html

I guess you’ll need to expose these “levers” in a wrapper shell script, so that the user does not need to know anything about CAF or the implementation-specific settings.

Example of a wrapper script

Warning: This code was generated by ChatGPT and has not been tested.

Save the block below as wrapper.sh. It assumes the CAF executable was created using the Intel Fortran compiler.

#!/bin/bash

# Default value for --num_images
NUM_IMAGES=3

# Collect all other arguments in an array so that their quoting is preserved
OTHER_ARGS=()

# Parse arguments
for arg in "$@"; do
    if [[ $arg == --num_images=* ]]; then
        NUM_IMAGES="${arg#*=}"
    else
        OTHER_ARGS+=("$arg")
    fi
done

# Check that NUM_IMAGES is a non-negative integer and at least 3
# (10# forces base 10, so a value such as 08 is not read as octal)
if ! [[ $NUM_IMAGES =~ ^[0-9]+$ ]] || ((10#$NUM_IMAGES < 3)); then
    echo "Error: --num_images must be an integer of at least 3." >&2
    exit 1
fi

# Set the environment variable and run ./caf_fortran with the other arguments
FOR_COARRAY_NUM_IMAGES="$NUM_IMAGES" ./caf_fortran "${OTHER_ARGS[@]}"

We use all manner of scripts, so that is not a big deal. But I do want to hide switches that do not really matter to the user.

From what I understand, your application takes a domain-decomposition approach to parallelism, split into a pre-processing step and the application itself as the “processing step”, right? If so, I guess that treating this the same way you would an MPI application is the safest. So it is better to ensure that the pre-processing and processing steps are set up with the same number of images (processes, in MPI jargon). If you do not want the user to meddle with this setting for the processing step, could you add a script that detects how many domains were used for the decomposition before running your program, and then passes that number to cafrun? This is at least what I would try, to keep some sanity in my mind :)
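A minimal sketch of such a detection script, in case it helps. It assumes the pre-processing step writes one file per subregion, named partition_<n>.inp, into a single directory; that naming scheme is purely hypothetical and would need to be adapted to the real input files. The executable name ./caf_fortran is borrowed from the wrapper script earlier in the thread, and cafrun -n is the OpenCoarrays way of setting the number of images.

```shell
#!/bin/bash
# Sketch: derive the number of images from the decomposition output.
# ASSUMPTION: one file per subregion, named partition_<n>.inp (hypothetical).

# Print the number of partition files found in directory $1.
count_partitions() {
    local dir="$1"
    local count=0
    local f
    for f in "$dir"/partition_*.inp; do
        # An unmatched glob stays a literal pattern, so check for existence.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Launch the program with exactly one image per partition.
# Remaining arguments are passed on to the executable unchanged.
run_with_matching_images() {
    local dir="$1"
    shift
    local n
    n=$(count_partitions "$dir")
    if [ "$n" -lt 1 ]; then
        echo "Error: no partition files found in '$dir'." >&2
        return 1
    fi
    # ./caf_fortran is a placeholder name for the coarray executable.
    cafrun -n "$n" ./caf_fortran "$@"
}
```

The script only defines the two functions. After sourcing it, something like run_with_matching_images ./model_input would start the run, so the user never has to state the number of images explicitly.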

Teams actually seem like a pretty convenient way to address this problem, as the rest of the code doesn’t need to worry about getting the unused images to participate in any collective operations. For example, some pseudo-code:

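use, intrinsic :: iso_fortran_env, only: team_type
type(team_type) :: my_team
integer :: team_number
logical :: I_participate
logical, external :: participates   ! placeholder: .true. if the image owns a subregion

I_participate = participates(this_image())
team_number = merge(1, 2, I_participate)
form team (team_number, my_team)    ! must be executed by every image
if (I_participate) then
  change team (my_team)
    call do_the_work                ! placeholder for the actual computation
  end team
end if
end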

Yes, it is a domain-decomposition problem. And as Brad illustrated, using teams is probably the most convenient way. Thanks, everyone!

Very curious to hear about your experience down the road with coarrays and domain decomposition :) So far I’ve just played around with coarrays but haven’t used them seriously.

Well, my first attempts worked fine, but they were very simple. I am now experimenting with a program that actually reads the partitioning (and the connections between the partitions) from input files. That will be close to the actual application and I can certainly post about this.

I have made progress:

  • The program works, in the sense that the results look plausible
  • I have for now avoided the use of teams. Instead, the program demands the right number of images.
  • I want to write down my experiences, but that will take a wee bit of time :)

Of course my test case/demo is far too small and simple to see whether the performance has improved, but that is of secondary importance.

Actually, adding teams to avoid setting the number of images to use was quite easy.

FYI: I have described my experiment here: memos-on-programming/doc at main · arjenmarkus/memos-on-programming · GitHub. The repository also contains the program itself and the input files.

Meanwhile I used some procrastination to create a version of the program that uses MPI instead of coarrays. That took me a lot more thinking, partly to understand the MPI functions/subroutines, partly because there is a confusing aspect in my example, but I blame it partly on MPI itself. Anyway, to my surprise it gives exactly the same results as the coarray version, which I choose to take as a good omen.