Parallel Algorithms course

A course in Parallel Algorithms taught by

  • Martin van Gijzen (TU Delft)
  • Jonas Thies (TU Delft)

open to students at Dutch universities is underway.

We will use (modern) Fortran in class, since this language is very popular in high performance scientific and data-intensive computing, and has parallel computing instructions included in the language. We will organise the course in such a way that there will be time to learn Fortran for those coming from another language. Some experience with the Linux operating system is helpful.

Some course materials are at

A book used in the course is Parallel Scientific Computation: A Structured Approach Using BSP (Second Edition), by Rob H. Bisseling, Oxford University Press (2020). The site has slides and videos.

4 Likes

Cool find! It looks like they will be using OpenCoarrays in the course.

The book by Bisseling looks like a great resource for coarray programmers in general. I haven’t heard of BSP yet which stands for Bulk Synchronous Parallel programming. The book comes with examples in C that make use of BSPlib: The BSP programming library:

BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full definition of BSPlib in C, motivates the design of its basic operations, and gives examples of their use. The library enables programming in two distinct styles: direct remote memory access (DRMA) using put or get operations, and bulk synchronous message passing (BSMP). Currently, implementations of BSPlib exist for a variety of modern architectures, including massively parallel computers with distributed memory, shared memory multiprocessors, and networks of workstations. BSPlib has been used in several scientific and industrial applications; this paper briefly describes applications in benchmarking, Fast Fourier Transforms (FFTs), sorting, and molecular dynamics.

I’ve found a couple different implementations of the library:

The BSP Worldwide has some F77 examples (which are actually Fortran 90).


One of the introductory slide decks from Bisseling states:

An abstract BSP machine is just a BSP(p,r,g,l) computer. This is all we need to know about the machine for developing algorithms. The parameters are:

  • p number of processors
  • r computing rate (in flop/s)
  • g communication cost per data word (in flop time units)
  • l global synchronization cost (in flop time units)

The BSP model consists of

  • a distributed-memory architecture with a black box communication network providing uniform-time access to remote memories;
  • an algorithmic framework formed by a sequence of supersteps;
  • a cost model giving cost expressions of the form a + bg + cl.

The book on Parallel Programming with Co-arrays by Robert W. Numrich offers a very similar model for analyzing parallel algorithms and goes one step further by using dimensional analysis to study algorithms in terms of hardware and software “forces”. I find it surprising the works by Numrich don’t refer to any BSP-related works:

1 Like

Perhaps he wasn’t aware of BSP when he wrote the book?

Bob Numrich invented what has become co-arrays, back in Cray T3D days. He wanted the lowest latency and fastest communication possible given the hardware.
So he wrote a very small library to twiddle with the T3D “DTB annex” hardware. Combined with stuffing annex references in higher order address bits using Cray pointers allowed for direct read/write of data in remote memories for lowest latency communications. He called it f--. It worked really well. (I was one of his guinea pigs.)

The Cray shmem library was then written to provide much of the one-sided communication capability in a more machine independent manner. Then MPI picked up the concept in MPI 2.0. Finally, years later, Fortran 2008 got co-arrays - which avoids many of the procedure calls.

1 Like

Probably. It is also vice-versa, the BSP literature doesn’t appear aware of Numrich’s ideas or really expose Fortran co-arrays as an implementation of the BSP computation model. The Wikipedia pages on BSP and CAF don’t reference each other, although there is clearly some conceptual overlap.

Fortran is mentioned in some recent works: Bulk: A Modern C++ Interface for Bulk-Synchronous Parallel Programs | SpringerLink

When restricting ourselves to communication based on distributed variables, we lose the possibility of performing communication based on (sub)arrays. Distributed variables whose images are arrays have a special status in Bulk, and are captured in coarray objects. The functionality of these objects is inspired by Coarray Fortran [18]. Coarrays provide a convenient way to share data across processors. Instead of manually sending and receiving individual data elements, coarrays model distributed data as a 2D array, where the first dimension is over the processors, and the second dimension is over local 1D array indices.

The Bulk library offers a similar coarray abstraction as Fortran does:

bulk::thread::environment env;
env.spawn(env.available_processors(), [](auto& world) {
    auto s = world.rank();
    auto p = world.active_processors();

    world.log("Hello world from processor %d / %d!", s, p);
});

In CAF:

program hello_world
integer :: s, p
s = this_image()
p = num_images()
print '("Hello world from image",I0," / ",I0,"!")', s, p
end program

Edit: these snippets don’t include any co-array syntax, just the execution model of N copies of the program with distinct memory /address spaces.