I was looking at updating to the most recent oneAPI HPC toolkit and I noticed that it now includes a SHMEM distribution that “Implements Partitioned Global Address Space (PGAS) programming for host-initiated and device-initiated operations”. In my experience, co-arrays really need something like PGAS to be effective. So can the ifx in the latest oneAPI make use of SHMEM (and bypass MPI), or is ifx still stuck with using MPI as the transport layer (something I’ve always thought was a very bad idea)?

One reason I’ve avoided doing anything serious with co-arrays is that my attempts to run them on, say, an 8- or 16-core workstation have been less than successful. Co-arrays really need something like SHMEM to reach their full potential. Also, if I remember correctly, some of the first Cray MPI implementations (on the T3D, I think) were built on top of SHMEM. Can the Intel MPI distribution also make use of SHMEM on multi-core workstations so it doesn’t default to TCP/IP or shared-memory puts and gets?

Does anyone have experience with ifx and Intel SHMEM?
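For reference, below is the kind of minimal coarray test I’ve been running on a workstation. The build line assumes the Intel shared-memory coarray option (-coarray=shared) and the FOR_COARRAY_NUM_IMAGES environment variable; if the newest oneAPI changes any of that, that is exactly the sort of thing I’d like to know.

```fortran
! hello_caf.f90 : minimal coarray check. Every image computes a value,
! then image 1 reads the other images' values directly (a one-sided
! remote read in coarray syntax) and reduces them.
!
! Assumed build/run (Intel shared-memory coarray mode, from memory):
!   ifx -coarray=shared hello_caf.f90 -o hello_caf
!   FOR_COARRAY_NUM_IMAGES=8 ./hello_caf
program hello_caf
   implicit none
   integer :: partial[*]      ! one copy per image, remotely addressable
   integer :: i, total

   partial = this_image()**2  ! stand-in for some per-image work

   sync all                   ! every image has written its value

   if (this_image() == 1) then
      total = 0
      do i = 1, num_images()
         total = total + partial[i]   ! direct remote access, no explicit message
      end do
      print *, 'sum of squares over', num_images(), 'images =', total
   end if
end program hello_caf
```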
As an aside, the T3D was kind of an experimental machine that had several different hardware ways of communicating data between nodes. The designers weren’t sure which would eventually win out for the T3E.
Initially the T3D was supposed to use the Cray-developed CRAFT programming model, which gave the programmer a view of the world that was somewhat like shared memory programming. It also supported message passing via PVM. (MPI came later.)
The shmem library was a generalization of Bob Numrich’s ‘f--’ library - a little side-project designed to take advantage of hardware that allowed the processor in one node to directly address memory in a different node. (Used a bit of T3D hardware called the “DTB Annex” to perform this feat.) Zero message passing overhead.
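For anyone who never saw it, a shmem put looked roughly like the sketch below. I’ve used the later OpenSHMEM Fortran spellings here (start_pes, my_pe, num_pes, shmem_real_put, shmem_barrier_all); the original Cray library named some of these a little differently, so treat the details as approximate.

```fortran
! shmem_put_demo.f90 : sketch of a shmem-style one-sided put.
! Routine names follow the OpenSHMEM Fortran bindings; the T3D-era
! Cray library differed slightly.
program shmem_put_demo
   include 'shmem.fh'
   integer, parameter :: n = 8
   real :: src(n)
   real, save :: dst(n)        ! SAVE makes dst "symmetric" (remotely addressable)
   integer :: me, npes, right

   call start_pes(0)
   me    = my_pe()             ! integer functions supplied by the library
   npes  = num_pes()
   right = mod(me + 1, npes)   ! PEs are numbered 0..npes-1

   src = real(me)
   call shmem_barrier_all()

   ! One-sided put: write src directly into dst on the neighbouring PE.
   ! There is no matching receive on the other side; that is the whole point.
   call shmem_real_put(dst, src, n, right)

   call shmem_barrier_all()
   print *, 'PE', me, 'holds data from PE', mod(me - 1 + npes, npes)
end program shmem_put_demo
```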
Unfortunately things got a bit political, because the powers that be didn’t want yet another parallel model to support. But the performance of what became the one-sided model won out, and lots of people started using shmem. And CRAFT died out pretty quickly.
Eventually MPI 1 was released and message passing people started using it instead of PVM. MPI 2 introduced one-sided communication, and more or less removed the need for shmem.
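By way of illustration, the MPI-2 one-sided equivalent of that put goes through a window object. A rough sketch, using only the standard mpi module calls and nothing vendor-specific:

```fortran
! mpi_put_demo.f90 : the MPI-2 "one-sided" counterpart of a shmem put.
program mpi_put_demo
   use mpi
   implicit none
   integer, parameter :: n = 8
   integer :: ierr, me, npes, right, win
   integer(kind=MPI_ADDRESS_KIND) :: winsize, disp
   double precision :: src(n), dst(n)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, npes, ierr)
   right = mod(me + 1, npes)

   src = dble(me)
   dst = -1.0d0

   ! Expose dst as a window that other ranks may write into.
   winsize = 8_MPI_ADDRESS_KIND * n
   call MPI_Win_create(dst, winsize, 8, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

   call MPI_Win_fence(0, win, ierr)
   disp = 0
   ! One-sided put into the neighbour's window; no receive call on the target.
   call MPI_Put(src, n, MPI_DOUBLE_PRECISION, right, disp, &
                n, MPI_DOUBLE_PRECISION, win, ierr)
   call MPI_Win_fence(0, win, ierr)

   print *, 'rank', me, 'received', dst(1), 'from rank', mod(me - 1 + npes, npes)

   call MPI_Win_free(win, ierr)
   call MPI_Finalize(ierr)
end program mpi_put_demo
```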
Oh, and the T3E ended up using yet another hardware communications method, different from any of the ways the T3D supported.
I have a warm place in my heart for PVM. Much of my Ph.D. research was built around making a CFD code parallel using PVM. I was using it to build a “virtual” distributed system out of a network of workstations spread across several university departments and labs. It worked great until someone in the CS department decided to run a large ray-tracing program for a computer graphics project that basically shut me out of that particular node. I spent about as much time figuring out a strategy for moving a stalled process to a different system and maintaining load balance as I did actually solving the Euler equations. Good times. There is still a lot about PVM that I liked more than MPI. It had a much simpler set of routines that was easier to learn and use, but unfortunately MPI took over the world and all development of PVM stopped.
I didn’t use PVM very much. However, I was one of Bob Numrich’s early “guinea pigs” with his f-- library and technique on the T3D. The shmem library was actually a bit of a downgrade, as there was subroutine call overhead versus simply accessing remote data directly in one’s expressions and assignments. That is coarrays’ superpower as well, given an appropriate hardware and software implementation.
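To make that contrast concrete, here is the same sort of neighbour put written with coarray syntax. There is no library call in the source at all; the remote access is just an assignment, which a good implementation on the right hardware can lower to a bare put. (A sketch only, not tied to any particular compiler.)

```fortran
! caf_put_demo.f90 : a neighbour "put" expressed as a coarray assignment.
program caf_put_demo
   implicit none
   integer, parameter :: n = 8
   real :: src(n)
   real :: dst(n)[*]                 ! dst is remotely addressable on every image
   integer :: me, npes, right

   me    = this_image()
   npes  = num_images()
   right = mod(me, npes) + 1         ! images are numbered 1..npes

   src = real(me)
   sync all

   dst(:)[right] = src               ! one-sided put, written as an assignment

   sync all
   print *, 'image', me, 'holds data from image', mod(me - 2 + npes, npes) + 1
end program caf_put_demo
```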