MFC: Performant Multiphase Flow Simulation at Leadership-Class Scale Using OpenACC

From the YouTube video description published on the OpenACC channel:

Multiphase compressible flow simulations are often characterized by large grids and small time steps, so meaningful simulations on CPU-based clusters can take several wall-clock days. Accelerating the corresponding kernels with GPUs appears attractive, but for standard finite-volume and finite-difference methods these kernels are memory-bound, which limits the achievable speedups. Even when those speedups are realized, the faster GPU kernels can leave communication and I/O as the dominant, prohibitive costs.
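To make the memory-bound argument concrete, the sketch below applies the roofline model to a simple second-order finite-difference update. The stencil, its flop and byte counts, and the helper names `arithmetic_intensity` and `roofline_attainable` are illustrative assumptions, not MFC code. The byte count assumes each value is read from or written to memory exactly once, with neighbour reuse served by cache.

```c
#include <assert.h>

/* Illustrative kernel (hypothetical, not from MFC):
 *   u_new[i] = u[i] + c * (u[i-1] - 2.0*u[i] + u[i+1])
 * Flops per interior point: 1 mul (2.0*u[i]), 1 sub, 1 add,
 * 1 mul (c * ...), 1 add (u[i] + ...) = 5.
 * Assumed DRAM traffic per point: one 8-byte read of u and one 8-byte
 * write of u_new; u[i-1] and u[i+1] come from cache, and write-allocate
 * traffic is ignored. */
#define STENCIL_FLOPS_PER_POINT 5.0
#define STENCIL_BYTES_PER_POINT 16.0

/* Arithmetic intensity in flop/byte. */
double arithmetic_intensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

/* Roofline model: the attainable flop rate is the smaller of the compute
 * peak and (memory bandwidth x arithmetic intensity). */
double roofline_attainable(double peak_flop_rate, double bandwidth,
                           double intensity) {
    double memory_limit = bandwidth * intensity;
    return memory_limit < peak_flop_rate ? memory_limit : peak_flop_rate;
}
```

Taking approximate published FP64 figures for an NVIDIA A100 40 GB (about 9.7 TFLOP/s without tensor cores and about 1.555 TB/s of memory bandwidth) as assumed inputs, `roofline_attainable(9.7e12, 1.555e12, 0.3125)` gives roughly 0.49 TFLOP/s, about 5% of peak. Kernels that perform more arithmetic per byte loaded move toward the compute-bound side of the roofline, which is why arithmetic intensity matters as much as raw GPU throughput.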

This webinar focuses on a portable strategy for GPU acceleration of multiphase and compressible flow solvers that addresses these challenges and obtains large speedups at scale. Combining three approaches (OpenACC for offloading, Fypp to expose hidden compile-time optimizations, and NVIDIA CUDA-aware MPI for remote direct memory access) enables efficient use of the latest leadership-class systems.
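The first two ingredients can be illustrated in C, since OpenACC directives are available in C as well as Fortran. This is a hypothetical sketch, not MFC's Fortran source: `advect_step` applies a first-order upwind update to `NUM_EQNS` fields, the OpenACC `data` and `parallel loop` directives offload it, and a C preprocessor constant stands in for the kind of compile-time size information the webinar attributes to Fypp, letting the compiler fully unroll the inner loop. CUDA-aware MPI is not shown because it requires an MPI installation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time constant: the number of fields per cell.
 * Because it is a constant expression, the compiler can fully unroll
 * the inner loop below. The webinar attributes a similar role to Fypp
 * in MFC's Fortran code; the C preprocessor is a stand-in here. */
#define NUM_EQNS 3

/* One first-order upwind step for NUM_EQNS independent linear advection
 * equations with positive speed, stored cell-major: q[i*NUM_EQNS + e].
 * cfl = a*dt/dx. Cell 0 is held fixed as an inflow boundary. */
void advect_step(size_t n, double cfl, const double *q, double *q_new) {
    assert(n >= 2 && q != NULL && q_new != NULL);
    /* Move the arrays to the device once for both loops below. */
    #pragma acc data copyin(q[0:n*NUM_EQNS]) copyout(q_new[0:n*NUM_EQNS])
    {
        /* Inflow boundary: copy cell 0 unchanged. */
        #pragma acc parallel loop
        for (int e = 0; e < NUM_EQNS; ++e) {
            q_new[e] = q[e];
        }

        /* Interior cells: iterations are distributed across the GPU. */
        #pragma acc parallel loop
        for (size_t i = 1; i < n; ++i) {
            /* Fixed trip count, so this loop can be fully unrolled. */
            #pragma acc loop seq
            for (int e = 0; e < NUM_EQNS; ++e) {
                size_t k = i * NUM_EQNS + e;
                q_new[k] = q[k] - cfl * (q[k] - q[k - NUM_EQNS]);
            }
        }
    }
}
```

Compiled without OpenACC support (for example, without `-acc` for the NVIDIA HPC compilers or `-fopenacc` for GCC), the pragmas are ignored and the loops run serially on the CPU, so the same source can be tested on either target. Keeping the whole loop nest inside one `data` region avoids repeated host-device transfers, which matters for the memory-bound kernels described above.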

Spencer Bryngelson, an assistant professor at the Georgia Institute of Technology, discusses how his team implemented this approach in the open-source solver MFC (https://mflowcode.github.io) to achieve 46% of peak FLOP/s and high arithmetic intensity for the most expensive simulation kernels. In representative simulations, a single NVIDIA A100 GPU is 300 times faster than a single Intel Xeon Cascade Lake CPU core. Near-ideal weak scaling (within 3% of ideal) is observed on at least 13,824 V100 GPUs on Oak Ridge National Laboratory’s supercomputer, Summit, and 84% strong-scaling efficiency is retained for an 8-fold increase in GPU count. Large multi-GPU simulations demonstrate the practical utility of this strategy.
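The scaling figures above can be read through the usual definitions of parallel efficiency. The sketch below assumes weak-scaling efficiency is the reference wall time divided by the wall time at the larger GPU count (work per GPU held fixed), and strong-scaling efficiency is the measured speedup divided by the increase in GPU count; the function names are hypothetical. Under these definitions, 84% efficiency for an 8-fold increase in GPUs corresponds to a speedup of about 0.84 * 8 = 6.72.

```c
#include <assert.h>

/* Weak scaling: work per GPU is fixed, so the ideal wall time per step is
 * unchanged as GPUs are added. Efficiency = t_ref / t_n. */
double weak_scaling_efficiency(double t_ref, double t_n) {
    assert(t_ref > 0.0 && t_n > 0.0);
    return t_ref / t_n;
}

/* Strong scaling: total problem size is fixed. Efficiency is the measured
 * speedup divided by the increase in GPU count:
 *   (t_ref / t_n) / (n / n_ref). */
double strong_scaling_efficiency(double t_ref, int n_ref, double t_n, int n) {
    assert(t_ref > 0.0 && t_n > 0.0 && n_ref > 0 && n > 0);
    double speedup = t_ref / t_n;
    double gpu_ratio = (double)n / (double)n_ref;
    return speedup / gpu_ratio;
}
```

For example, `strong_scaling_efficiency(100.0, 1, 100.0 / 6.72, 8)` returns approximately 0.84 for a hypothetical run whose time per step drops from 100 s to about 14.9 s when the GPU count grows 8-fold. These timings are made up for illustration and are not MFC measurements.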

By viewing this webinar, participants can expect to:

  1. Learn about the capabilities and limitations of directive-based GPU offloading of stencil-based, nonlinear PDE solvers;
  2. Explore how to ease the strain and slowdown introduced by high levels of abstraction in modern codebases;
  3. Understand the practical tradeoffs between low- and high-level GPU offloading; and
  4. Hear the latest observations about directive-based offloading on AMD and Intel GPU-based supercomputers.