MFC: Performant Multiphase Flow Simulation at Leadership-Class Scale using OpenACC

ivanpribec · March 19, 2024, 10:08pm

From the YouTube video description published on the OpenACC channel:

Multiphase compressible flow simulations are often characterized by large grids and small time steps, so meaningful simulations on CPU-based clusters can take several wall-clock days. Accelerating the corresponding kernels on GPUs appears attractive, but these kernels are memory-bound for standard finite-volume and finite-difference methods, which dampens speedups. Even when those speedups are realized, the faster GPU-based kernels can make communication and I/O times prohibitive.
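
To make the "memory-bound" point concrete, here is a minimal roofline-style sketch. It is not taken from the webinar or from MFC; the peak rate, bandwidth, and arithmetic intensities are hypothetical round numbers chosen only for illustration, and the function names are made up for this example.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound: the kernel is capped either by compute or by memory traffic.

    peak_flops           -- peak floating-point rate of the device [FLOP/s]
    mem_bandwidth        -- sustained memory bandwidth [byte/s]
    arithmetic_intensity -- FLOPs performed per byte moved [FLOP/byte]
    """
    return min(peak_flops, arithmetic_intensity * mem_bandwidth)


def ridge_point(peak_flops, mem_bandwidth):
    """Arithmetic intensity above which a kernel stops being memory-bound."""
    return peak_flops / mem_bandwidth


# Hypothetical device numbers (assumptions, NOT measurements of any real GPU).
PEAK = 10.0e12       # 10 TFLOP/s
BANDWIDTH = 1.5e12   # 1.5 TB/s

# A low-intensity stencil-like kernel (assumed 1 FLOP/byte) vs. a high-intensity one.
low_ai = attainable_flops(PEAK, BANDWIDTH, 1.0)
high_ai = attainable_flops(PEAK, BANDWIDTH, 20.0)

print(f"ridge point: {ridge_point(PEAK, BANDWIDTH):.2f} FLOP/byte")
print(f"low-intensity kernel:  {low_ai / PEAK:.0%} of peak")
print(f"high-intensity kernel: {high_ai / PEAK:.0%} of peak")
```

Under these assumed numbers, a kernel below the ridge point cannot approach peak no matter how well it is offloaded, which is why raising arithmetic intensity matters for this class of solver.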

This webinar focuses on a portable strategy for GPU acceleration of multiphase and compressible flow solvers that addresses these challenges and obtains large speedups at scale. Employing a trio of approaches (OpenACC for offloading, Fypp to reveal hidden compile-time optimizations, and NVIDIA CUDA-aware MPI for remote direct memory access) enables the efficient use of the latest leadership-class systems.

Spencer Bryngelson, assistant professor at the Georgia Institute of Technology, discusses how his team implemented this approach in the open-source solver MFC (https://mflowcode.github.io) to achieve 46% of peak FLOPs and high arithmetic intensity for the most expensive simulation kernels. In representative simulations, a single NVIDIA A100 GPU is 300 times faster than an Intel Xeon Cascade Lake CPU core. At the same time, near-ideal (within 3%) weak scaling is observed for at least 13,824 V100 GPUs on Oak Ridge National Laboratory’s supercomputer, Summit. A strong scaling efficiency of 84% is retained for an 8-fold increase in GPU count. Large multi-GPU simulations demonstrate the practical utility of this strategy.
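
As a rough reading of the strong-scaling figure, the sketch below converts an efficiency into a speedup. It assumes the common definition (efficiency = speedup divided by the increase in GPU count); the description does not state which definition was used, and the 8-hour baseline run time is a hypothetical value for illustration only.

```python
def strong_scaling_efficiency(t_base, t_scaled, gpu_ratio):
    """Efficiency = achieved speedup / ideal speedup, for a fixed problem size.

    t_base    -- wall time on the baseline GPU count
    t_scaled  -- wall time after multiplying the GPU count by gpu_ratio
    gpu_ratio -- factor by which the GPU count was increased
    """
    return (t_base / t_scaled) / gpu_ratio


def speedup_from_efficiency(efficiency, gpu_ratio):
    """Invert the definition above to recover the achieved speedup."""
    return efficiency * gpu_ratio


# Figures quoted in the description: 84% efficiency for an 8-fold GPU increase.
speedup = speedup_from_efficiency(0.84, 8)

# Hypothetical baseline run time (an assumption, not from the source).
t_base_hours = 8.0
t_scaled_hours = t_base_hours / speedup

print(f"speedup on 8x the GPUs: {speedup:.2f}x (ideal would be 8x)")
print(f"a {t_base_hours:.0f} h run would take about {t_scaled_hours:.2f} h")
```

Under that definition, 84% efficiency on 8 times as many GPUs corresponds to a speedup of about 6.7 rather than the ideal 8.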

By viewing this webinar, participants can expect to:

- Learn about the capabilities and limitations of directive-based GPU offloading of stencil-based, nonlinear PDE solvers;
- Explore how to relax the stress and slowdown introduced by high levels of abstraction in modern codebases;
- Understand the practical tradeoffs between low- and high-level GPU offloading; and
- Hear the latest observations for using directive-based offloading on AMD and Intel GPU-based supercomputers.
