Highest Scaling Codes on JUQUEEN

Following up on our JUQUEEN porting and scaling workshop and to promote the idea of exascale capability computing, we have established a showcase for codes that can utilise the entire 28-rack BlueGene/Q system at JSC. We want to encourage other developers to invest in tuning and scaling their codes and show that they are capable of using all 458,752 cores, aiming at more than 1 million concurrent threads on JUQUEEN.

The diverse membership of the High-Q Club shows that it is possible to scale to the complete JUQUEEN using a variety of programming languages and parallelisation models, demonstrating individual approaches to reach that goal. High-Q status marks an important milestone in application development towards future HPC systems that envisage even higher core counts.

JUQUEEN was a supercomputer operated by the Jülich Research Centre, with 458,752 IBM PowerPC A2 cores. It was shut down in 2018 and has been replaced by JUWELS, which can reach about 85 petaFLOPS (equivalent to about 300,000 modern PCs).

Out of the 32 scientific codes capable of scaling to all cores of JUQUEEN, the following are Fortran codes:

- CIAO - Compressible/Incompressible Advanced reactive turbulent simulations with Overset
- FEMPAR - Massively parallel finite element simulation of multiphysics problems
- JuSPIC - Jülich Scalable Particle-in-Cell code
- KKRnano - Korringa-Kohn-Rostoker Green function code for quantum description of nano-materials
- MP2C - Massively Parallel Multi-Particle Collision Dynamics
- Musubi - A multicomponent Lattice Boltzmann solver for flow simulations
- OpenTBL - Direct numerical simulation of turbulent flows
- PP-Code - Simulations of relativistic and non-relativistic astrophysical plasmas

There are also several mixed-language codes with varying amounts of Fortran/C/C++:

- Code_Saturne - A multiphysics CFD software
- GYSELA - GYrokinetic SEmi-LAgrangian code for plasma turbulence simulations
- ICON - Icosahedral non-hydrostatic general circulation model
- MPAS-A - Model for Prediction Across Scales – Atmospheric core
- ParFlow+p4est
- PEPC - Pretty Efficient Parallel Coulomb Solver
- PMG and PFASST - a space-time parallel multilevel solver
- psOpen - Direct numerical simulation of fine-scale turbulence
- Seven-League Hydro Code (SLH)
- TERRA-NEO - Integrated Co-Design of an Exa-Scale Earth Mantle Modeling Framework

To summarize: more than half of the codes (18 out of 32) use Fortran. The remaining codes in the High-Q Club are written in C or C++ (no Julia).

On the new JUWELS supercomputer, the majority of the computing power comes from GPU nodes (224 NVIDIA V100 and 3744 NVIDIA A100). Most of the codes on the old machine were parallelized using MPI, OpenMP, or pthreads. Given this, it makes sense that OpenMP is already adapting by adding GPU offloading directives.