High-Q Club at Jülich Research Centre


Highest Scaling Codes on JUQUEEN

Following up on our JUQUEEN porting and scaling workshop and to promote the idea of exascale capability computing, we have established a showcase for codes that can utilise the entire 28-rack BlueGene/Q system at JSC. We want to encourage other developers to invest in tuning and scaling their codes and show that they are capable of using all 458,752 cores, aiming at more than 1 million concurrent threads on JUQUEEN.
The diverse membership of the High-Q Club shows that it is possible to scale to the complete JUQUEEN using a variety of programming languages and parallelisation models, demonstrating individual approaches to reach that goal. High-Q status marks an important milestone in application development towards future HPC systems that envisage even higher core counts.

JUQUEEN was a supercomputer operated by the Jülich Research Centre with 458,752 IBM PowerPC A2 cores. It was shut down in 2018 and has been replaced by JUWELS, which can reach about 85 petaFLOPS (equivalent to about 300,000 modern PCs).
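For reference, the core count and the "more than 1 million concurrent threads" target can be checked with a quick calculation. The per-rack and per-node figures below (1024 nodes per rack, 16 compute cores per node, 4 hardware threads per core) are the standard BlueGene/Q hardware figures and are not stated in the text above:

$$
\begin{aligned}
28\ \text{racks} \times 1024\ \tfrac{\text{nodes}}{\text{rack}} \times 16\ \tfrac{\text{cores}}{\text{node}} &= 458{,}752\ \text{cores} \\
458{,}752\ \text{cores} \times 4\ \tfrac{\text{threads}}{\text{core}} &= 1{,}835{,}008\ \text{threads}
\end{aligned}
$$

So a code that fills every hardware thread on all 28 racks runs about 1.8 million threads at once.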

Out of the 32 scientific codes capable of scaling to all cores of JUQUEEN, the following are Fortran codes:

There are also several mixed-language codes with varying amounts of Fortran/C/C++:

To summarize: more than half of the codes (18 out of 32) used Fortran. The remaining codes in the High-Q Club are either C or C++ codes (no Julia).

On the new JUWELS supercomputer, most of the computing power comes from GPUs (224 NVIDIA V100 and 3744 NVIDIA A100). Most of the codes on the old machine were parallelized using MPI, OpenMP, or pthreads. Given this, it makes sense that OpenMP is already adapting by adding GPU offloading directives.
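To give a rough idea of what these GPU directives look like, here is a minimal sketch in C of a SAXPY loop offloaded with the OpenMP `target` construct. The function name `saxpy_offload` is made up for illustration and is not taken from any High-Q Club code. In Fortran the same combined construct is spelled `!$omp target teams distribute parallel do` around a `do` loop.

```c
/* Minimal sketch: y = a*x + y with OpenMP target offloading.
 * saxpy_offload is a hypothetical name used only for this example.
 * If the compiler has no OpenMP support the pragma is ignored and the
 * loop runs serially; if OpenMP is enabled but no device is available,
 * the region falls back to the host. The result is the same either way. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    /* map(to: ...)     copies x to the device before the loop;
     * map(tofrom: ...) copies y to the device and back afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy_offload(n, 2.0f, x, y)` with `x` filled with 1.0 and `y` filled with 3.0 leaves every element of `y` equal to 5.0. Whether the loop actually runs on a GPU depends on the compiler and how it was built; with GCC, for example, this needs `-fopenmp` and a build with offloading support.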


A scientific report is published in

The High-Q Club: Experience with Extreme-scaling Application Codes | Brömmel | Supercomputing Frontiers and Innovations


It would be nice to know how many are actually used for science and how many are proof-of-concept/co-design exercises for the tera/peta/exa/zetta/yotta scales.

My default guess would be that all of them are for science. I don’t know any scientists who actually care about these SI-prefix milestones. We write fast codes when the problems we want to solve are too big to do otherwise. The hype around these milestones is mainly a PR game between computer vendors, funding agencies, and research centers. As a side benefit, some scientists get to have their software development work taken seriously, which we all know is not the norm in most fields.


I have seen a tradition among computer scientists of writing codebases with the primary goal of breaking records. The code runs massively parallel and wins the xxx prize for achieving extraordinary heterogeneous parallel performance, but does not solve a real research problem. I don't mean this negatively; I am only pointing out that some people do care primarily about performance, more than about science. The projects of this kind that I know of are in C++.


Sorry for a very very very lazy question.
So it looks like modern Fortran can now use OpenMP on GPUs, which is very interesting!
Is there some sample code for this Fortran + GPU technique? Thanks!