Abstract
This work tackles the performance and energy consumption analysis of a
legacy scientific application, the VASP (Vienna Ab-initio Simulation
Package), an application commonly used by physicists and chemists for
modeling materials at the atomic scale. Many of these scientific
applications have been implemented in Fortran, where energy metrics
instrumentation is not straightforward. We obtained performance
figures (execution time and energy consumption) by instrumenting the
source code using EML. This energy measurement library has been
modified to introduce Fortran interfaces for these metrics. The
analysis was carried out using different matrix algebra libraries,
parallelization techniques, and hardware platforms, with emphasis on the
MPI, OpenMP, and CUDA parallel implementations of the algorithms used
in VASP. We employ various material specifications (atomic structures)
and molecular sizes of a silicon-based crystal to create a set of
benchmarks for these specifications, leading to some recommendations
for final users regarding performance improvements. The proposed
benchmarking technique assists the user in selecting the right
combination of problem size, compilers, and parallelization options
available in VASP. For a given system platform, the user will be able
to determine not only the architecture to use (GPU or multicore
processors), but also the appropriate library and parallelization
according to the atomic structure and molecular size.
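Since the abstract highlights that EML was extended with Fortran interfaces for its energy metrics, here is a minimal sketch of how such a binding to a C energy-measurement library might look via ISO_C_BINDING. The C function names and signatures below (eml_init, eml_start, eml_stop) are illustrative assumptions, not EML's documented API.

```fortran
! Illustrative sketch only: a Fortran binding to a C energy-measurement
! API in the style of EML. Function names and signatures are assumptions
! for illustration, not EML's actual interface.
module eml_fortran
   use iso_c_binding, only: c_int, c_double
   implicit none
   interface
      function eml_init() bind(c, name="eml_init") result(ierr)
         import :: c_int
         integer(c_int) :: ierr
      end function eml_init
      function eml_start() bind(c, name="eml_start") result(ierr)
         import :: c_int
         integer(c_int) :: ierr
      end function eml_start
      function eml_stop(joules) bind(c, name="eml_stop") result(ierr)
         import :: c_int, c_double
         real(c_double), intent(out) :: joules  ! energy for the region
         integer(c_int) :: ierr
      end function eml_stop
   end interface
end module eml_fortran
```

With a binding like this, an instrumented region is simply bracketed by eml_start and eml_stop calls around the solver kernel of interest.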
Yes, I noticed this but did not comment since I have never used VASP. There is a March 19, 2024 update with new functionality, so VASP is actively developed, not just a “legacy” program.
We utilized the Hydrological Simulation Program-Fortran (HSPF) model to simulate peak flow and flood volume and then used these data as inputs for the Environmental Fluid Dynamics Code (EFDC) hydrodynamic model to simulate the spatial extent and depth of flood inundation.
Its developers have released an updated version of PSCToolkit for solving sparse linear systems. They showed good weak scalability on up to 8192 NVIDIA GPUs, considering problems with up to 6.5 x 10^10 degrees of freedom.
I wonder why the authors of the paper below (not freely available) are getting worse results for their Fortran code than their Rust code. Some possibilities are:
They are using default reals (single precision) in their Fortran code and double precision in the Rust code, or doing something else to make the programs not comparable (see the sketch after this list).
There is a bug in the Fortran compiler (unlikely).
There are features of the Rust language without counterparts in Fortran that enable simulations to be done with less error. I am unfamiliar with Rust.
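To make the first possibility concrete, here is a small sketch showing how Fortran's default (usually 32-bit) reals accumulate far more round-off than an explicit 64-bit kind, which is what Rust's f64 provides; the loop and values are arbitrary.

```fortran
! Default real is typically 32-bit; Rust's f64 is 64-bit, so an
! "equivalent" Fortran program using default reals accumulates more
! round-off. Declaring an explicit kind makes the comparison fair.
program precision_check
   use iso_fortran_env, only: real64
   implicit none
   real         :: s32   ! default real: usually 32-bit
   real(real64) :: s64   ! matches Rust's f64
   integer :: i
   s32 = 0.0
   s64 = 0.0_real64
   do i = 1, 10000000
      s32 = s32 + 0.1
      s64 = s64 + 0.1_real64
   end do
   print *, 's32 =', s32   ! drifts visibly from 1.0e6
   print *, 's64 =', s64   ! much closer to 1.0e6
end program precision_check
```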
Ideally a journal referee would have asked the authors to send their codes and tried to run them. I wonder how often that happens for computationally-oriented papers.
Abstract
The article focuses on developing a probabilistic scheme for the ingression of aerosol particles into the different regions of the human lung. The methodology adopted was based on the Monte Carlo technique, which was programmed using the Rust programming language. Around seven samples with different inspiratory capacities were examined using a similar methodology. The total regional deposition obtained through the Rust compiler was compared with corresponding solutions derived from Fortran. Relatively, the solution obtained through Fortran exhibits extreme variabilities while estimating the total regional deposition in the lungs. The results show negative skewness for all the samples. A wide range of variabilities was encountered while computing the total regional deposition fraction at different inspiratory capacities. The reliability of the Fortran compiler varied from 60.65 to 90.48% for every 10 events. The uncertainty in total regional deposition at higher inspiratory capacities was relatively high in the Rust version. No definite stochastic pattern is observed in the tracheobronchial region for larger aerosol particles with the change in inspiratory capacities of some subjects. A rise in the inspiratory capacities of the subjects increases the probability that smaller aerosol particles sediment in the alveolar region. In some cases, a bimodal probability distribution pattern was noticed for the total regional deposition of aerosol particles. In addition, a wide range of extreme deviations was also observed in the solution derived from the Fortran version. The results obtained through the adopted methodology exhibited statistical significance in the context of the variation of aerosol particle sizes and their regional deposition in the human lungs.
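For readers unfamiliar with the approach, below is a highly simplified sketch of the kind of Monte Carlo regional-deposition sampling the abstract describes. The region probabilities are made-up placeholders, not values from the paper, and a real model would also account for exhaled particles.

```fortran
! Minimal sketch of Monte Carlo regional-deposition sampling.
! The three region probabilities are placeholders for illustration.
program deposition_mc
   implicit none
   integer, parameter :: n_particles = 100000
   real :: u, p_et, p_tb             ! extrathoracic, tracheobronchial
   integer :: i, n_et, n_tb, n_alv
   p_et = 0.30                       ! made-up probability
   p_tb = 0.25                       ! remainder goes to alveolar region
   n_et = 0; n_tb = 0; n_alv = 0
   call random_seed()
   do i = 1, n_particles
      call random_number(u)
      if (u < p_et) then
         n_et = n_et + 1
      else if (u < p_et + p_tb) then
         n_tb = n_tb + 1
      else
         n_alv = n_alv + 1
      end if
   end do
   print *, 'fractions:', real(n_et)/n_particles, &
            real(n_tb)/n_particles, real(n_alv)/n_particles
end program deposition_mc
```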
Why should the referee(s) be the only ones interested in the source code? And where is a specific note about the compilers and their flags, since the promise that
the information about the compiler is provided in the supplementary material (S.1).
(on page 3/15, right-hand column) doesn’t materialize there. Or does the paper link to a public repository (by the school, Zenodo, GitHub, etc.) to have a look at how the idea was implemented in the source code? Apparently, the journal/the funding agency missed an opportunity for FAIRness of the publication. From recollection, at least the first third of a talk by Misshula at the NYC Emacs group (link to the recording in 2014 here) illustrated this frequent problem in science with some examples. (A sloppy SI describing how a synthesis was performed or how a material was isolated and characterized is the equivalent in chemistry.)
Quantitative Precipitation Estimates (QPE) obtained from satellite data are essential for accurately assessing the hydrological cycle over both land and ocean. Early artificial Neural Network (NN) methods were previously used either to merge infrared and microwave data or to derive better precipitation products from radar and radiometer measurements. Over the last 25 years, machine learning technology has advanced significantly, accompanied by the launch of new satellites, such as the Global Precipitation Measurement Mission Core Observatory (GPM-CO). In addition, computing power has increased exponentially since the beginning of the 21st century. This paper compares the performance of a pure NN FORTRAN, originally designed to expedite the 2A12 TRMM (Tropical Rainfall Measuring Mission) algorithm, with a contemporary state-of-the-art NN in Python using the TensorFlow library (NN PYTHON). The performance of the FORTRAN and Python approaches to QPE using GPM-CO data is compared with the goal of achieving a minimum NN architecture that at least matches the outcome of the Goddard Profiling Algorithm (GPROF). The results indicate that NNs can simulate GPROF. Another conclusion is that the new NN PYTHON does not present significant advantages over the old FORTRAN code. The latter does not require dependencies, which has many practical advantages in operational use and therefore gives it an edge over more complex approaches in hydrometeorology.
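The abstract's point about the FORTRAN code not requiring dependencies is easy to picture: a dense-layer forward pass needs nothing beyond language intrinsics. A minimal sketch (not the paper's code), with arbitrary layer sizes and random weights standing in for trained ones:

```fortran
! Dependency-free dense-layer forward pass: intrinsics only, no
! external libraries. Sizes and weights are arbitrary placeholders.
module dense_layer
   implicit none
contains
   function forward(w, b, x) result(y)
      real, intent(in) :: w(:,:)        ! weights (n_out x n_in)
      real, intent(in) :: b(:)          ! biases  (n_out)
      real, intent(in) :: x(:)          ! input   (n_in)
      real :: y(size(b))
      y = max(0.0, matmul(w, x) + b)    ! ReLU activation
   end function forward
end module dense_layer

program tiny_nn
   use dense_layer
   implicit none
   real :: w(4,3), b(4), x(3)
   call random_number(w); call random_number(b); call random_number(x)
   print *, forward(w, b, x)
end program tiny_nn
```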
This study presents scaling results and a performance analysis across different supercomputers and compilers for the Met Office weather and climate model, LFRic. The model is shown to scale to large numbers of nodes, meeting the design criterion of exploiting parallelism to achieve good scaling. The model is written in a Domain-Specific Language embedded in modern Fortran and uses a Domain-Specific Compiler, PSyclone, to generate the parallel code. The performance analysis shows the effect of algorithmic choices, such as redundant computation, and of scaling with OpenMP threads. The analysis can be used to motivate a discussion of future work to improve the OpenMP performance of other parts of the code. Finally, an analysis of the performance tuning of the I/O server, XIOS, is presented.
Compilers from Cray, Intel, and the GNU project are used.
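To illustrate the OpenMP threading the abstract refers to, here is a generic column-parallel loop of the kind common in weather and climate codes. The array names and sizes are placeholders, and actual LFRic parallel code is generated by PSyclone rather than hand-written like this.

```fortran
! Generic column-parallel OpenMP pattern: threads over horizontal
! columns, sequential work in the vertical. Names/sizes are placeholders.
program omp_columns
   use omp_lib, only: omp_get_max_threads
   implicit none
   integer, parameter :: ncols = 10000, nlevs = 70
   real :: field(nlevs, ncols)
   integer :: c, k
   field = 1.0
   print *, 'threads:', omp_get_max_threads()
   !$omp parallel do private(k)
   do c = 1, ncols
      do k = 2, nlevs                    ! vertical dependency stays serial
         field(k, c) = field(k, c) + 0.5 * field(k-1, c)
      end do
   end do
   !$omp end parallel do
   print *, 'checksum:', sum(field)
end program omp_columns
```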
by James McKevitt, Eduard I. Vorobyov, Igor Kulikov
January 2025
Fortran’s prominence in scientific computing requires strategies to ensure both that legacy codes are efficient on high-performance computing systems and that the language remains attractive for the development of new high-performance codes. Coarray Fortran (CAF), part of the Fortran 2008 standard introduced for parallel programming, facilitates distributed memory parallelism with a syntax familiar to Fortran programmers, simplifying the transition from single-processor to multi-processor coding. This research focuses on innovating and refining a parallel programming methodology that fuses the strengths of Intel Coarray Fortran, Nvidia CUDA Fortran, and OpenMP for distributed memory parallelism, high-speed GPU acceleration, and shared memory parallelism, respectively. We consider the management of pageable and pinned memory, CPU-GPU affinity in NUMA multiprocessors, and robust compiler interfacing with speed optimisation. We demonstrate our method through its application to a parallelised Poisson solver and compare the methodology, implementation, and scaling performance to that of the Message Passing Interface (MPI), finding that CAF offers similar speeds with easier implementation. For new codes, this approach offers a faster route to optimised parallel computing. For legacy codes, it eases the transition to parallel computing, allowing their transformation into scalable, high-performance computing applications without the need for extensive re-design or additional syntax.
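As a taste of why the paper finds CAF easier to implement than MPI, here is a minimal, self-contained coarray sketch (not from the paper) in which each image owns a slab of a 1-D domain and fetches halo values from its neighbours with plain array syntax instead of send/receive calls.

```fortran
! Minimal coarray halo exchange: each image owns a slab plus two halo
! cells and pulls edge values from its neighbours directly.
program caf_halo
   implicit none
   integer, parameter :: nloc = 100
   real :: u(0:nloc+1)[*]     ! local slab plus two halo cells
   integer :: me, np
   me = this_image()
   np = num_images()
   u = real(me)
   sync all                   ! neighbours' data must be initialised
   if (me > 1)  u(0)      = u(nloc)[me-1]   ! pull left neighbour's edge
   if (me < np) u(nloc+1) = u(1)[me+1]      ! pull right neighbour's edge
   sync all
   print *, 'image', me, 'halos:', u(0), u(nloc+1)
end program caf_halo
```

Built with, e.g., OpenCoarrays (caf + cafrun) or Intel's ifx with -coarray, each image prints the edge values it pulled from its neighbours; the remote reads replace what would be paired MPI_Send/MPI_Recv calls.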