Research articles using Fortran

I follow three textbooks in my area and they all use Fortran.

  1. The first textbook says in its preface:

The book also includes some basic numerical algorithms –
accompanied by corresponding Fortran codes on the Wiley website for this book –
that students can use as templates with which to practice and develop their own
phase field codes.

  2. The second is a Japanese book with Fortran codes printed in the text; its website also provides open-source Fortran codes.
  3. The Korean textbook contains Fortran codes too.
2 Likes

Apparently real*8 is common at ORNL:

1 Like

[Submitted to arXiv on 30 Aug 2023]

High Performance GPU Accelerated MuST Software

by Xiao Liang, Edward Hanna, Derek Simmel, Hang Liu, Yang Wang

The MuST package is computational software designed for ab initio electronic structure calculations for solids. The Locally Self-consistent Multiple Scattering (LSMS) method implemented in MuST makes it possible to perform electronic structure calculations for systems with a large number of atoms per unit cell. For the LSMS method with the muffin-tin potential approximation, the major computational challenge is the matrix inverse for the scattering matrix calculation, which can take more than 90% of the computing time. However, the matrix inverse can be significantly accelerated by modern graphics processing units (GPUs). In this paper, we discuss our approach to accelerating the code by offloading the matrix inverse tasks to the GPUs through a Fortran-C interface from the Fortran code to the CUDA code. We report performance results showing the significant speedups achieved for calculations of NiAu alloy, a candidate thermoelectric material.

Subjects: Computational Physics (physics.comp-ph)
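
For readers who have not written such an interface: the usual pattern is a bind(c) interface block on the Fortran side, matched by a C wrapper that launches the CUDA work. Here is a minimal sketch; the routine name cuda_zmatinv and its signature are my illustrative assumptions, not MuST's actual API:

    ! Hypothetical sketch of the kind of Fortran-C interface described;
    ! the routine name and signature are illustrative, not MuST's API.
    module gpu_inverse_mod
       use, intrinsic :: iso_c_binding, only: c_int, c_double_complex
       implicit none
       interface
          ! C wrapper, compiled separately, that moves the matrix to the
          ! device, inverts it (e.g. via cuBLAS getrf/getri), copies back.
          subroutine cuda_zmatinv(n, a) bind(c, name="cuda_zmatinv")
             import :: c_int, c_double_complex
             integer(c_int), value :: n
             complex(c_double_complex), intent(inout) :: a(n, n)
          end subroutine cuda_zmatinv
       end interface
    end module gpu_inverse_mod

The Fortran code then simply calls cuda_zmatinv where it previously called a CPU LAPACK inverse.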

2 Likes

This work has an associated preprint and repo: GitHub - nekStab/LightKrylov: Lightweight implementation of Krylov subspace techniques in Fortran.
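
For anyone wondering what "Krylov subspace techniques" means in practice: the core building block is the Arnoldi iteration, which builds an orthonormal basis of the Krylov subspace span{b, Ab, A²b, ...}. A generic sketch follows; it is my own illustration, not LightKrylov's API:

    ! Generic Arnoldi iteration: builds an orthonormal basis Q of the
    ! Krylov subspace and the small Hessenberg matrix H satisfying
    ! A*Q(:,1:m) = Q(:,1:m+1)*H. Illustrative only, not LightKrylov.
    subroutine arnoldi(n, m, a, q, h)
       use, intrinsic :: iso_fortran_env, only: real64
       implicit none
       integer, intent(in) :: n, m
       real(real64), intent(in) :: a(n, n)
       real(real64), intent(out) :: q(n, m + 1), h(m + 1, m)
       real(real64) :: w(n)
       integer :: i, j
       h = 0.0_real64
       call random_number(q(:, 1))
       q(:, 1) = q(:, 1) / norm2(q(:, 1))
       do j = 1, m
          w = matmul(a, q(:, j))
          do i = 1, j                         ! modified Gram-Schmidt
             h(i, j) = dot_product(q(:, i), w)
             w = w - h(i, j) * q(:, i)
          end do
          h(j + 1, j) = norm2(w)
          if (h(j + 1, j) == 0.0_real64) return  ! invariant subspace
          q(:, j + 1) = w / h(j + 1, j)
       end do
    end subroutine arnoldi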

2 Likes

A tool and a methodology to use macros for abstracting variations in code for different computational demands
by A. Dubey, Y. Lee, T. Klosterman, and E. Vatai
Future Generation Computer Systems
Available online 18 July 2023

Abstract:
Scientific software used on high-performance computing platforms is in a phase of transformation because of the combined increase in the heterogeneity and complexity of models and hardware platforms. Having separate implementations for different platforms can easily lead to combinatorial explosions; therefore, the computational science community has been looking for mechanisms to express code through abstractions that can be specialized for different platforms. Most existing approaches use template meta-programming in C++ and are therefore language-specific. We have developed a tool that uses customized expansion of macros to mimic some of C++'s behaviour in other languages. It enables the unification of any code variants that may be necessary to run efficiently on different target architectures and different computational environments, through the use of macros with multiple alternative definitions and the ability to arbitrate on definition selection for expansion. Combined with two other tools, a custom runtime and a user-specified recipe translator, our custom macroprocessor becomes part of an overall performance-portability solution that does not depend on any specific programming language. We also use macros as code shorthand that lets code snippets become building blocks that allow variations in control flow to explore performance options. We demonstrate the use of macros in Flash-X, a multiphysics multicomponent code with many Fortran legacy components derived from an earlier community code, FLASH.

Press release here, GitHub channel for Flash-X here.
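
To illustrate the general idea only, here is what selecting among alternative definitions looks like with plain cpp-style macros; the paper's custom macroprocessor has its own, richer syntax that I have not reproduced:

    ! Illustrative only: cpp-style macros choosing one definition per
    ! target; the Flash-X macroprocessor uses its own, richer mechanism.
    #if defined(TARGET_GPU)
    #  define PAR_LOOP !$omp target teams distribute parallel do
    #else
    #  define PAR_LOOP !$omp parallel do
    #endif
    subroutine saxpy(n, a, x, y)
       implicit none
       integer, intent(in) :: n
       real, intent(in) :: a, x(n)
       real, intent(inout) :: y(n)
       integer :: i
    PAR_LOOP
       do i = 1, n
          y(i) = a*x(i) + y(i)
       end do
    end subroutine saxpy

Compiled as a .F90 file, the same loop becomes a GPU or a CPU loop depending on a single -D flag.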

1 Like

Thanks for sharing this. In the past, several macro processors were used in combination with Fortran, but the language eventually evolved, making them obsolete.

It would be interesting to learn if the future preprocessor (cc @gak) and generics (cc @everythingfunctional) facilities could have addressed the needs of the Argonne users.

Yes, @ivanpribec, I’ll throw this on my reading queue for requirements.

1 Like

Here are three recent preprints from arXiv, the last two having some authors in common.

Algorithm xxxx: HiPPIS: A High-Order Positivity-Preserving Mapping Software for Structured Meshes
by Timbwaoga A. J. Ouermi, Robert M. Kirby, and Martin Berzins
arXiv 13 Oct 2023
GitHub: HiPPIS

Abstract:
Polynomial interpolation is an important component of many
computational problems. In several of these computational problems,
failure to preserve positivity when using polynomials to approximate
or map data values between meshes can lead to negative unphysical
quantities. Currently, most polynomial-based methods for enforcing
positivity are based on splines and polynomial rescaling. The
spline-based approaches build interpolants that are positive over the
intervals in which they are defined and may require solving a
minimization problem and/or system of equations. The linear polynomial
rescaling methods allow for high-degree polynomials but enforce
positivity only at limited locations (e.g., quadrature nodes). This
work introduces open-source software (HiPPIS) for high-order
data-bounded interpolation (DBI) and positivity-preserving
interpolation (PPI) that addresses the limitations of both the spline
and polynomial rescaling methods. HiPPIS is suitable for approximating
and mapping physical quantities such as mass, density, and
concentration between meshes while preserving positivity. This work
provides Fortran and Matlab implementations of the DBI and PPI
methods, presents an analysis of the mapping error in the context of
PDEs, and uses several 1D and 2D numerical examples to demonstrate the
benefits and limitations of HiPPIS.
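
To make "data-bounded" concrete: the interpolant is constrained never to leave the range of the surrounding data values. Below is a deliberately crude sketch of that property; it is not the HiPPIS algorithm, which adapts the stencil and bounds divided differences rather than clamping after the fact:

    ! Crude illustration of the data-bounded property only; HiPPIS
    ! itself uses adaptive stencil selection, not post-hoc clamping.
    pure function clamp_to_data(p, f1, f2) result(pb)
       use, intrinsic :: iso_fortran_env, only: real64
       implicit none
       real(real64), intent(in) :: p      ! raw high-order interpolant
       real(real64), intent(in) :: f1, f2 ! data values bracketing point
       real(real64) :: pb
       ! Result never leaves [min(f1,f2), max(f1,f2)].
       pb = min(max(p, min(f1, f2)), max(f1, f2))
    end function clamp_to_data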

Stencil-HMLS: A multi-layered approach to the automatic optimisation
of stencil codes on FPGA

by Gabriel Rodriguez-Canal, Nick Brown, Maurice Jamieson, Emilien
Bauer, Anton Lydike, Tobias Grosser
arXiv 3 Oct 2023

Abstract:
The challenges associated with effectively programming FPGAs have been a major blocker in popularising reconfigurable architectures for HPC workloads. However, new compiler technologies, such as MLIR, are providing new capabilities which potentially deliver the ability to extract domain-specific information and drive automatic structuring of codes for FPGAs.

In this paper we explore domain-specific optimisations for stencils, a fundamental access pattern in scientific computing, to obtain high performance on FPGAs via automated code structuring. We propose Stencil-HMLS, a multi-layered approach to the automatic optimisation of stencil codes, and introduce the HLS dialect, which brings FPGA programming into the MLIR ecosystem. Using the PSyclone Fortran DSL, we demonstrate an improvement of 14-100× with respect to the next best performing state-of-the-art tool. Furthermore, our approach is 14 to 92 times more energy efficient than the next most energy-efficient approach.
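
For context, a stencil is the access pattern in which each output point depends on a fixed neighbourhood of input points; the five-point Jacobi sweep is the canonical Fortran example of what such tools recognise. A generic sketch, not code from the paper:

    ! Generic five-point Jacobi stencil sweep: the memory-access pattern
    ! that stencil compilers recognise and restructure for FPGAs.
    integer, parameter :: nx = 1024, ny = 1024
    real(8) :: u(nx, ny), unew(nx, ny)
    integer :: i, j
    do j = 2, ny - 1
       do i = 2, nx - 1
          unew(i, j) = 0.25d0*(u(i-1, j) + u(i+1, j) + u(i, j-1) + u(i, j+1))
       end do
    end do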

Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang
by Nick Brown, Maurice Jamieson, Anton Lydike, Emilien Bauer, and Tobias Grosser

Abstract:
MLIR has become popular since it was open sourced in 2019. A sub-project of LLVM, the flexibility provided by MLIR to represent Intermediate Representations (IR) as dialects at different abstraction levels, to mix these, and to leverage transformations between dialects provides opportunities for automated program optimisation and parallelisation. In addition to general-purpose compilers built upon MLIR, domain-specific abstractions have also been developed.

In this paper we explore complementing the Flang MLIR general-purpose compiler by combining it with the domain-specific Open Earth Compiler's MLIR stencil dialect. Developing transformations to discover and extract stencils from Fortran, this specialisation delivers between a 2 and 10 times performance improvement for our benchmarks on a Cray supercomputer compared to using Flang alone. Furthermore, by leveraging existing MLIR transformations we develop an auto-parallelisation approach targeting multi-threaded and distributed-memory parallelism, as well as optimised execution on GPUs, without any modifications to the serial Fortran source code.

1 Like

Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives

by Dipayan Datta and Mark S. Gordon

https://doi.org/10.1021/acs.jctc.3c00876

An algorithm is presented for the coupled-cluster singles, doubles, and perturbative triples correction [CCSD(T)] method based on the density fitting or the resolution-of-the-identity (RI) approximation for performing calculations on heterogeneous computing platforms composed of multicore CPUs and graphics processing units (GPUs). The directive-based approach to GPU offloading offered by the OpenMP application programming interface has been employed to adapt the most compute-intensive terms in the RI-CCSD amplitude equations with computational costs scaling as O(N_O^2 N_V^4), O(N_O^3 N_V^3), and O(N_O^4 N_V^2) (where N_O and N_V denote the numbers of correlated occupied and virtual orbitals, respectively) and the perturbative triples correction to execute on GPU architectures. The pertinent tensor contractions are performed using an accelerated math library such as cuBLAS or hipBLAS. Optimal strategies are discussed for splitting large data arrays into tiles to fit them into the relatively small memory space of the GPUs, while also minimizing the low-bandwidth CPU-GPU data transfers. The performance of the hybrid CPU-GPU RI-CCSD(T) code is demonstrated on pre-exascale supercomputers composed of heterogeneous nodes equipped with NVIDIA Tesla V100 and A100 GPUs and on the world's first exascale supercomputer named "Frontier", the nodes of which consist of AMD MI250X GPUs. Speedups within the range 4-8× relative to the recently reported CPU-only algorithm are obtained for the GPU-offloaded terms in the RI-CCSD amplitude equations. Applications to polycyclic aromatic hydrocarbons containing 16-66 carbon atoms demonstrate that the acceleration of the hybrid CPU-GPU code for the perturbative triples correction relative to the CPU-only code increases with the molecule size, attaining a speedup of 5.7× for the largest circumovalene molecule (C66H20). The GPU-offloaded code enables the computation of the perturbative triples correction for the C60 molecule using the cc-pVDZ/aug-cc-pVTZ-RI basis sets in 7 min on Frontier when using 12,288 AMD GPUs with a parallel efficiency of 83.1%.

What is impressive about this work is that the GPU port is achieved with OpenMP offloading directives only, no CUDA, HIP, or DPC++, to support three different GPU vendors.
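
For those who have not tried directive-based offloading: OpenMP target directives let the same Fortran source compile for NVIDIA, AMD, and Intel GPUs. A minimal sketch of the style follows; it is illustrative only, since the paper's kernels are tiled and call cuBLAS/hipBLAS rather than using hand-written loops:

    ! Minimal illustration of OpenMP GPU offloading in Fortran; the
    ! paper's actual kernels are tiled and call cuBLAS/hipBLAS instead.
    subroutine contract(n, a, b, c)
       implicit none
       integer, intent(in) :: n
       real(8), intent(in) :: a(n, n), b(n, n)
       real(8), intent(inout) :: c(n, n)
       integer :: i, j, k
       !$omp target teams distribute parallel do collapse(2) &
       !$omp& map(to: a, b) map(tofrom: c)
       do j = 1, n
          do i = 1, n
             do k = 1, n
                c(i, j) = c(i, j) + a(i, k)*b(k, j)
             end do
          end do
       end do
    end subroutine contract

On a CPU-only build the same directives are simply ignored or mapped to host threads, which is what makes the single-source approach attractive.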

7 Likes

Thread-safe lattice Boltzmann for high-performance computing on GPUs

by Andrea Montessori, Marco Lauricella, Adriano Tiribocchi, Mihir Durve, Michele La Rocca, Giorgio Amati, Fabio Bonaccorso, Sauro Succi

https://doi.org/10.1016/j.jocs.2023.102165

We present thread-safe, highly optimized lattice Boltzmann implementations, specifically aimed at exploiting the high memory bandwidth of GPU-based architectures. At variance with standard approaches to LB coding, the proposed strategy, based on the reconstruction of the post-collision distribution via Hermite projection, enforces data locality and avoids the onset of memory dependencies, which may arise during the propagation step, with no need to resort to more complex streaming strategies. The thread-safe lattice Boltzmann achieves peak performance in both two and three dimensions, and it significantly reduces the memory footprint (tens of gigabytes for simulations with on the order of a billion lattice nodes) while retaining the algorithmic simplicity of standard LB computing. Our findings open attractive prospects for high-performance simulations of complex flows on GPU-based architectures.

The Fortran codes can be found here: GitHub - andreamontessori/Openacc_LB
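
The thread-safety argument is easiest to see in the streaming step. In a "pull" formulation each lattice site reads its neighbours' populations and writes only its own, so no two GPU threads ever write the same address. A generic pull-scheme sketch with OpenACC, not the paper's Hermite-reconstruction code:

    ! Generic "pull" streaming step: site (i,j) writes only its own
    ! populations, so GPU threads never race on writes. The paper goes
    ! further and reconstructs the post-collision distribution via
    ! Hermite projection instead of keeping two copies of f.
    integer, parameter :: q = 9, nx = 512, ny = 512   ! D2Q9 lattice
    integer :: i, j, k
    integer :: cx(q), cy(q)                 ! discrete velocity set
    real(8) :: f_old(q, nx, ny), f_new(q, nx, ny)
    !$acc parallel loop collapse(2)
    do j = 2, ny - 1
       do i = 2, nx - 1
          do k = 1, q
             f_new(k, i, j) = f_old(k, i - cx(k), j - cy(k))
          end do
       end do
    end do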

2 Likes

Large-scale earthquake sequence simulations on 3-D non-planar faults
using the boundary element method accelerated by lattice H-matrices

by So Ozawa, Akihiro Ida, Tetsuya Hoshino, and Ryosuke Ando
Geophysical Journal International, Volume 232, Issue 3, March 2023, Pages 1471–1481
Published: 06 October 2022

is associated with a published code.

The code is full of procedures starting with lines such as

 real*8 function HACApK_unrm_d(nd,za)
 implicit real*8(a-h,o-z)
 real*8 :: za(:)

Real*8 is non-standard, but all compilers accept it, either by default or with the appropriate options. However, journals should not accept papers until every reasonable effort has been made to ensure that the results are correct, so the authors should have been asked to replace the implicit real*8 statements with implicit none and the needed declarations. Also, the function argument za should be intent(in).
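
For comparison, a modernised version of that procedure header might look as follows. This is a sketch only: the 2-norm body is my guess from the routine's name, and the kind handling is my choice, not the HACApK authors':

    ! Hedged sketch of a modernised interface for the routine above.
    function HACApK_unrm_d(nd, za) result(unrm)
       use, intrinsic :: iso_fortran_env, only: real64
       implicit none
       integer, intent(in) :: nd
       real(real64), intent(in) :: za(:)
       real(real64) :: unrm
       ! Assuming from the name that the routine returns a vector 2-norm.
       unrm = norm2(za(1:nd))
    end function HACApK_unrm_d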

2 Likes

I have seen a lot of Fortran codes from Japan (related to my area, of course). Almost all of them have one thing in common: non-standard code.

This is very common:

implicit real*8 (a-h,o-z)

So far I have not found

implicit none

in any of the codes.

1 Like

It is likely that these codes date back to f77 or earlier. Real*8 declarations were a common compiler extension, so it was more portable to write codes that way than the standard real/double precision way. Implicit none was nonstandard, and many programmers used implicit typing as a way to write portable codes. This was a common convention, for example, in textbooks.

All these things are different now. Implicit none is a standard declaration, and the Fortran kind system allows programmers to write portable code in a standard way.
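
Concretely, the standard-conforming replacement for implicit real*8 (a-h,o-z) is explicit typing with a named kind. One common style (selected_real_kind works equally well):

    ! Standard-conforming equivalent of the old real*8 convention.
    module kinds_mod
       use, intrinsic :: iso_fortran_env, only: real64
       implicit none
       integer, parameter :: wp = real64   ! working precision
    end module kinds_mod

    subroutine example(n, x)
       use kinds_mod, only: wp
       implicit none
       integer, intent(in) :: n
       real(wp), intent(inout) :: x(n)
       x = 2.0_wp * x
    end subroutine example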

I have two books, both published after 2010, one in Korean and the other in Japanese. Both present their code in a non-standard way.

That is surprising, because computational materials science deals with the programming language too, I think. Maybe people are not interested in learning what is new in the Fortran language and keep finding solutions to new problems with 50-year-old language standards. No wonder Fortran has a reputation as an old, obsolete language, when modern textbooks only present F77-style code. Forget about do concurrent or parallel programming using coarrays; GPU code may be several decades away in the textbooks.

@RonShepard

This is a new project.

GitHub - ORNL/meumapps_ss: Phase field code for simulating microstructure evolution due to solid-state phase transformations in multi_component alloys

4 Likes

@shahmoradi The publication by Raj et al. contains a section that reads like an advertisement (page 1131, left-hand column), reporting (emphases added):

In short, implementing the coarray is easy, and the number of lines of the program is also effectively reduced without any drastic performance loss. Moreover, *the coarray implementation does not need special syntax and call functions*. Hence, *the program is readable and easy to understand, maintain or extend*. In the present work, the Opencoarrays library, an open-source software project which produces an interface used by the GNU Compiler Collection (GCC) Fortran front end to build parallel executable programs, was used to implement CAF. The Opencoarrays library is easy to install and is used in Linux and macOS. *It is reliable since it has been tested in several of the world's fastest supercomputers and various operating systems* [23-25].

before a comparison of performance between implementations relying on either MPI or coarrays:
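
The "does not need special syntax and call functions" claim refers to the fact that communication in CAF is just assignment with a square-bracket cosubscript, with no library calls. A minimal sketch of my own, unrelated to the paper's code:

    ! Minimal coarray sketch: communication is ordinary assignment with
    ! an [image] cosubscript; no library calls are needed.
    program caf_demo
       implicit none
       real :: boundary(100)[*]
       integer :: me, left
       me = this_image()
       left = merge(num_images(), me - 1, me == 1)  ! periodic neighbour
       sync all
       boundary(:) = boundary(:)[left]   ! pull data from the left image
       sync all
       if (me == 1) print *, "exchange done on", num_images(), "images"
    end program caf_demo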

4 Likes

A CUDA Fortran GPU-parallelised hydrodynamic tool for high-resolution and long-term eco-hydraulic modelling
by Marcos Sanz-Ramos, David López-Gómez, Ernest Bladé, and Danial Dehghan-Souraki
Environmental Modelling & Software
Volume 161, March 2023, 105628

2 Likes

GMD - Comparing the Performance of Julia on CPUs versus GPUs and Julia-MPI versus Fortran-MPI: a case study with MPAS-Ocean (Version 7.1) (copernicus.org)

2 Likes

Currently being discussed on Hacker News: Comparing Performance of Julia on CPUs vs. GPUs and Julia-MPI vs. Fortran-MPI | Hacker News. While this one deals specifically with MPI, I also saw there were a few older discussions about Julia vs. Fortran here, including Simple summation 8x slower than in Julia.

1 Like