Research articles using Fortran, 2025 on

The earlier thread became long, so I am starting a new one.

Dual Numbers for Arbitrary Order Automatic Differentiation

by F. Peñuñuri, K. B. Cantún-Avila, R. Peón-Escalante
arXiv 7 Jan 2025

Dual numbers are a well-known tool for computing derivatives of functions. While the theoretical framework for calculating derivatives of arbitrary order is well established, practical implementations remain less developed. One notable implementation is available in the Julia programming language where dual numbers are designed to be nested, enabling the computation of derivatives to arbitrary order. However, this approach has a significant drawback as it struggles with scalability for high-order derivatives. The nested structure quickly consumes memory, making it challenging to compute derivatives of higher orders. In this study, we introduce DNAOAD, a Fortran-based implementation of automatic differentiation capable of handling derivatives of arbitrary order using dual numbers. This implementation employs a direct approach to represent dual numbers without relying on recursive or nested structures. As a result, DNAOAD facilitates the efficient computation of derivatives of very high orders while addressing the memory limitations of existing methods.
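
For readers unfamiliar with the technique, here is a minimal first-order dual-number module in Fortran (my own illustrative sketch, not the paper's DNAOAD code, which handles arbitrary order). The idea is that propagating the derivative component through overloaded operators yields f and f' in a single evaluation:

```fortran
! Minimal first-order dual numbers: x = (re, ep) carries the value
! and one derivative. Illustrative only; DNAOAD generalizes this idea.
module dual_mod
  implicit none
  type :: dual
     real(8) :: re   ! function value
     real(8) :: ep   ! derivative part
  end type dual
  interface operator(+)
     module procedure add_dd
  end interface
  interface operator(*)
     module procedure mul_dd
  end interface
contains
  pure function add_dd(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c%re = a%re + b%re
    c%ep = a%ep + b%ep            ! sum rule
  end function add_dd
  pure function mul_dd(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c%re = a%re * b%re
    c%ep = a%re * b%ep + a%ep * b%re  ! product rule
  end function mul_dd
end module dual_mod

program demo
  use dual_mod
  implicit none
  type(dual) :: x, y
  x = dual(3.0d0, 1.0d0)   ! seed dx/dx = 1
  y = x*x + x              ! f(x) = x**2 + x
  print *, 'f(3)  =', y%re ! 12
  print *, "f'(3) =", y%ep ! 2*3 + 1 = 7
end program demo
```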

Native Fortran Implementation of TensorFlow-Trained Deep and Bayesian Neural Networks
by Aidan Furlong, Xingang Zhao, Bob Salko, Xu Wu
arXiv 7 Feb 2025

Over the past decade, the investigation of machine learning (ML) within the field of nuclear engineering has grown significantly. With many approaches reaching maturity, the next phase of investigation will determine the feasibility and usefulness of ML model implementation in a production setting. Several of the codes used for reactor design and assessment are primarily written in the Fortran language, which is not immediately compatible with TensorFlow-trained ML models. This study presents a framework for implementing deep neural networks (DNNs) and Bayesian neural networks (BNNs) in Fortran, allowing for native execution without TensorFlow’s C API, Python runtime, or ONNX conversion. Designed for ease of use and computational efficiency, the framework can be implemented in any Fortran code, supporting iterative solvers and uncertainty quantification (UQ) via ensembles or BNNs. Verification was performed using a two-input, one-output test case composed of a noisy sinusoid to compare Fortran-based predictions to those from TensorFlow. The DNN predictions showed negligible differences and achieved a 19.6x speedup, whereas the BNN predictions exhibited minor disagreement, plausibly due to differences in random number generation. An 8.0x speedup was noted for BNN inference. The approach was then further verified on a nuclear-relevant problem predicting critical heat flux (CHF), which demonstrated similar behavior along with significant computational gains. Discussion regarding the framework’s successful integration into the CTF thermal-hydraulics code is also included, outlining its practical usefulness. Overall, this framework was shown to be effective at implementing both DNN and BNN model inference within Fortran, allowing for the continued study of ML-based methods in real-world nuclear applications.
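
To illustrate the general idea (my own minimal sketch, not the authors' framework): native inference only needs the trained weights exported from TensorFlow and a forward pass written with Fortran intrinsics, e.g. a dense layer y = tanh(matmul(W, x) + b):

```fortran
! Minimal sketch of native DNN inference in Fortran (illustrative only;
! not the paper's framework). Weights here are placeholders standing in
! for values exported from a trained TensorFlow model.
module dense_mod
  implicit none
contains
  pure function dense(w, b, x) result(y)
    real(8), intent(in) :: w(:,:), b(:), x(:)
    real(8) :: y(size(b))
    y = tanh(matmul(w, x) + b)   ! tanh activation as an example
  end function dense
end module dense_mod

program infer
  use dense_mod
  implicit none
  real(8) :: w1(4,2), b1(4), w2(1,4), b2(1), x(2), h(4), y(1)
  w1 = 0.1d0; b1 = 0.0d0; w2 = 0.2d0; b2 = 0.0d0  ! placeholder weights
  x = [0.5d0, -0.3d0]              ! two-input test point
  h = dense(w1, b1, x)             ! hidden layer
  y = matmul(w2, h) + b2           ! linear output layer
  print *, 'prediction:', y
end program infer
```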

Related projects:

Teaching An Old Dog New Tricks: Porting Legacy Code to Heterogeneous Compute Architectures With Automated Code Translation
by Nicolas Nytko, Andrew Reisner, J. David Moulton, Luke N. Olson, and Matthew West
arXiv 7 Feb 2025

Legacy codes are in ubiquitous use in scientific simulations; they are well-tested and there is significant time investment in their use. However, one challenge is the adoption of new, sometimes incompatible computing paradigms, such as GPU hardware. In this paper, we explore using automated code translation to enable execution of legacy multigrid solver code on GPUs without significant time investment and while avoiding intrusive changes to the codebase. We developed a thin, reusable translation layer that parses Fortran 2003 at compile time, interfacing with the existing library Loopy to transpile to C++/GPU code, which is then managed by a custom MPI runtime system that we created. With this low-effort approach, we achieve an approximately 2-3x speedup over a full CPU socket, and a 6x speedup in multi-node settings.
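
To make the target concrete, here is an illustrative sketch (my own, not taken from the paper's codebase) of the kind of regular Fortran stencil kernel, typical of multigrid relaxation, that a compile-time translation layer like this can map to a GPU kernel, since the loop nest is fully data-parallel:

```fortran
! Illustrative Jacobi relaxation sweep; a regular, data-parallel loop
! nest like this is what automated Fortran-to-GPU translation targets.
subroutine jacobi_sweep(n, u, unew, f, h)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: u(n,n), f(n,n), h
  real(8), intent(out) :: unew(n,n)
  integer :: i, j
  unew = u                        ! keep boundary values unchanged
  do j = 2, n-1
     do i = 2, n-1
        unew(i,j) = 0.25d0 * (u(i-1,j) + u(i+1,j) &
                            + u(i,j-1) + u(i,j+1) - h*h*f(i,j))
     end do
  end do
end subroutine jacobi_sweep
```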

The original Fortran/C++/C code is

and a tool used is Loopy.

Seamless acceleration of Fortran intrinsics via AMD AI engines
by Nick Brown and Gabriel Rodríguez Canal
arXiv 14 Feb 2025

A major challenge that the HPC community faces is how to continue delivering the performance demanded by scientific programmers, whilst meeting an increased emphasis on sustainable operations. Specialised architectures, such as FPGAs and AMD’s AI Engines (AIEs), have been demonstrated to provide significant energy efficiency advantages; however, programming these architectures effectively requires significant expertise and investment of time, which is a major blocker to adoption.
Fortran is the lingua franca of scientific computing, and in this paper we explore automatically accelerating Fortran intrinsics via the AIEs in AMD’s Ryzen AI CPU. Leveraging the open source Flang compiler and MLIR ecosystem, we describe an approach that lowers the MLIR linear algebra dialect to AMD’s AIE dialects, and demonstrate that for suitable workloads the AIEs can provide significant performance advantages over the CPU without any code modifications required by the programmer.

Looking at Table 4, large speedups using a Neural Processing Unit (NPU) instead of a CPU are possible for matmul for the int32, bfloat16, and float32 types.
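
For reference, the accelerated path requires no source changes: a plain use of the matmul intrinsic, as in the sketch below (my own illustrative example, not code from the paper), is exactly what such a Flang/MLIR flow can lower to the accelerator dialects:

```fortran
! Intrinsic-heavy code of the kind the described compiler flow could
! offload to AIEs with no source modifications (illustrative only).
program matmul_demo
  implicit none
  integer, parameter :: n = 512
  real :: a(n,n), b(n,n), c(n,n)
  call random_number(a)
  call random_number(b)
  c = matmul(a, b)   ! intrinsic the compiler may lower to an accelerator
  print *, 'c(1,1) =', c(1,1)
end program matmul_demo
```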

A parallel texture-based region-growing algorithm implemented in OpenMP
