Research articles using Fortran, 2025 on

The earlier thread became long, so I am starting a new one.

Dual Numbers for Arbitrary Order Automatic Differentiation

by F. Peñuñuri, K. B. Cantún-Avila, R. Peón-Escalante
arXiv 7 Jan 2025

Dual numbers are a well-known tool for computing derivatives of functions. While the theoretical framework for calculating derivatives of arbitrary order is well established, practical implementations remain less developed. One notable implementation is available in the Julia programming language where dual numbers are designed to be nested, enabling the computation of derivatives to arbitrary order. However, this approach has a significant drawback as it struggles with scalability for high-order derivatives. The nested structure quickly consumes memory, making it challenging to compute derivatives of higher orders. In this study, we introduce DNAOAD, a Fortran-based implementation of automatic differentiation capable of handling derivatives of arbitrary order using dual numbers. This implementation employs a direct approach to represent dual numbers without relying on recursive or nested structures. As a result, DNAOAD facilitates the efficient computation of derivatives of very high orders while addressing the memory limitations of existing methods.
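
For readers unfamiliar with the technique, here is a minimal first-order dual-number module in Fortran (my own illustrative sketch, not the paper's DNAOAD code, which handles arbitrary order). The idea is that propagating the derivative component through overloaded operators yields f and f' in a single evaluation:

```fortran
! Minimal first-order dual numbers: x = (re, ep) carries the value
! and one derivative. Illustrative only; DNAOAD generalizes this idea.
module dual_mod
  implicit none
  type :: dual
     real(8) :: re   ! function value
     real(8) :: ep   ! derivative part
  end type dual
  interface operator(+)
     module procedure add_dd
  end interface
  interface operator(*)
     module procedure mul_dd
  end interface
contains
  pure function add_dd(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c%re = a%re + b%re
    c%ep = a%ep + b%ep            ! sum rule
  end function add_dd
  pure function mul_dd(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c%re = a%re * b%re
    c%ep = a%re * b%ep + a%ep * b%re  ! product rule
  end function mul_dd
end module dual_mod

program demo
  use dual_mod
  implicit none
  type(dual) :: x, y
  x = dual(3.0d0, 1.0d0)   ! seed dx/dx = 1
  y = x*x + x              ! f(x) = x**2 + x
  print *, 'f(3)  =', y%re ! 12
  print *, "f'(3) =", y%ep ! 2*3 + 1 = 7
end program demo
```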

Native Fortran Implementation of TensorFlow-Trained Deep and Bayesian Neural Networks
by Aidan Furlong, Xingang Zhao, Bob Salko, Xu Wu
arXiv 7 Feb 2025

Over the past decade, the investigation of machine learning (ML) within the field of nuclear engineering has grown significantly. With many approaches reaching maturity, the next phase of investigation will determine the feasibility and usefulness of ML model implementation in a production setting. Several of the codes used for reactor design and assessment are primarily written in the Fortran language, which is not immediately compatible with TensorFlow-trained ML models. This study presents a framework for implementing deep neural networks (DNNs) and Bayesian neural networks (BNNs) in Fortran, allowing for native execution without TensorFlow’s C API, Python runtime, or ONNX conversion. Designed for ease of use and computational efficiency, the framework can be implemented in any Fortran code, supporting iterative solvers and uncertainty quantification (UQ) via ensembles or BNNs. Verification was performed using a two-input, one-output test case composed of a noisy sinusoid to compare Fortran-based predictions to those from TensorFlow. The DNN predictions showed negligible differences and achieved a 19.6x speedup, whereas the BNN predictions exhibited minor disagreement, plausibly due to differences in random number generation. An 8.0x speedup was noted for BNN inference. The approach was then further verified on a nuclear-relevant problem predicting critical heat flux (CHF), which demonstrated similar behavior along with significant computational gains. Discussion regarding the framework’s successful integration into the CTF thermal-hydraulics code is also included, outlining its practical usefulness. Overall, this framework was shown to be effective at implementing both DNN and BNN model inference within Fortran, allowing for the continued study of ML-based methods in real-world nuclear applications.
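
To illustrate the general idea (my own minimal sketch, not the authors' framework): native inference only needs the trained weights exported from TensorFlow and a forward pass written with Fortran intrinsics, e.g. a dense layer y = tanh(matmul(W, x) + b):

```fortran
! Minimal sketch of native DNN inference in Fortran (illustrative only;
! not the paper's framework). Weights here are placeholders standing in
! for values exported from a trained TensorFlow model.
module dense_mod
  implicit none
contains
  pure function dense(w, b, x) result(y)
    real(8), intent(in) :: w(:,:), b(:), x(:)
    real(8) :: y(size(b))
    y = tanh(matmul(w, x) + b)   ! tanh activation as an example
  end function dense
end module dense_mod

program infer
  use dense_mod
  implicit none
  real(8) :: w1(4,2), b1(4), w2(1,4), b2(1), x(2), h(4), y(1)
  w1 = 0.1d0; b1 = 0.0d0; w2 = 0.2d0; b2 = 0.0d0  ! placeholder weights
  x = [0.5d0, -0.3d0]              ! two-input test point
  h = dense(w1, b1, x)             ! hidden layer
  y = matmul(w2, h) + b2           ! linear output layer
  print *, 'prediction:', y
end program infer
```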

Related projects:

Teaching An Old Dog New Tricks: Porting Legacy Code to Heterogeneous Compute Architectures With Automated Code Translation
by Nicolas Nytko, Andrew Reisner, J. David Moulton, Luke N. Olson, and Matthew West
arXiv 7 Feb 2025

Legacy codes are in ubiquitous use in scientific simulations; they are well-tested and there is significant time investment in their use. However, one challenge is the adoption of new, sometimes incompatible computing paradigms, such as GPU hardware. In this paper, we explore using automated code translation to enable execution of legacy multigrid solver code on GPUs without significant time investment and while avoiding intrusive changes to the codebase. We developed a thin, reusable translation layer that parses Fortran 2003 at compile time, interfacing with the existing library Loopy to transpile to C++/GPU code, which is then managed by a custom MPI runtime system that we created. With this low-effort approach, we achieve an approximately 2-3x speedup over a full CPU socket, and a 6x speedup in multi-node settings.
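
To make the target concrete, here is an illustrative sketch (my own, not taken from the paper's codebase) of the kind of regular Fortran stencil kernel, typical of multigrid relaxation, that a compile-time translation layer like this can map to a GPU kernel, since the loop nest is fully data-parallel:

```fortran
! Illustrative Jacobi relaxation sweep; a regular, data-parallel loop
! nest like this is what automated Fortran-to-GPU translation targets.
subroutine jacobi_sweep(n, u, unew, f, h)
  implicit none
  integer, intent(in)  :: n
  real(8), intent(in)  :: u(n,n), f(n,n), h
  real(8), intent(out) :: unew(n,n)
  integer :: i, j
  unew = u                        ! keep boundary values unchanged
  do j = 2, n-1
     do i = 2, n-1
        unew(i,j) = 0.25d0 * (u(i-1,j) + u(i+1,j) &
                            + u(i,j-1) + u(i,j+1) - h*h*f(i,j))
     end do
  end do
end subroutine jacobi_sweep
```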

The original Fortran/C++/C code is

and a tool used is Loopy.

Seamless acceleration of Fortran intrinsics via AMD AI engines
by Nick Brown and Gabriel Rodríguez Canal
arXiv 14 Feb 2025

A major challenge that the HPC community faces is how to continue delivering the performance demanded by scientific programmers, whilst meeting an increased emphasis on sustainable operations. Specialised architectures, such as FPGAs and AMD’s AI Engines (AIEs), have been demonstrated to provide significant energy efficiency advantages; however, programming these architectures effectively requires significant expertise and investment of time, which is a major blocker to adoption.
Fortran is the lingua franca of scientific computing, and in this paper we explore automatically accelerating Fortran intrinsics via the AIEs in AMD’s Ryzen AI CPU. Leveraging the open source Flang compiler and MLIR ecosystem, we describe an approach that lowers the MLIR linear algebra dialect to AMD’s AIE dialects, and demonstrate that for suitable workloads the AIEs can provide significant performance advantages over the CPU without any code modifications required by the programmer.

Looking at Table 4, large speedups using a Neural Processing Unit (NPU) instead of a CPU are possible for matmul for the int32, bfloat16, and float32 types.
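
For reference, the accelerated path requires no source changes: a plain use of the matmul intrinsic, as in the sketch below (my own illustrative example, not code from the paper), is exactly what such a Flang/MLIR flow can lower to the accelerator dialects:

```fortran
! Intrinsic-heavy code of the kind the described compiler flow could
! offload to AIEs with no source modifications (illustrative only).
program matmul_demo
  implicit none
  integer, parameter :: n = 512
  real :: a(n,n), b(n,n), c(n,n)
  call random_number(a)
  call random_number(b)
  c = matmul(a, b)   ! intrinsic the compiler may lower to an accelerator
  print *, 'c(1,1) =', c(1,1)
end program matmul_demo
```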

A parallel texture-based region-growing algorithm implemented in OpenMP
