Fully integrating the Flang Fortran compiler with standard MLIR
by Nick Brown
arXiv 27 Sep 2024
Abstract: Fortran is the lingua franca of HPC code development and as such it is crucial that we as a community have open source Fortran compilers capable of generating high performance executables. Flang is LLVM’s Fortran compiler and leverages MLIR, a reusable compiler infrastructure that, as part of LLVM, has become popular in recent years.
However, whilst Flang leverages MLIR it does not fully integrate with it, instead providing bespoke translation and optimisation passes to target LLVM-IR. In this paper we first explore the performance of Flang against other compilers popular in HPC for a range of benchmarks, before describing a mapping between Fortran and standard MLIR and exploring its performance. The result of this work is a speedup of up to three times compared with Flang’s existing approach across the benchmarks and experiments run, demonstrating that the Flang community should seriously consider leveraging standard MLIR.
The design of Flang is somewhat surprising in that, whilst it defines its own MLIR dialects, it sits apart from the rest of standard MLIR with the exception of leveraging a number of standard dialects. Consequently, Flang implements its own optimisations and lowerings, sitting apart from the work being developed by the community in MLIR. Not only does this increase duplication, but it also has the potential to impact performance because Flang is unable to take advantage of the progress being made in MLIR, much of which is driven by hardware vendors.
In this paper we explore an alternative approach, where Flang’s MLIR dialects are lowered to the standard MLIR dialects, then relying on existing MLIR transformations and optimisations to build binaries. We have developed a research prototype that enables us to test the hypothesis that such an approach can deliver improved performance and help close the performance gap between Flang and other, more mature, Fortran compilers.
This paper is structured as follows: after exploring the background to this work in Section II, we then describe the setup used for experiments throughout this paper in Section III, before undertaking a performance comparison of binaries produced by Flang against the Cray and Gfortran compilers in Section IV. Section V describes our mapping between Fortran concepts in Flang’s dialects and the standard MLIR dialects, before we explore the performance that this delivers in Section VI. Lastly, conclusions are drawn in Section VII, which also discusses further work.
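As a hedged illustration of the kind of code involved (this example is ours, not taken from the paper): a simple Fortran loop nest that Flang's existing pipeline represents in its own FIR dialect and lowers directly to LLVM-IR, but which the approach described above would instead map onto standard MLIR dialects (for instance scf loops over memref values), so that MLIR's shared, community-maintained optimisation passes can run before LLVM-IR generation.

```fortran
! Illustrative sketch: a stencil-style loop nest. Under Flang's current
! design this becomes fir.do_loop and related FIR operations; the paper's
! approach would lower it to standard MLIR (e.g. scf.for over memref),
! where existing MLIR transformations and vendor-driven optimisations apply.
subroutine smooth(n, a, b)
  integer, intent(in) :: n
  real, intent(in)  :: a(n)
  real, intent(out) :: b(n)
  integer :: i
  do i = 2, n - 1
    b(i) = (a(i-1) + a(i) + a(i+1)) / 3.0
  end do
end subroutine smooth
```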
Another recent paper about Flang is
Automatic Parallelization and OpenMP Offloading of Fortran Array Notation
by Ivan R. Ivanov, Jens Domke, Toshio Endo & Johannes Doerfert
16 September 2024
Abstract: The Fortran programming language is prevalent in the scientific computing community, with a wealth of existing software written in it. It is still being developed, with the latest standard released in 2023. However, due to its long history, many old code bases are in need of modernization for new HPC systems. One advantage Fortran has over C and C++, the other languages broadly used in scientific computing, is its easy syntax for manipulating entire arrays or subarrays. However, this feature is underused, as there has been no way of offloading such operations to accelerators and support for parallelizing them has been unsatisfactory. The new OpenMP 6.0 standard introduces the workdistribute directive, which enables parallelization and/or offloading automatically by just annotating the region the programmer wishes to speed up. We implement workdistribute in the LLVM project’s Fortran compiler, called Flang. Flang uses MLIR – Multi-Level Intermediate Representation – which allows for a structured representation that captures the high level semantics of array manipulation and OpenMP. This allows us to build an implementation that performs on par with more verbose manually parallelized OpenMP code. By offloading linear algebra operations to vendor libraries, we also enable software developers to easily unlock the full potential of their hardware without needing to write verbose, vendor-specific source code.
published in the book Advancing OpenMP for Future Accelerators: 20th International Workshop on OpenMP, IWOMP 2024, Perth, WA, Australia, September 23–25, 2024, Proceedings
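A hedged sketch of the usage pattern the abstract describes (our own illustrative code, not taken from the paper, and the directive spelling is our assumption based on OpenMP 6.0): Fortran whole-array notation annotated with the workdistribute directive, asking the compiler to parallelize and offload the region automatically rather than requiring explicitly written, manually parallelized loops.

```fortran
! Illustrative sketch (directive spelling assumed from OpenMP 6.0): the
! array statement below uses Fortran's whole-array notation; the
! workdistribute region asks the compiler to distribute the implied
! element-wise work across the target device's teams and threads,
! with no explicit loops or per-element OpenMP clauses from the programmer.
subroutine axpy(n, alpha, x, y)
  integer, intent(in) :: n
  real, intent(in) :: alpha, x(n)
  real, intent(inout) :: y(n)
  !$omp target teams workdistribute
  y = alpha * x + y
  !$omp end target teams workdistribute
end subroutine axpy
```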