If anyone wants to dig into the details of what the LANL researchers did:
Whatever results the second paper presents for automatic Fortran-to-C++ translation underestimate what can be achieved, because the authors do not take the obvious step of using LLMs to fix code that does not compile. The bolding of text below is mine:
> Compilation accuracy of the translated C++ measures how many translations successfully compile without errors (Wen et al., 2022b). We compiled each translated C++ using the g++ v5.3.0 compiler on Red Hat Enterprise Linux Workstation release 7.9. If a C++ translation failed to compile, we recorded the compiler output and **did not proceed further with that translation** (Figure 1). We reviewed the compiler output and categorized each error as shown in Table 2.
I have a C++ agent to fix C++ compilation errors, and I'm sure there are much more powerful tools for this.
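The missing step is a simple retry loop. Here is a minimal, hypothetical sketch of such a compile-and-repair loop; `compile_fn` stands in for invoking g++ and `fix_fn` stands in for an LLM call, neither of which appears in the paper:

```python
# Hypothetical compile-and-repair loop (not from the paper).
# `compile_fn` returns (ok, diagnostics); `fix_fn` stands in for an LLM
# call that rewrites the source given the compiler diagnostics.

def repair_loop(source, compile_fn, fix_fn, max_attempts=3):
    """Retry a translation until it compiles or attempts run out."""
    for _ in range(max_attempts):
        ok, diagnostics = compile_fn(source)
        if ok:
            return source
        source = fix_fn(source, diagnostics)
    return None  # give up; count it as a failed translation

# Toy demonstration with a stubbed-out "compiler" and "LLM":
def fake_compile(src):
    return (";" in src, "error: expected ';'")

def fake_fix(src, diagnostics):
    return src + ";"

print(repair_loop("int x = 1", fake_compile, fake_fix))  # int x = 1;
```

In a real setup `compile_fn` would shell out to the same g++ invocation the authors used, so the loop measures exactly how much of their "failed to compile" category is recoverable.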
I see several issues with translating Fortran to C++. The most fundamental is that Fortran has concepts and capabilities that C++ lacks (of course, the reverse is true too). A significant example is single-program, multiple-data (SPMD) parallelism with a partitioned global address space (PGAS), both of which work in distributed memory. The closest analogous C++ concept might be multithreaded programming, but that only works in shared memory, and one would need to fork all threads at the beginning of execution, handle several setup tasks (e.g., establishing non-allocatable coarrays), not join the threads until the end of execution, and prevent the spawning of additional threads by individual loops, and that's just a small sampling of the issues that would need to be addressed. One could translate the SPMD and PGAS features to one-sided MPI, but that is going to be challenging to get right, less readable, and likely to hurt performance.
Then there's a lot of information loss involved. Think about the long list of constraints that apply to `pure` procedures. Unless there's a similar C++ concept, the reader of the translated code will have to read through each translated `pure` procedure to rediscover all the information that the single keyword `pure` provides in one fell swoop. Such rediscovery becomes especially important if the procedure gets called inside a parallel loop when translated to C++. By contrast, every procedure called inside Fortran's `do concurrent` construct must be `pure` according to the Fortran standard.
Moreover, even though C++ now has multidimensional arrays, C++ still lacks array statements as far as I know. So are all array statements being converted to nested loops? If so, there again is a loss of information unless those become C++ `parallel_for` loops that retain the information that there is no implied ordering of iterations. Even then, it's likely to lead to code bloat: what was one line in Fortran could become many more lines in C++.
And the languages are only continuing to diverge. Fortran 2028 templates will be type-safe, something that is not easily expressible in C++ because C++ doesn't allow for specifying template requirements (relationships between types, procedures, and combinations thereof), so the loss of information could also lead to a loss in type safety if Fortran programmers take full advantage of the upcoming template feature.
I've only scratched the surface above. How about the fact that C++ allows overloading operators but does not facilitate user definitions of new operators? A common response is that user-defined operators are syntactic sugar, but such statements ignore the additional semantic constraints involved, such as the requirement that the operands have the `intent(in)` property. That's information the reader immediately knows when seeing the use of a user-defined operator in Fortran, whereas one would have to inspect the signature of every C++ function that replaces a Fortran operator to discover the same information about the arguments. And then there's the argument that syntactic sugar can be exceptionally powerful in its communicative value.
There's so much more that can be said about such topics as the differences between Fortran pointers and C++ pointers; e.g., the `target` attribute communicates important information to both the reader and the compiler. How will this information be communicated to the C++ compiler or developer?
Bottom line: the two languages are equivalent only in a superficial way that ignores a lot and accepts a considerable amount of information loss, extreme restrictions, code bloat, and potential loss of safety and performance.
I struggle to comprehend how LLMs, trained on material that is likely not relevant to the context at hand, are superior to using inductive logic programming for specification recovery.
This was an ancient line of CS research from when logic and rigor were considered important (they don't seem to be very important today, given the claims people with technical sophistication accept).
Clearly, specification recovery is less precise, in the sense of being able to verify that a program meets its spec, than deriving a program from a spec from first principles, since a recovered spec is only an approximation based on the observable behavior of the program over finite inputs. But I can't see how anyone could put more confidence (in the sense that frequentist statisticians use the word) in any program transformation derived from an LLM.
I would say that they are diametrically opposed. Fortran was always about computation, C was always about hardware control. In Fortran, the states that the machine goes through between I/O are unobservable. In C, they are of supreme importance because observable hardware is being changed.
On a cheerier note, Lawrence Berkeley National Lab recognized Computing Sciences' Damian Rouson @rouson as Developer of the Year. Damian, who is in the Applied Mathematics and Computational Research Division, led the development of new software tools for testing and correctness checking and a library that supports Fortran/C/C++ language interoperability.
Some relevant projects are:
Following on from what Themos said, I find the following publications worth a read for some of the history and development of C and C++.
ACM SIGPLAN Notices, Volume 28, Number 3, March 1993
History of Programming Languages Conference (HOPL II):

- Dennis M. Ritchie, "The Development of the C Language", pp. 201-208
- Bjarne Stroustrup, "A History of C++: 1979-1991"

And the book:

- Bjarne Stroustrup, *The Design and Evolution of C++*, Addison-Wesley, ISBN 0201543303, March 2007
Ian Chivers
I found this working paper recently, which provides an analysis of the shortcomings of C++ array libraries compared to Fortran:
Abstract:

> As a language for scientific computing, C++ is at a disadvantage compared to many other languages due to its lack of a well-designed standard for multi-dimensional arrays supporting efficient whole-array expressions, expressive array-subsetting syntax and linear algebra. To support the development of such a standard, in this paper I review the interface, capabilities and weaknesses of a number of free C++ array libraries (Adept, Armadillo, Blaze, Blitz++, Eigen, MTL4, ra-ra, uBLAS and Xtensor) as well as other languages supporting multi-dimensional arrays (particularly Fortran, Python, Matlab, IDL and Julia). These are contrasted with the verbose and limited whole-array capabilities in the C++20 Standard Template Library. To help ensure the standard meets the needs of large-scale scientific applications, I also present an analysis of array use in an Earth-system model for operational weather forecasting (2.2 million lines of code). I argue that an unlimited number of dimensions should be supported, not the limit of two imposed by many libraries focusing on linear algebra, and propose a solution to the lack of a matrix-multiplication operator in C++. A detailed investigation is presented of the problem that most C++ libraries cannot simply and efficiently pass a subset of an array to a function, and a solution is proposed. A total of 25 specific recommendations are made that will hopefully contribute to a discussion leading to the formulation of a standard.
LLMs can help people understand large code bases and change them. There is a project, Search-Engine-Integrated Multi-Expert Inference (SEIMEI), that claims to "optimize reasoning steps (with agents) and achieve SOTA results on tasks requiring deep reasoning". The documentation briefly describes (p. 10) the use of an AI agent on a nuclear fusion simulation, which can, for example, convert the coordinates of a simulation and explain what happens when running the GyroKinetic Vlasov simulation code.
LLMs can do a lot, but they are also hyped. I have not used SEIMEI. Maybe the author would be willing to apply the tool to other large codes. The code is provided, and it can be run on a local GPU or a rented server GPU.
I think there are a few things that would dramatically lower the barrier to entry for new scientific programmers. In no particular order:
- Interactive Fortran environments (a la Jupyter) to quickly start workshopping code. This is really nice because it dramatically simplifies the workflow and reduces the amount of knowledge needed to write and execute code. It could also reduce the burden of installing the various software tools needed to build a project, and it makes the develop → test iteration more visually intuitive.
- Better IO. This is a place where Fortran is really lacking, in my opinion. I understand that there is a lot of history here, but reading a data file into an array takes many lines of code, and the error messages are confusing or unhelpful. New scientists often have to load data and then manipulate it in some way, and Fortran doesn't have great utilities for either. Utilities like pandas.read_csv, numpy.loadtxt, h5py, or xarray are all extremely useful and move the burden from loading data to analyzing it.
- Expansion of stdlib array operations. I know that this is in progress and is limited by people power. Python (and other languages) have so many utilities for altering and manipulating data. Common operations like rolling averages/convolutions, histogramming, FFTs, curve fitting, interpolation, etc. would benefit the community and make Fortran more attractive to newer programmers.
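As an example of the kind of one-liner newcomers reach for in Python (and which a comparable stdlib routine could mirror), here is a centered rolling average via NumPy convolution:

```python
import numpy as np

# The kind of one-liner that pulls new scientists toward Python: a moving
# average via convolution. A similar stdlib routine would remove a common
# reason to leave Fortran for the analysis step.
def rolling_mean(x, window):
    """Moving average over `window` points (valid region only)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(rolling_mean(data, 3))  # [2. 3. 4.]
```

The equivalent hand-rolled Fortran is a do-loop with edge handling; it is not hard, but it is exactly the friction being described.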
@certik The Python Fortran Rosetta Stone is very helpful! I have looked at it many times, especially when I was first learning. I have had aspirations to help improve it but keep getting dragged into other projects.
Text format:

```fortran
program demo
   use stdlib_io, only: loadtxt
   implicit none
   real, allocatable :: x(:, :)
   call loadtxt('example.csv', x, delimiter=',')
end program demo
```
NumPy binary array format:

```fortran
program demo
   use stdlib_io_npy, only: load_npy
   implicit none
   real, allocatable :: x(:, :)
   call load_npy('example.npy', x)
end program demo
```
stdlib_io: io - Fortran-lang/stdlib
```fortran
program demo
   use h5fortran
   implicit none
   ! example arrays (declarations added to make the snippet complete)
   real :: x(2, 2) = 1.0, y(2, 2)
   call h5write('my.h5', '/x', x)
   call h5read('my.h5', '/y', y)
end program demo
```
h5fortran: GitHub - geospace-code/h5fortran: Lightweight HDF5 polymorphic Fortran: h5write() h5read()
Compatible with netCDF as far as I can understand. A good tutorial can be found here: NetCDF | Programming in Modern Fortran.
I copied the example shown there into a file, opened a shell on my MacBook, and ran:

```shell
> brew install netcdf-fortran
...
> export NETCDF_ROOT=`brew --prefix netcdf-fortran`
> gfortran -I$NETCDF_ROOT/include -L$NETCDF_ROOT/lib example.f90 -lnetcdff
> ./a.out
```
```
Data to be written to NetCDF file:
----------------------------------------------------------------
 0  1  2  3  4  5  6  7  8  9 10 11
 6  7  8  9 10 11 12 13 14 15 16 17
12 13 14 15 16 17 18 19 20 21 22 23
18 19 20 21 22 23 24 25 26 27 28 29
24 25 26 27 28 29 30 31 32 33 34 35
30 31 32 33 34 35 36 37 38 39 40 41
----------------------------------------------------------------
Data read from NetCDF file:
----------------------------------------------------------------
 0  1  2  3  4  5  6  7  8  9 10 11
 6  7  8  9 10 11 12 13 14 15 16 17
12 13 14 15 16 17 18 19 20 21 22 23
18 19 20 21 22 23 24 25 26 27 28 29
24 25 26 27 28 29 30 31 32 33 34 35
30 31 32 33 34 35 36 37 38 39 40 41
----------------------------------------------------------------
```
So I think the basic needs are there.
IMO, Fortran does pretty well on structured and binary data. I see bigger issues with text formats like JSON, XML, TOML, and YAML. There are Fortran libraries for each of those, but the interfaces are not very consistent and usage tends to be very verbose. Most of them are volunteer projects maintained by a single person.
What Fortran projects absolutely lack is good documentation.
Interactive fortran environments (a la Jupyter)
Such as LFortran + Jupyter-Lab?
... but there's some way to go to make it suitable for classrooms, and plotting in this environment is key.
`stdlib` already includes some procedures like `loadtxt`. Ongoing development of stdlib on the IO end will help cross more IO-related barriers. NetCDF and HDF are fairly easy to handle with Fortran and fairly well documented. The main barrier I can see here is for people to get all their tools together with minimal hassle.
Agree. Expanding the stats modules would be a priority for the sort of things I teach.
So ... we're not quite there yet, but well on the way, I think.
Maybe worthwhile to write a proposal for one of these? NumPy, xarray and related formats have strong ties with the NumFOCUS world, so I think it would be mutually beneficial.
Note that Fortran-Lang is member of NumFOCUS: Fortran-lang - NumFOCUS
Documentation that can be part of the code is supported in many languages, but Fortran lacks even a block-text capability. I use preprocessors to make up for that lack.
One useful mode is to allow code to be contained in markdown, just as it is on Discourse: a `` ```fortran `` line starts a code section. This allows placing documentation, links to external resources, and C code alongside the Fortran code. The file is then both the source and a GitHub-compatible document.
The other mode is to allow for a free-format block of text in the input that the preprocessor can turn into comments and/or write to a file for further post-processing.
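The first mode, treating a markdown file as the source of truth, can be sketched in a few lines. This is a hypothetical illustration of the idea, not the actual implementation of any existing preprocessor:

```python
# Minimal sketch of the "file is both source and document" idea: pull
# every fortran-tagged code fence out of a markdown file so the same
# file can be rendered on GitHub and handed to the compiler.

def extract_fortran(text, fence="`" * 3):
    """Return the concatenated bodies of all fortran fenced blocks."""
    blocks, in_block, current = [], False, []
    for line in text.splitlines():
        if line.strip() == fence + "fortran":     # opening fence
            in_block, current = True, []
        elif line.strip() == fence and in_block:  # closing fence
            blocks.append("\n".join(current))
            in_block = False
        elif in_block:
            current.append(line)
    return "\n".join(blocks)

# Fence strings are assembled from pieces to avoid nesting literal fences.
doc = "\n".join([
    "# Demo",
    "Some prose explaining the routine.",
    "`" * 3 + "fortran",
    "print *, 'hello'",
    "`" * 3,
    "More prose.",
])
print(extract_fortran(doc))  # print *, 'hello'
```

A build step that runs this extraction before compilation is all it takes for the .md file to serve as both documentation and source.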
I used to put the code into HTML documents in an `<XMP>` ... `</XMP>` section, but technically that element is deprecated.
But I think Fortran desperately needs some method of block text that adds similar functionality. Even using cpp's `#if` to skip over text is better than nothing, but if a standard preprocessor is ever agreed upon, it could be used to add a similar capability.
I find it far more maintainable if the documentation is right in the code.
To leverage the code itself as part of the available information, Doxygen and FORD provide another approach.
fpm(1) would benefit dramatically from a standard form of documentation, particularly if the fpm command itself could locate and display it. I think the most natural approach would be for fpm to compile .md files instead of just .f/.f90/.c files; combined with a tool that could display and search the files in a CLI environment or convert them to HTML (which is what the original markdown Perl script did), this would be appealing and could be done now, without waiting for a change to the standard.
Interactive Fortran might bring IMPLICIT statements back into vogue. LFortran will take that to a new level, but even a small interpreter that allowed use of stdlib functions and included help text would be useful. There used to be several F77 Fortran interpreters, but I cannot find any now.
M_matrix is not quite suitable, but it can be used to explore the concept. You can call it stand-alone or as a procedure from your code, and it provides a minimal embedded-language environment. It lets you explore two fpm packages, M_sets and M_orderpack, including built-in help for the procedures.
Perhaps a similar program that supported minimal Fortran interpretation and provided searchable help for the stdlib procedures and included stdlib procedures as built-in functions would help promote stdlib usage. gnuplot support similar to what @Beliavsky added to his expression parser would be a nice bonus.
M_matrix shows a model for what an interpreter/demonstrator/documentation tool might look like; prep is a preprocessor that can read from a markdown file, convert text blocks to Fortran code or comments, or extract them into a file; and fman shows what a terminal-based markdown viewer might look like, although only a small subset of Discourse markdown might be possible (images, formulas, multimedia, etc. would be hard to display using just a terminal emulator) ...
PS: single-file versions of lala, prep, and fman are at mars/bootstrap at main · lockstockandbarrel/mars · GitHub, which would be an easier way for non-fpm users in particular to explore tools similar to the proposed one.
That is one of the ways that I solve this problem too.
```c
#if 0
...arbitrary lines of text...
#endif
```
However, a downside of this approach is that modifying that text block can trigger a sequence of unnecessary recompilations, or, if the file is a low-level file, say in a library, it can trigger several unnecessary entire program rebuilds.
Awesome, thank you, I am glad it was useful. It was useful even for me to realize that anything numerical you can do in Python/NumPy, you can do in Fortran, just as easily.
Yes, we'll add plotting.
> Requirements for modern software: HPC software relies on the broader software landscape (e.g., programming languages, vendor tools, runtimes). Hence, the HPC stack must evolve accordingly and alongside the broader software ecosystem to meet the post-Moore era's requirements, including trustworthiness (e.g., memory-safety, robustness) [20], reproducibility, maintainability, and energy-efficiency, and to align with national interests. Not meeting these requirements puts the field at risk in mission-critical scenarios, and the clearest example of vulnerability is the overwhelming reliance on Fortran [37,25] in legacy HPC codes.
headscratch
I'm confused. They list criteria that are, for the most part, easily met by Fortran (even old Fortran) and then go on to say that reliance on Fortran is a problem. Or are they specifically talking about a lack of expertise in F77? Regardless, this seems poorly phrased at best.
It is just that the people who wrote this article have a misconception of how Fortran works, both the legacy and the modern language.
As we say in French: "When you want to kill your dog, you accuse it of having rabies." In the present case it's also: "When you want to shine with your brand-new rifle, you look for a dog to kill."