If anyone wants to dig into the details of what the LANL researchers did:
Whatever results the second paper presents for automatic Fortran to C++ translation underestimate what can be achieved, since the authors do not take the obvious step of using LLMs to fix code that does not compile. I added the bolding of text below.
Compilation accuracy of the translated C++ measures how many translations successfully compile without errors (Wen et al., 2022b). We compiled each translated C++ using the g++ v5.3.0 compiler on Red Hat Enterprise Linux Workstation release 7.9. If a C++ translation failed to compile, we recorded the compiler output and did not proceed further with that translation (Figure 1). We reviewed the compiler output and categorized each error as shown in Table 2.
I have a C++ agent to fix C++ compilation errors, and I'm sure there are much more powerful tools for this.
I see several issues with translating Fortran to C++. The most fundamental is that Fortran has concepts and capabilities that C++ lacks (of course, the reverse is true too). A significant example is single-program, multiple-data (SPMD) parallelism with a partitioned global address space (PGAS), both of which work in distributed memory. The closest analogous C++ concept might be multithreaded programming, but that only works in shared memory, and one would need to fork all threads at the beginning of execution, handle several setup tasks (e.g., establishing non-allocatable coarrays), not join the threads until the end of execution, and prevent individual loops from spawning additional threads. And that's just a small sampling of the issues that would need to be addressed. One could translate the SPMD and PGAS features to one-sided MPI, but that's going to be challenging to get right, less readable, and likely to hurt performance.
Then there's a lot of information loss involved. Think about the long list of constraints that apply to `pure` procedures. Unless there's a similar C++ concept, the reader of the translated code will have to read through each translated `pure` procedure to rediscover all the information that the single keyword `pure` provides in one fell swoop. Such rediscovery becomes especially important if the procedure gets called inside a parallel loop when translated to C++. By contrast, every procedure called inside Fortran's `do concurrent` construct must be `pure` according to the Fortran standard.
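To make the information loss concrete, here is a minimal C++ sketch. The function names and the global counter are hypothetical, and `constexpr` is shown only as the closest standard approximation I can think of, not as an equivalent of `pure`. The point is that the translated signature says nothing about side effects:

```cpp
// Hypothetical translation target for the Fortran function
//   pure function scale(x) result(y)
//     double precision, intent(in) :: x
//     double precision :: y
//     y = 2.0d0 * x
//   end function scale

// Illustrative global state; not taken from any real code base.
int call_count = 0;

// This compiles, and its signature looks exactly like that of a
// side-effect-free function. A reader has to inspect the body to learn
// that it modifies global state.
double scale_with_side_effect(double x) {
    ++call_count;  // a Fortran pure procedure could not do this
    return 2.0 * x;
}

// Closest standard approximation (an assumption, not an equivalent):
// a constexpr function. Only evaluations forced at compile time, such as
// the static_assert below, are checked for side effects.
constexpr double scale(double x) { return 2.0 * x; }
static_assert(scale(1.5) == 3.0, "evaluated at compile time");
```

Both functions can be called from a parallel loop without any complaint from the compiler, which is exactly the rediscovery burden described above.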
Moreover, even though C++ now has multidimensional arrays, C++ still lacks array statements as far as I know. So are all array statements being converted to nested loops? If so, there again is a loss of information unless those are C++ `parallel_for` loops in order to retain the information that there is no implied ordering of iterations. Even then, it's likely to lead to code bloat wherein what was one line in Fortran could become many more lines in C++.
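As a rough illustration of the bloat, here is what a hand translation of a single Fortran array statement might look like. This is a sketch under my own assumptions: the `Matrix` type, its flat row-major storage, and the name `add_scaled` are all hypothetical, and the shapes are assumed to match (Fortran would require conformable arrays).

```cpp
#include <cstddef>
#include <vector>

// Fortran original (one line, no iteration order implied):
//   a = b + 2.0d0 * c      ! a, b, c are conformable rank-2 arrays

// Hypothetical minimal 2-D array type with flat row-major storage.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// The nested loops impose an iteration order that the Fortran statement
// never specified. Assumes a, b, and c have identical shapes.
void add_scaled(Matrix& a, const Matrix& b, const Matrix& c) {
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < a.cols; ++j) {
            a(i, j) = b(i, j) + 2.0 * c(i, j);
        }
    }
}
```

Nothing in the loop nest tells the reader or the compiler that the iterations are independent; that fact has to be re-derived from the loop body.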
And the languages are only continuing to diverge. Fortran 2028 templates will be type-safe, something that is not easily expressible in C++ because C++ doesn't allow for specifying template requirements (relationships between types, procedures, and combinations thereof). The loss of information could therefore also lead to a loss in type safety if Fortran programmers take full advantage of the upcoming template feature.
I've only scratched the surface above. How about the fact that C++ allows overloading operators but does not facilitate user definitions of new operators? A common response is that user-defined operators are syntactic sugar, but such statements ignore the additional semantic constraints involved, such as the requirement that the operands have the `intent(in)` property. That's information the reader immediately knows when seeing the use of a user-defined operator in Fortran, whereas one would have to inspect the signature of every C++ function that replaces a Fortran operator to discover this same information about the arguments. And then there's the argument that syntactic sugar can be exceptionally powerful in its communicative value.
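A small sketch of what this means in practice, with hypothetical names throughout (`Vec3`, `cross`, and the deliberately badly behaved `operator+` are mine, purely for illustration). A Fortran user-defined operator such as `a .cross. b` has no C++ counterpart, so it becomes a named function, and an overloaded C++ operator carries no guarantee that its operands are read-only:

```cpp
// Hypothetical 3-vector type for illustration.
struct Vec3 {
    double x, y, z;
};

// C++ cannot introduce a new operator token like Fortran's .cross.,
// so the translation falls back to a named function. Here the const
// references happen to mirror intent(in), but only the signature says so.
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Overloading an existing operator is allowed, but nothing forces the
// operands to be read-only: this legal overload modifies its left operand.
Vec3 operator+(Vec3& a, const Vec3& b) {
    a.x += b.x;
    a.y += b.y;
    a.z += b.z;
    return a;
}
```

At a call site, `a + b` looks harmless either way; only the declaration reveals that `a` is changed, whereas the Fortran rules for defined operators rule this out.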
There's so much more that can be said about such topics as the differences between Fortran pointers and C++ pointers. For example, the `target` attribute communicates important information to both the reader and the compiler. How will this information be communicated to the C++ compiler or developer?
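Here is a minimal sketch of the aliasing side of that question. The function names are hypothetical. In Fortran a pointer can be associated only with an object that has the `target` (or `pointer`) attribute; in C++ any object can have its address taken, so a routine like the one below has to tolerate overlapping arguments, and the result depends on whether they overlap:

```cpp
#include <cstddef>
#include <vector>

// Nothing in this signature says whether out and in may overlap, so the
// compiler must assume that they can.
void add_one(double* out, const double* in, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] + 1.0;
    }
}

// Calls add_one with deliberately overlapping ranges of the same vector.
// Each write to out[i] changes the value later read as in[i + 1].
std::vector<double> overlapping_call() {
    std::vector<double> v(4, 0.0);
    add_one(v.data() + 1, v.data(), 3);
    return v;  // {0, 1, 2, 3}, not the {0, 1, 1, 1} a non-aliased copy would give
}
```

Some compilers offer a non-standard `__restrict__` qualifier to promise the absence of aliasing, but that is an extension and a promise by the programmer, not something the language checks.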
Bottom line: the two languages are equivalent only in a superficial way that ignores a lot and accepts a considerable amount of information loss, extreme restrictions, code bloat, and potential loss of safety and performance.
I struggle to comprehend how LLMs, trained on background that is likely not relevant to the context at hand, are superior to using inductive logic programming for specification recovery.
This was an ancient line of CS research when logic and rigor were considered important (they don't seem to be very important today, given the claims people with tech sophistication accept).
Clearly, specification recovery is less precise, in the sense of being able to verify that a program meets its spec, than deriving a program from a spec from first principles: the recovered spec is only an approximation, based on the observable behavior of the program on finite input. But I can't see how anyone can put more confidence (in the sense that frequentist statisticians use it) in any program transform derived from an LLM.
I would say that they are diametrically opposed. Fortran was always about computation, C was always about hardware control. In Fortran, the states that the machine goes through between I/O are unobservable. In C, they are of supreme importance because observable hardware is being changed.