AI Coding Assistants vs. Codee — Insights on Fortran Correctness and Modernization

Hi everyone,

At Codee, we recently shared two comparative analyses exploring how current AI coding assistants perform on Fortran code, compared to Codee’s compiler-based tools.

1 - Code Formatting

We assessed how the Codee Formatter and AI assistants like ChatGPT, Claude, and Gemini handle the modernization of legacy Fortran 77 code, focusing on improving formatting and readability:

  • AI assistants often struggled with large source files, sometimes introducing unintended semantic changes or breaking compilation.
  • In contrast, the Codee Formatter processed Fortran files almost instantly, ensuring the original logic and structure were preserved by relying on its compiler-based technology.

Read the full article for more details: “Codee Formatter vs. AI Coding Assistants: A Focus on Fortran Modernization”.

2 - Performance Optimization

We also reviewed findings from the paper “Comprehensive Evaluation of LLMs in HPC Code Performance Optimization” by B. Cui, T. Ramesh, and K. Zhou (George Mason University) and O. Hernandez (Oak Ridge National Laboratory). The authors compared the Codee Analyzer with AI assistants such as ChatGPT, Claude, and Llama in HPC code optimization:

  • AI assistants were able to suggest meaningful optimizations and achieve performance speedups. However, they also fell short in several benchmarks, producing code that failed to compile, crashed, or generated incorrect results.
  • On the other hand, the deterministic static analysis of the Codee Analyzer consistently generated correct and compilable optimizations.

Read the full article for more details: “Codee Analyzer vs. AI Coding Assistants: A Focus on Correctness in Fortran/C/C++”.

As a general takeaway, AI assistants are valuable for creative and exploratory tasks, such as prototyping new code. However, when code correctness and reproducibility are essential, such as in scientific computing, they can pose risks if not carefully supervised by experienced developers. That’s where deterministic, compiler-grade tools remain a reliable foundation for development workflows.

We’d be interested to hear your thoughts:

  • How do you see AI assistants fitting into Fortran development?
  • Have you tried using AI tools for Fortran development?

— The Codee Team

Isn’t this exactly what one would expect? LLMs are not reasoning; they are “stochastic parrots”, so unless trained exclusively on Fortran-specific data, one would expect all sorts of nonsense to get mixed in. That they work at all is the amazing thing; that they cheerfully produce crap isn’t.

Another very public example: ChatGPT gets the law wrong

LLMs are not reasoning; they are “stochastic parrots”, so unless trained exclusively on Fortran-specific data, one would expect all sorts of nonsense to get mixed in.

Indeed! Given the rapid adoption of AI coding assistants in recent years, our goal was simply to put into context the continued importance of deterministic, specialized tools for supporting developers in certain activities.

On that note, has anyone seen any ongoing efforts to tune LLMs specifically for Fortran code?

Having retired (at least for now), I don’t have a sizable code base to train one on. There are enough self-hosted LLMs that I’d have thought the usual players (government labs, etc.) would have plenty of code to play with.

The “obvious” next step is to tie the toolchains together: the LLM-based assistant could propose a change, and the semantically aware toolchain could validate it and send it back for a rewrite until it is at least acceptable. No doubt that still leaves room for terrible numerics (existing code bases are generally going to use standard floating point rather than intervals or unums, so automated proofs of numerical equivalence are probably infeasible).
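To make the propose/validate idea concrete, here is a minimal sketch of such a loop in Python. Everything in it is an assumption for illustration: `refine_until_valid`, `toy_propose`, and `toy_validate` are hypothetical names, the "assistant" is a stub that gets its first attempt wrong, and the "toolchain" is a stub that only checks for an END statement. A real setup would call an actual LLM and an actual compiler or static analyzer in their place.

```python
from typing import Callable, Optional


def refine_until_valid(
    propose: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    source: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first candidate the validator accepts, or None if none is."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        # The proposer sees the original source plus the last rejection reason.
        candidate = propose(source, feedback)
        # The validator returns None to accept, or a diagnostic string to reject.
        feedback = validate(candidate)
        if feedback is None:
            return candidate
    return None  # give up rather than hand back an unvalidated rewrite


def toy_propose(source: str, feedback: Optional[str]) -> str:
    # Hypothetical stand-in for an LLM call: the first attempt drops the END
    # statement, and the retry (with feedback) keeps the program intact.
    if feedback is None:
        return source.replace("END", "").strip().lower()
    return source.lower()


def toy_validate(candidate: str) -> Optional[str]:
    # Hypothetical stand-in for a compiler or static analyzer.
    if "end" not in candidate.split():
        return "missing END statement"
    return None


legacy = "PROGRAM DEMO\nPRINT *, 1\nEND"
accepted = refine_until_valid(toy_propose, toy_validate, legacy)
print(accepted)
```

The loop only ever returns code that the validator accepted, and it returns nothing if the round limit runs out. It does not address the numerics concern: a validator of this kind can check that code compiles or passes tests, but that is weaker than proving the rewrite is numerically equivalent to the original.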
