Is there a go-to code formatter for Fortran?

Hello Fortran community,

We at Codee have observed that the Fortran ecosystem lacks a widely adopted code formatting tool. While options like fprettify are available, none seems to have achieved adoption comparable to that of clang-format for C/C++, which has become the go-to tool in major projects (even the Linux kernel), helping ensure consistent code across large codebases.

We would love to hear your thoughts on this topic:

  • Do you think a Fortran formatter could make a significant difference in the development experience?
  • Are you currently using any tools or workflows to format your Fortran code?
  • What features would you consider essential in a Fortran formatter (e.g., custom styling rules, editor/IDE integration)?

Thank you for your input!

I don’t see a single go-to code formatter for Fortran, and of course every user will have their own priorities. Our tool, fpt, necessarily contains formatting commands. So:

i. Do we think a formatter makes a significant difference in development? Mostly when we import or work on code from other users. We are already quite disciplined in writing our own code.

ii. Are we using tools - yes, fpt. We wrote it - silly if we didn’t!

iii. What are the essential features? The features we have implemented include:

a. Automatic switching between free and fixed format.

b. Control of indentation: A user-defined number of spaces for declaration and control constructs to a user-defined maximum depth.

c. Control of spacing between tokens - separate controls for space before or after each keyword, operator, delimiter etc. and control of spacing before or after symbols, labels, numbers…

d. Control of case - separate upper/lower/camel case for keywords, intrinsics, symbols, exponent characters, kind tags …

e. Control of line length with automatic generation of continuation lines at a user-defined length.

f. Recognition of commonly used extensions - INTEGER*4, STRUCTURE/MAP/UNION, VMS tab format …

We would welcome advice on any formatting features not listed above.

Best wishes, John.

You go way beyond formatting, as do spag and LFortran in their own ways. If I define formatting as changing the appearance of the code but not restructuring it, I think the common requests are: changing .EQ. to == and vice versa, which comes up a lot; on the farther edge, changing all the “end if” to “endif” and vice versa, and related changes such as “go to” versus “goto” …; moving DATA declarations above the executable statements; and moving FORMAT statements to a group at the bottom, or converting them all to CHARACTER parameters and using FMT=NAME instead of FMT=NUMBER (which I personally prefer, but which seems to be implemented more rarely).
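To illustrate that these are purely cosmetic choices, here is a minimal sketch (the module and function names are hypothetical) of the same logic written twice: once with .EQ./.GT., “endif” and “goto”, once with ==/>, “end if” and “go to”. In free-form source the blank in END IF and GO TO is optional, so a formatter can switch between the spellings without changing the meaning.

```fortran
module spelling_demo
   implicit none
contains
   ! Older spelling: .eq./.gt., "endif" and "goto".
   integer function sign_old(n)
      integer, intent(in) :: n
      sign_old = 0
      if (n .eq. 0) goto 10
      if (n .gt. 0) then
         sign_old = 1
      else
         sign_old = -1
      endif
10    return
   end function sign_old

   ! Same logic with ==/>, "end if" and "go to".
   integer function sign_new(n)
      integer, intent(in) :: n
      sign_new = 0
      if (n == 0) go to 10
      if (n > 0) then
         sign_new = 1
      else
         sign_new = -1
      end if
10    return
   end function sign_new
end module spelling_demo
```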

With longer line lengths now allowed, it is sometimes preferable to join continued lines back into a single line, particularly if strings or (worse yet) variable names were split across lines.
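As a sketch of what a joiner has to undo, the hypothetical module below splits a character constant and a name across continuation lines. In free form, when a string or token is continued with a trailing &, the text resumes immediately after the leading & on the next line with no blanks inserted, so the split and joined forms are equivalent.

```fortran
module continuation_demo
   implicit none
   ! A character constant split across two lines: inside a character
   ! context the trailing & must be matched by a leading & on the next
   ! line, and the text resumes right after that leading &.
   character(len=*), parameter :: greeting_split = 'Hello, Fort&
      &ran world'
   ! The same constant joined onto a single line.
   character(len=*), parameter :: greeting_joined = 'Hello, Fortran world'
   ! Worse yet: a name split across lines. This declares count_value.
   integer, parameter :: count_&
      &value = 42
end module continuation_demo
```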

I think (f) covers changing REAL*8 to real(kind=real64); in some respects real(8) is worse, because REAL*8 either fails or gives you the expected behavior, but real(8) can point to the wrong kind. I see a big split over whether to include the KIND=: some prefer real(real32) and others real(kind=real32). Other points of division are putting the :: into declarations, as in “real :: A” versus “real A”, and lining up the ::; and putting all variables with like attributes in a single declaration versus one declaration per variable seems to be a big one.
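A minimal sketch of the portable spellings, assuming the standard iso_fortran_env intrinsic module (the module name kind_demo is hypothetical). real(kind=real64) and real(real64) name the same kind; real(8) is deliberately left out because the meaning of kind value 8 is processor-dependent.

```fortran
module kind_demo
   use, intrinsic :: iso_fortran_env, only: real64, int32
   implicit none
   real(kind=real64) :: a = 1.0_real64   ! "KIND=" written out
   real(real64)      :: b = 1.0_real64   ! same kind, keyword omitted
   integer(int32)    :: n = 0
   ! real(8) is not used: the kind value 8 means double precision on
   ! many compilers, but the standard leaves kind numbering to the
   ! processor, so it is not portable.
end module kind_demo
```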

Preferences vary widely, so the best tool lets every organization easily convert imported code it needs to maintain to its favorite style.

That brings up comments, including compatibility with ford and doxygen. Some like all comments on a line by themselves, starting at the left margin; others like the opening exclamation mark to line up with the start of the code below it; others prefer in-line comments at the ends of statements; and some like those lined up and spaced well away from the end of the code.

Another twist on comments is whether they are all contained between the procedure statement and its END, or whether a description precedes the code, but that seems hard to recognize automatically.

I prefer all procedures to end with their names, like “end function myfunc”; others prefer a simple “end”. Same with interface blocks, etc. Some people like to name every DO loop and IF/ELSE/ENDIF construct; others do so only where it adds clarity to the code.
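A short sketch of that naming style, with hypothetical names: named DO loops let EXIT state which loop it leaves, and each END statement repeats the procedure or construct name.

```fortran
module naming_demo
   implicit none
contains
   ! Returns the first (row, col) position holding wanted, searching
   ! row by row, or (0, 0) if it is not present.
   function find_first(grid, wanted) result(pos)
      integer, intent(in) :: grid(:, :), wanted
      integer :: pos(2)
      integer :: i, j
      pos = 0
      rows: do i = 1, size(grid, 1)
         cols: do j = 1, size(grid, 2)
            found: if (grid(i, j) == wanted) then
               pos = [i, j]
               exit rows          ! leaves both loops at once
            end if found
         end do cols
      end do rows
   end function find_first
end module naming_demo
```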

The one thing that is the same is that everyone’s preferences seem to be different.

PS: Mentioning “GO-TO” and Fortran in the same breath brings up bad images, I am afraid. Maybe “preferred” or “standard”? 🙂

Below are links to formatters. I think fprettify has the most GitHub stars, at 383.

Does fpt convert non-block (labeled) DO loops to block DO loops? I’ve tried various open source tools which purport to do this, but none has been entirely satisfactory. Been thinking of writing my own - just to add to the confusion…

Thank you @urbanjost

fpt already converts between e.g. REAL*8, REAL(8) and REAL(kr8) (or whatever you choose for the 8-byte REAL kind). It converts between e.g. .EQ. and ==. It annotates END statements for sub-programs and derived types with the sub-program or derived type name. It also removes labelled FORMAT statements that have fewer than a threshold number of references, embedding the format string explicitly in the I/O statements. I like your solution of defining a parameter with the format string and will implement it. We are currently working on code to write the declaration of each object on a single line with all of its attributes. This is slightly complicated because attributes can be interdependent.

We do handle header comments differently from comments within the code, but do not (yet) move them before the sub-program statement. I will look at the requirements for ford and doxygen and see what can be done.

We do not (yet) convert between e.g. GOTO and GO TO. Easy to do if required.

An extension which I didn’t mention in my first post is the use of cpp macros in the code. These have to be formatted correctly as well and we believe we have handled this.

@wspector - Does fpt convert labelled DO loops to block DO loops? - yes it does. Please see: Modernising Fortran
It gets rid of a lot of statement labels.
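For reference, a minimal sketch (hypothetical names, not fpt output) of the kind of rewrite being discussed: a labeled DO loop closed by a labeled CONTINUE, and the equivalent block DO closed by END DO, which needs no statement label.

```fortran
module do_demo
   implicit none
contains
   ! Older style: DO loop terminated by a labeled CONTINUE.
   integer function sum_labeled(n)
      integer, intent(in) :: n
      integer :: i
      sum_labeled = 0
      do 10 i = 1, n
         sum_labeled = sum_labeled + i
10    continue
   end function sum_labeled

   ! Same loop as a block DO: the label disappears.
   integer function sum_block(n)
      integer, intent(in) :: n
      integer :: i
      sum_block = 0
      do i = 1, n
         sum_block = sum_block + i
      end do
   end function sum_block
end module do_demo
```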

Kinda interesting that you support datapools. That is a fairly obscure extension. I had to deal with them on a project about 20 years ago. Harris variant, not Gould-SEL. I converted them to F90 modules, rather than common blocks.

This needs to be reversible too. To give an example of this kind of use, suppose there is a git repository that several groups are working on, and each group, or maybe even each individual programmer, has some preferences about style. When one programmer modifies a code and commits it back to the repository, both the essential changes and the trivial style changes appear in a diff of the source code. After a few modifications by several groups, the whole source code appears to have changed, with the small number of essential changes mixed in with, and hidden among, all of the trivial changes.

Is there a formatting tool that can be incorporated into the “git diff” command that filters out the trivial changes and shows only the essential changes?

That reminds me of how often git changes are caused by white space, like spaces at the end of a line, or tabs that can easily appear depending on which editor is used, or other minor reasons. We solved that by having each check-in run the code through a program similar to expand and dos2unix that also trims trailing spaces.

So maybe something like that, running the code through a formatter with a standard set of switches, would eliminate some of the noise.

But we see that too, where someone “tidying up” a code changes all the “end if” to “endif” and expands tabs, which makes it hard to find the significant changes. Because we see that same problem, we try to encourage checking in twice: once with the formatting changes and then with the significant changes.

My tools list has

FF08Diff: command line tool for obtaining the semantic difference (difference in meaning, rather than appearance) between two sets of Fortran 2008 source files, written in Fortran 2003 by IanH

which I have not tried.

One of the reasons I use formats in character strings is that they can be placed in the specification part of a module and then used by all the procedures in the module, or even made public and accessed via USE. That might be harder to do in an automated fashion, and some people want the formats close to the point of use and so might prefer them to stay in the procedure; but you cannot put a labeled FORMAT statement in a module’s specification part, so if a lot of routines share the same formats you have to duplicate them or put them in an INCLUDE file.

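```fortran
module M_
implicit none
! Formats stored as named character constants in the module's
! specification part are visible to every procedure in the module
! and, being public, to any program unit that USEs it.
character(len=*),parameter :: f101='(*(g0,1x))'

contains

subroutine a()
   write(*,f101) 10,20.0,'30'
end subroutine a

subroutine b()
   write(*,f101) 11,22.2,'33'
end subroutine b

end module M_

program useit
use M_, only : a,b
implicit none
   call a()
   call b()
end program useit
```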

I use pre-commit, which runs fprettify and a whitespace checker/remover every time I do a git commit.

In our large C++ project we just use clang-format to format everyone’s code to the agreed-upon standard.

@Beliavsky you might put lfortran fmt on the list too; it should format anything that we can parse. We recently reached 1,000 stars, which I am very happy about. 🙂

@wspector - Converting DATAPOOL

The reason we converted DATAPOOL to COMMON, not to MODULE, was historical. The first conversion was made in 1993, before Fortran 90 was widely accepted in the aerospace world where we then worked. However, it turned out to be fortunate because:

i. We could easily map COMMON blocks between programs. I haven’t tried this with modules yet - perhaps others can comment?

ii. The Gould-SEL system has an interactive handler that will view and change values in DATAPOOL at run-time. We implemented this for the migrated DATAPOOL construct. Please see fpt Reference: BUILD ACCESS DATABASE

I think that if there is an interest in DATAPOOL we should start a new thread. Gould-SEL systems were used for a large number of training simulators for commercial and military aircraft, for military aircraft design simulators, power plant simulators and for instrument control - e.g. tracking radars. The original hardware has (I hope) died but the programs run in emulation of the MPX operating system at a surprising number of sites. Do any forum contributors have an interest in these?

fpt has a language-aware command to compare different versions of sub-programs. Please see: fpt Reference: Compare Versions
fpt reports whether:

  • header comments match. The header comments are the comments which precede or follow the sub-program statement and which precede the first declaration or executable statement;
  • comments match in the body of the sub-program;
  • declarations match. Note that two versions may be functionally identical even though variables are declared explicitly in one copy and are implicitly typed in the other;
  • names used for variables match. fpt recognises that two sub-programs may be functionally identical even though different names are used for variables which occupy the same COMMON block locations or which serve the same purpose. If the names and labels do not match but the code is functionally equivalent, a table of the corresponding names is shown;
  • data statements match;
  • executable code matches, at least in function. If the executable code does not match, fpt prints the first line of each routine where a difference is detected. If the token which does not match is a symbol, the symbol attributes are displayed.
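To make the declarations and names points concrete, here is a hypothetical pair of functions that a semantic comparison should treat as equivalent (an illustration only, not taken from fpt): one declares every variable explicitly, the other relies on implicit typing (names beginning with I to N are INTEGER, the rest REAL) and uses different variable names.

```fortran
module compare_demo
   implicit none
contains
   ! Version 1: every variable declared explicitly.
   real function mean_v1(x, n)
      integer, intent(in) :: n
      real, intent(in) :: x(n)
      integer :: i
      real :: total
      total = 0.0
      do i = 1, n
         total = total + x(i)
      end do
      mean_v1 = total / n
   end function mean_v1
end module compare_demo

! Version 2: an external function with implicit typing (y and s are
! REAL, m and k are INTEGER) and different variable names.
real function mean_v2(y, m)
   dimension y(m)
   s = 0.0
   do k = 1, m
      s = s + y(k)
   end do
   mean_v2 = s / m
end function mean_v2
```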

I do not know whether this could be adapted to filter the behaviour of git or other version tracking tools. I would welcome advice!

I see that your site explains them. Someone who wants further information can start a new thread, as you say.

I see a good number of people switching from DATA statements to initialization in declarations when short lists of values are being set, particularly when the values are actually constant parameters. I still see DATA used in new code when the values fill multi-dimensional arrays or when index values specify the order in which the data are assigned to the locations. But particularly for parameters (where there is no confusion about whether the value is assigned once or each time the routine is called, and whether it has the SAVE attribute is moot) something like

```fortran
real,parameter :: pi=3.141592653589793238462643383279502884197
```

is better than

```fortran
real pi
data pi/3.141592653589793238462643383279502884197/
```

although the two are not exactly equivalent: the first declares a named constant, while the second declares a variable that is initialized once and can still be modified.
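A minimal sketch (the module and procedure names are hypothetical) of the cases discussed above: a named constant; an array filled row by row, once with a DATA implied DO and once with RESHAPE and ORDER=[2,1] in a declaration; and the implicit SAVE that comes with DATA initialization, which is one reason DATA is not a drop-in replacement for a parameter.

```fortran
module data_demo
   implicit none
   ! A named constant: it cannot be modified at run time.
   real, parameter :: pi = 3.141592653589793238462643383279502884197
   ! Row-by-row initialization in a declaration: ORDER=[2,1] makes the
   ! second subscript vary fastest.
   integer, parameter :: grid_param(2, 3) = &
      reshape([1, 2, 3, 4, 5, 6], [2, 3], order=[2, 1])
contains
   ! The same values filled with a DATA implied DO, row by row.
   function grid_data() result(g)
      integer :: g(2, 3)
      integer :: grid(2, 3), i, j
      data ((grid(i, j), j = 1, 3), i = 1, 2) / 1, 2, 3, 4, 5, 6 /
      g = grid
   end function grid_data

   ! DATA initialization implies SAVE: counter is set to 0 once, not on
   ! every call, so each call sees the value left by the previous one.
   integer function next_count()
      integer :: counter
      data counter / 0 /
      counter = counter + 1
      next_count = counter
   end function next_count
end module data_demo
```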

Just a caveat: the stars might be misleading. In my experience findent(1) is the most popular, but its main repository is on GitLab, not GitHub (so it shows no GitHub stars), and of course some apps are not on GitHub, GitLab, or anything similar but have their own sites.

Is this something that should be avoided for modern Fortran? Every time I see a data statement I try to change it. I associate data statements with legacy code.