Normalizing Fortran code

I have been working on transpilers to and from Fortran. When transpiling from Fortran you notice that there are many ways to write the same program, so it makes sense to transpile general Fortran to “normalized” Fortran and then transpile that. Here is a partial list of how Fortran code can be normalized. Suggestions are welcome.

  • fixed source form can be converted to free
  • code can be indented according to specified rules and made lower case
  • either single or double quotes can be made the default for quoting character strings
  • implicit typing can be made explicit
  • print can be replaced by write
  • specifiers can be named, writing write (unit=*, fmt=*) instead of write (*,*), real(kind=dp) instead of real(dp), etc.
  • one-line if statements can be replaced with an if block. I have found that one-line if statements can be hard for transpilers to handle.
  • real, dimension(n) :: x can be replaced by real :: x(n)
  • :: can be added to any declaration without it
  • each variable can be declared on a separate line, allocated on a separate line, and deallocated on a separate line
  • real function f(x) can be replaced with function(x) result(y) . There are many other ways to declare the same function
  • in a procedure, real :: n = 0 can be replaced by real, save :: n = 0. This should be done regardless of whether the code will be transpiled, since initialization in a declaration already implies save
  • padded character literals in an array constructor ["a  ", "ab ", "abc"] can be replaced by [character (len=3) :: "a", "ab", "abc"]
  • lines that use ; to hold multiple statements can be broken up
  • A bare end can be replaced with end function foo, end subroutine foo, end module foo, or end program foo.
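To make a few of the items above concrete, here is a small sketch of what normalized output might look like, with the original forms shown in comments. The names demo, dp and twice are made up for this example, and a real tool would not necessarily apply exactly this set of rules.

```fortran
program demo
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   ! before: real(dp), dimension(3) :: x, y
   real(kind=dp) :: x(3)
   real(kind=dp) :: y(3)
   ! before: x = 1.0_dp; y = 0.0_dp
   x = 1.0_dp
   y = 0.0_dp
   ! before: if (x(1) > 0.0_dp) y(1) = twice(x(1))
   if (x(1) > 0.0_dp) then
      y(1) = twice(x(1))
   end if
   ! before: print *, y
   write (unit=*, fmt=*) y
contains
   ! before: real(dp) function twice(a) ... end
   function twice(a) result(b)
      real(kind=dp), intent(in) :: a
      real(kind=dp) :: b
      b = 2.0_dp*a
   end function twice
end program demo
```

The normalized version is longer, but each statement has only one shape, which is what a downstream transpiler or a diff tool benefits from.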

More substantively, common can be replaced by module variables. A little-tested script for this is here.
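A minimal sketch of the common-to-module idea, assuming the simplest case where every occurrence of the common block lists the same variables with the same types and shapes. The names params, params_mod and show are invented for the illustration and are not taken from the script mentioned above.

```fortran
! before, repeated in every routine that needs the data:
!     real :: a, b
!     integer :: n
!     common /params/ a, b, n

module params_mod
   implicit none
   real :: a
   real :: b
   integer :: n
end module params_mod

subroutine show()
   use params_mod, only: a, b, n
   implicit none
   write (unit=*, fmt=*) a, b, n
end subroutine show

program main
   use params_mod, only: a, b, n
   implicit none
   a = 1.0
   b = 2.0
   n = 3
   call show()
end program main
```

Code that relies on storage association, for example by giving the same common block a different layout in different routines, does not map onto module variables this directly.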

I am not saying that code should be written in the ways described above. Some of the replacements make code more verbose. But for translation, automatic generation of Fortran code, and comparing code, a tool to normalize code can be useful.

I’d like to see a really robust open source tool for converting DO loops into block form to better conform to Fortran 2018. The tools I’ve tried tend to be buggy, or they make a lot of superfluous modifications to the code, which makes diffing the before and after versions difficult.
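For readers unfamiliar with the conversion being asked for, here is a hedged sketch of one common case: two nonblock DO loops that share a labelled terminal statement, a form deleted in Fortran 2018. The program and variable names are made up for the example.

```fortran
! before (fixed form):
!       DO 10 I = 1, N
!       DO 10 J = 1, M
!    10 A(I,J) = 0.0

program block_do
   implicit none
   integer, parameter :: n = 2, m = 3
   real :: a(n,m)
   integer :: i, j
   ! after: each loop gets its own end do and the label disappears
   do i = 1, n
      do j = 1, m
         a(i,j) = 0.0
      end do
   end do
   write (unit=*, fmt=*) a
end program block_do
```

The harder cases are loops whose label is also the target of a GO TO. When the terminal statement is a CONTINUE, such a branch from inside the loop behaves like cycle, and a tool has to recognize that before it can drop the label.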


If you could provide examples of the conversions you want to see I will try to create a tool that does them.

I’ve been using the netlib version of slatec (https://netlib.org/slatec/) as an acid test on various tools. (Note that understanding fixed-form source is not an issue. I’ve written a small script, mostly using sed, to convert the SLATEC source to free-form.)

It would be nice to do something about EQUIVALENCE, but I don’t see any bullet-proof way of doing it as long as INTEGERs can be equivalenced to REALs etc., which makes using POINTERs (and I guess ASSOCIATE) difficult and probably impossible. I’m also tempted to add Hollerith data to this. As long as it’s in FORMAT statements, it’s somewhat straightforward to get rid of it. The big issue is when you are storing Hollerith data in a REAL array and THEN trying to use that array, like you would a character string, as a FORMAT specifier in an I/O statement (something I recently encountered in an old NASA code targeted at the IBM G and H compilers circa 1968).
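The straightforward FORMAT case mentioned above might look like the following sketch, where the Hollerith edit descriptor is replaced by a character string edit descriptor of the same text. The statement label, unit number and names are chosen only for illustration.

```fortran
! before (fixed form):
!       WRITE (6,100) X
!   100 FORMAT (1X, 9HRESULT = , F8.3)

program hollerith_demo
   implicit none
   real :: x
   x = 1.5
   write (unit=6, fmt=100) x
   ! after: the 9 characters following H become a quoted string
100 format (1x, 'RESULT = ', f8.3)
end program hollerith_demo
```

The character count before the H has to be honoured exactly, including trailing blanks, so a converter cannot simply split the format on commas.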

I suspect there’s a typo in here. Instead of function(x) result(y) did you mean:

function f(x) result(y)
real :: y

Yes, thanks.

Nice job.
To add some confusion, the previous function example can also be written

real function f(x) result(y)

But I tend not to use that form since it is limited (allocatable and pointers cannot be specified this way).
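A short sketch of the equivalent forms discussed in this exchange, plus a case where the result has to be declared in the body. The module and function names are hypothetical.

```fortran
module func_forms
   implicit none
contains
   ! Form 1: type in the prefix, result variable is the function name
   real function f1(x)
      real, intent(in) :: x
      f1 = 2.0*x
   end function f1

   ! Form 2: type in the prefix combined with a result clause
   real function f2(x) result(y)
      real, intent(in) :: x
      y = 2.0*x
   end function f2

   ! Form 3: result clause, with the result declared in the body
   function f3(x) result(y)
      real, intent(in) :: x
      real :: y
      y = 2.0*x
   end function f3

   ! Attributes such as allocatable cannot go in the prefix,
   ! so the result must be declared in the body
   function zeros(n) result(y)
      integer, intent(in) :: n
      real, allocatable :: y(:)
      allocate (y(n))
      y = 0.0
   end function zeros
end module func_forms

program test_forms
   use func_forms, only: f1, f2, f3, zeros
   implicit none
   write (unit=*, fmt=*) f1(1.0), f2(1.0), f3(1.0)
   write (unit=*, fmt=*) zeros(3)
end program test_forms
```

Because form 3 covers every case, it is a natural choice for the single normalized form.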

Some time ago, I also tried to ‘normalize’ some code using lfortran. You can parse the code to AST/ASR and back to fortran90. It could be worth comparing the two approaches.

It occurs to me that if the goal of normalizing code is to ease translation to another language, the target language affects how the code should be normalized. When translating Fortran to Python/NumPy, you want the Fortran code to use array operations, which typically have equivalents in NumPy, rather than loops, which are slow in Python. If the target is C, code with loops rather than array operations is preferable, since C has only loops. I earlier presented tools to translate Fortran code with array operations to code with loops, and back: Tool to translate from loops to array operations.
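As a small illustration of the two target-dependent normal forms, the sketch below computes the same result with an explicit loop and with an array expression. The names are made up, and the NumPy and C equivalents in the comments are only indicative.

```fortran
program loops_vs_arrays
   implicit none
   integer, parameter :: n = 5
   real :: x(n), y(n), z(n)
   integer :: i
   x = [(real(i), i = 1, n)]
   y = 2.0
   ! Loop form: corresponds directly to a C for loop
   do i = 1, n
      z(i) = x(i)*y(i) + 1.0
   end do
   write (unit=*, fmt=*) z
   ! Array form: corresponds to NumPy's z = x*y + 1.0
   z = x*y + 1.0
   write (unit=*, fmt=*) z
   ! Reductions have direct counterparts too, e.g. np.sum(x*y)
   write (unit=*, fmt=*) sum(x*y)
end program loops_vs_arrays
```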

Transpilers to Fortran are better – I have been working on them :slight_smile: .

You may find this useful, written by an old colleague : https://dl.acm.org/doi/10.1145/1058176.1058178


As a follow-up, a reasonable subset of the NAG Tools is now available as a ‘polish’ option with the NAG compiler. Most of the examples in our books use the NAG polish option to give the examples a standard look and feel.

fpt (https://simconglobal.com) will make many of the suggested changes. We are currently working on coalescing declarations so that type, bounds, data, public/private and other attributes are written in a single statement.

Issues with EQUIVALENCE depend on what it is used for. We use it to set up a paged data structure so that different tables may be overlaid in the same memory. I can’t yet see a way that this could be reprogrammed.

It has long puzzled me why we can’t have a single-character power operator, replacing “**” with “^”.
Surely this would be a simpler operator, both for code syntax and for reading?
Do other languages use ^?

Julia uses ^. But in the end ** is still shorter than Math.Pow(x, 2) in C#.

Could we blame it on IBM keyboards as usual?

(The use of .lt., .le., etc., (/, /), and even some digraphs in C, was likely influenced by the lack of <, >, [, ], { and } in certain keyboards at the time.)

But if you have a numeric keypad, then by using **, all the arithmetic operators are there.

And btw, most programming languages that use ^ as the bitwise XOR operator, and also happen to have a power operator, use ** (e.g., Perl, PHP, Python, Ruby, JavaScript, etc.).


I totally agree that the reason is that ^ was just not on most keyboards
and, as a consequence, not part of the “Fortran Character Set”. A little
more history for the curious:

The FORTRAN 77 standard character set, defined by ANSI X3.9-1978,
consists of 49 characters used for language syntax and symbolic names.
The Fortran subset only allowed 47, but the (optional) full standard
added $ (Currency Symbol/Dollar Sign) and : (Colon). So even using
double-quotes around a string was an extension!

& is NOT in the set, which technically means creating INCLUDE files
with continued lines that can be used with fixed and free format files
is NOT ANSI-77 (but is standard using today’s fixed format style).

A lot of machines did not support lowercase, but even if they did, people
often preferred SHOUT mode (all uppercase) because the screen resolution
was typically so poor that lowercase was harder to read. Roman uppercase
characters, partly designed for ease of carving in stone, worked well on
low-resolution monitors (the Romans really did not use lowercase, or spaces
for that matter, for most of the history of the Classic Roman Empire either!)

Hard to imagine nowadays that @, :, #, &, <, >, | were not even on most
keyboards. ** may seem odd for powers, but * was odd for multiplication
back then for many people coming from doing most computation by hand,
and most people would have preferred a ÷ character for division, but
most keyboards still do not have that one, and subscript and superscript
never got too far either. A lot of folks missed Greek letters. And most
everyone doing things by hand used single-character variable names so
handwritten AB obviously meant multiply A by B, but radical new Fortran
insisted on allowing multiple-letter variable names, which meant giving
up that cherished standard :slight_smile:

So ** seems odd to someone used to ^ for powers, but both were weird
to anyone schooled on handwritten notation. We have come a long way if most
people think “a*b/c**pi” or “a*b/c^pi” is intelligible and that maybe

              π
    (a ⋅b) ÷ 2

looks odd?

F77 Character Set Definition

X3.9-1978 FORTRAN 77

3.1 FORTRAN Character Set

   The FORTRAN character set consists of twenty-six letters,
   ten digits, and eleven special characters.

   3.1.1 Letters. A letter is one of the twenty-six
   characters :

       ABCDEFGHIJKLMNOPQRSTUVWXYZ

   3.1.2 Digits. A digit is one of the ten characters:

       0123456789

   A string of digits is interpreted in the decimal base number
   system when a numeric interpretation is appropriate.

   3.1.3 Alphanumeric Characters. An alphanumeric character
   is a letter or a digit.

   3.1.4 Special Characters. A special character is one of
   the eleven characters:

    Character Name of Character
              Blank
    =         Equals
    +         Plus
    -         Minus
    *         Asterisk
    /         Slash
    (         Left Parenthesis
    )         Right Parenthesis
    ,         Comma
    .         Decimal Point
    '         Apostrophe

Fortran 66 only had 47 characters. Currency ($) was included, though not used for anything other than inside a Hollerith constant. Fortran 77 added apostrophe (') and colon.

Interestingly, the CDC FTN ('66 compatible) and FTN5 ('77 compatible) compilers allowed double quotes (") as an extension to delimit a string of characters. But it was considered a Hollerith constant (i.e., word-oriented, left-justified, blank-filled), rather than character data type.

^ was used as a string delimiter on some (very) legacy systems - I think CDC.

The main place one would see circumflex (^) used in CDC Fortran was with the KRONOS and NOS time-sharing sub-systems - TELEX and IAF. In ‘6/12’ DISPLAY code, preceding an alphabetic character with a circumflex would indicate the following letter was lower-case. There were other situations where it was used for a character that was not part of the 6-bit character set as well.

I’m looking at an old code circa 1975 that used & to prefix the statement label for an alternate return argument (actual argument). The compiler used for the code was IBM (H, I think), so using & must be an IBM extension. Note that the associated dummy argument is an asterisk. Some Intel online documentation says that both * and & can be used to prefix the actual return argument for the Intel compilers.