Our initiative to publish the "Fortran-lang Top 10 Recommendations for Fortran Modernization": is it really new or even feasible?

Dear Fortran community,

At the Codee team we are actively working on a novel tool to help with modernization of Fortran codes. We believe this will reinforce the ecosystem of software development tools for Fortran.

Some time ago, we came across the OWASP community-led initiative, which regularly publishes a Top 10 list of secure coding recommendations. These guidelines are widely embraced by the development community, as they help in writing code free of security vulnerabilities. For example, OWASP has recently released the Top 10 recommendations for security in Large Language Models (LLMs), along with valuable resources such as the OWASP LLMs slides and the OWASP LLMs GitHub repository.

Beyond OWASP, there are open catalogs that provide collections of specific checkers (rules) to reinforce secure coding recommendations such as those by OWASP. These catalogs document relevant code patterns, example codes, detection tooling, and resources for further reading. Well-known examples in secure coding include SEI CERT C and CWE.

Drawing inspiration from these catalogs, our team created the Open Catalog recently announced in this community. This Open Catalog already features checkers addressing some common modernization challenges faced by Fortran developers, and we are committed to expanding it further by incorporating the valuable insights gained from our interactions with Fortran developers.

With this in mind, we thought that it might also be beneficial for the Fortran community to have its own “Top 10 recommendations for Fortran Modernization”, similar to OWASP’s approach. This resource could benefit the community by providing a curated and easily accessible guide that highlights the key areas needing attention when working with legacy and old-style Fortran codebases.

Do you believe that starting this community effort would be valuable? If so, are you aware of any existing initiatives we can collaborate with? We are enthusiastic about collaborating with all of you to create a “Fortran-lang Top 10 Recommendations for Fortran Modernization”.

Thank you on behalf of the entire Codee team!

10 Likes

Here are a few steps I try to follow when refactoring old codes:

  • require all procedures to have explicit interfaces, if possible by making them module procedures; this also applies to procedure dummy arguments
  • specify intents on all dummy arguments
  • remove common blocks and equivalence to prevent mistakes due to loss of type information or aliasing (see Making legacy Fortran code type safe through automated program transformation | The Journal of Supercomputing); how to replace these is context dependent: the simple path is to place the data in modules, the more complex way is to introduce a derived type
  • use a consistent and documented kind specifier for real types
  • replace obsolescent control flow constructs (arithmetic if, labeled do loops, various forms of go to) with the newer structured ones (select case, cycle, exit, named blocks); this is not always straightforward, and the success of automated tools varies
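A minimal sketch of the first few steps above, under stated assumptions: the names `kinds_mod`, `wp`, `vector_ops` and `axpy` are hypothetical and do not come from any code discussed in this thread. It shows a single documented kind parameter, a module procedure (which gives an explicit interface for free), and intents on every dummy argument.

```fortran
module kinds_mod
   implicit none
   private
   public :: wp
   ! One documented working precision for the whole code base
   ! (at least 15 decimal digits, i.e. IEEE double on common platforms)
   integer, parameter :: wp = selected_real_kind(15, 307)
end module kinds_mod

module vector_ops
   use kinds_mod, only: wp
   implicit none
   private
   public :: axpy
contains
   ! y := a*x + y. Because this is a module procedure, every caller
   ! sees an explicit interface and mismatched arguments are rejected.
   subroutine axpy(a, x, y)
      real(wp), intent(in)    :: a
      real(wp), intent(in)    :: x(:)
      real(wp), intent(inout) :: y(:)
      y = a*x + y
   end subroutine axpy
end module vector_ops

program demo_axpy
   use kinds_mod, only: wp
   use vector_ops, only: axpy
   implicit none
   real(wp) :: x(3), y(3)
   x = [1.0_wp, 2.0_wp, 3.0_wp]
   y = 1.0_wp
   call axpy(2.0_wp, x, y)
   print *, y
end program demo_axpy
```

With this layout, passing a default-real literal such as `2.0` for `a` is reported at compile time instead of silently corrupting the result.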

The question remains how to perform this type of (automated) refactoring without introducing new bugs in the process?

By their nature, procedure arguments are coupled to the call site, meaning that changing a parameter list involves changes at all of the call sites. Moving the procedures to a named module also has this effect of requiring changes in calling codes, and can’t always be done without disruption (e.g. imagine putting BLAS procedures in a module). Still, providing explicit interface blocks can help detect errors.
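As a rough illustration of that last point, the sketch below keeps a legacy-style external subroutine where it is and only adds an interface block in a separate module. The routine `scale_vec` and the module `legacy_interfaces` are made-up names standing in for a real external library routine.

```fortran
! Legacy-style external procedure, deliberately left outside any module
subroutine scale_vec(n, a, x)
   implicit none
   integer, intent(in) :: n
   double precision, intent(in) :: a
   double precision, intent(inout) :: x(n)
   integer :: i
   do i = 1, n
      x(i) = a*x(i)
   end do
end subroutine scale_vec

! Interface block only: the external routine itself is not touched
module legacy_interfaces
   implicit none
   interface
      subroutine scale_vec(n, a, x)
         integer, intent(in) :: n
         double precision, intent(in) :: a
         double precision, intent(inout) :: x(n)
      end subroutine scale_vec
   end interface
end module legacy_interfaces

program demo_interface
   use legacy_interfaces, only: scale_vec
   implicit none
   double precision :: x(3)
   x = [1.0d0, 2.0d0, 3.0d0]
   call scale_vec(3, 2.0d0, x)
   ! call scale_vec(3, 2, x)   ! rejected at compile time: integer passed for a
   print *, x
end program demo_interface
```

Call sites only gain a `use` statement, so this can be introduced without changing argument lists. The interface block has to be kept in sync with the routine by hand, which is the main drawback compared with a true module procedure.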

Refactoring common blocks is also a tedious business. For example, in a project I’ve refactored, one subroutine used:

C     .. Scalars in Common ..
      INTEGER           I10, I11, I19, I5, I9
C     .. Arrays in Common ..
      INTEGER           IA(3), IB(3), IC(2), ID(9)
C     .. Common blocks ..
      COMMON            /SCHSZ/IA, I5, IB, I9, I10, IC, I11, ID, I19

However a second routine used the same block with different variables:

C     .. Scalars in Common ..
      INTEGER           I10, I10A, I10B, I11, I11A, I11B, I12, I13, I14,
     *                  I15, I16, I17, I18, I19, I2, I3, I4, I5, I6, I7,
     *                  I8, I9
C     .. Common blocks ..
      COMMON            /SCHSZ/I2, I3, I4, I5, I6, I7, I8, I9, I10,
     *                  I10A, I10B, I11, I11A, I11B, I12, I13, I14, I15,
     *                  I16, I17, I18, I19

What I ended up doing as a temporary step, before I can properly encapsulate these, was to place them in a module and equivalence them:

MODULE SCHSZ
IMPLICIT NONE

! COMMON /SCHSZ/
INTEGER :: I2, I3, I4, I5, I6, I7, I8, I9, I10, &
           I10A, I10B, I11, I11A, I11B, I12, I13, I14, I15, &
           I16, I17, I18, I19

! Aliases
INTEGER :: IA(3), IB(3), IC(2), ID(9)

EQUIVALENCE (IA(1),I2), (IA(2),I3), (IA(3),I4)
EQUIVALENCE (IB(1),I6), (IB(2),I7), (IB(3),I8)
EQUIVALENCE (IC(1),I10A), (IC(2),I10B)

EQUIVALENCE (ID(1),I11A)
EQUIVALENCE (ID(2),I11B)
EQUIVALENCE (ID(3),I12)
EQUIVALENCE (ID(4),I13)
EQUIVALENCE (ID(5),I14)
EQUIVALENCE (ID(6),I15)
EQUIVALENCE (ID(7),I16)
EQUIVALENCE (ID(8),I17)
EQUIVALENCE (ID(9),I18)

END MODULE

The variables belonging to common blocks were then replaced by imports from a module:

C COMMON /SCHSZ/
      USE SCHSZ, ONLY: 
     *    I2, I3, I4, I5, I6, I7, I8, I9, I10,
     *    I10A, I10B, I11, I11A, I11B, I12, I13, I14, I15, 
     *    I16, I17, I18, I19

The next step will be to replace the module with a derived type, thereby eliminating the global data which will finally allow the procedures to be made thread-safe.
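A hedged sketch of what that final step might look like. The type `schsz_t`, the routine `bump_i5` and the choice of components are illustrative assumptions, not the actual refactored code; only a few of the former common-block variables are shown.

```fortran
module schsz_types
   implicit none
   private
   public :: schsz_t, bump_i5

   ! Hypothetical replacement for the module variables:
   ! each caller (or thread) owns its own instance.
   type :: schsz_t
      integer :: i5 = 0
      integer :: i9 = 0
      integer :: ia(3) = 0
   end type schsz_t
contains
   ! The state is passed explicitly instead of being read from global data
   subroutine bump_i5(state)
      type(schsz_t), intent(inout) :: state
      state%i5 = state%i5 + 1
   end subroutine bump_i5
end module schsz_types

program demo_state
   use schsz_types, only: schsz_t, bump_i5
   implicit none
   type(schsz_t) :: a, b
   call bump_i5(a)
   call bump_i5(a)
   call bump_i5(b)
   ! a and b evolve independently, which global data could not offer
   print *, a%i5, b%i5
end program demo_state
```

Since nothing is shared implicitly, two threads working on separate `schsz_t` instances cannot interfere with each other.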

3 Likes

NAG gives Fortran Modernisation workshops with detailed notes, which describe the “polishing” options of the NAG compiler; Reinhold Bader has a presentation on the same topic; and the Fortran Wiki has a page.

Before adding features from Fortran 2003 and beyond one can check a code with the g95 compiler with the options

g95 -Wall -Wextra -Wimplicit-none -Werror=100,113,115,137,146,147,159,163 -ftrace=full -fbounds-check -freal=nan -fmodule-private -Wno=112,167

If one of the -Werror options causes the code to not compile, an explanatory error message is provided.

One caveat about code modernization that needs to be stressed is that modernization can change the order of execution and the optimizations performed, which can lead to changes in the results compared to the original code. Usually these changes are in the last decimal place or two of precision, so it’s up to the developer to decide if that’s acceptable. If you need to exactly reproduce the results from the original code, you might be better off writing a Modern Fortran wrapper around the old code and only making the new wrapper visible to users. On the other hand, modernization can also reveal long-standing bugs (usually by just introducing explicit typing) that have for some reason gone unnoticed in the original code for decades. So basically your decisions should be based on the level of reproducibility of the original code results you need and the requirements to upgrade your code base to something that is more sustainable and manageable.
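A small self-contained sketch of why the order of operations matters: the same terms are added in two different orders in single precision. The program is purely illustrative (it is not taken from any code in this thread), and the size of the discrepancy depends on the compiler and flags.

```fortran
program sum_order
   implicit none
   integer, parameter :: n = 1000000
   real :: forward, backward
   integer :: i

   ! Add 1/i from the largest term to the smallest
   forward = 0.0
   do i = 1, n
      forward = forward + 1.0/real(i)
   end do

   ! Add exactly the same terms from the smallest to the largest
   backward = 0.0
   do i = n, 1, -1
      backward = backward + 1.0/real(i)
   end do

   print *, 'forward   :', forward
   print *, 'backward  :', backward
   print *, 'difference:', forward - backward
end program sum_order
```

Mathematically both sums are identical; in floating point they generally are not, which is the kind of change a refactoring (or a different optimization level) can introduce without any bug being involved.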

2 Likes

As far as I can see on Wikipedia, g95 is no longer maintained and has been “replaced” by gfortran:

The last stable version, 0.93, was released in October 2012. Development of G95 stopped in 2013, and the compiler is no longer maintained.

GNU Fortran, a part of GCC also known as gfortran, has now bypassed G95 in terms of its Fortran 2008 implementation and in the speed of the generated code. GNU Fortran was originally forked, in January 2003, from G95.

@Beliavsky why use g95 today? Does g95 have any features not implemented in gfortran?

In the g95 command you give, can we just replace g95 with gfortran, or is the information reported not the same?

In fact, GFortran was forked from G95 in 2003:

G95 development was stopped in 2013. Its motto was exactly “It’s free crunch time…”.
So G95 and GFortran diverged for 10 years.

The step-by-step process you describe for modernizing legacy Fortran code in practice is definitely very useful and needs to be covered in the Top 10 recommendations. It is a tedious, time-consuming task that needs to be addressed through a divide-and-conquer approach. I see refactoring work related to the procedures, including for example:

  • create modules to be placeholders of procedures
  • move procedures to modules to require all procedures to have explicit interfaces
  • specify intents on all dummy arguments

I also see refactoring work related to the data, including for example:

  • create modules for the common blocks
  • replace common blocks with use of modules
  • enforce the use of modules with the ONLY keyword

And additional refactoring work related to the calculations, including for example:

  • use a consistent and documented kind specifier for real types
  • replace obsolescent control flow constructs
  • replace modules with derived types

These are first thoughts to try to identify developer tasks related to procedures, data and calculations. For sure there are more things to consider.

@ivanpribec did I miss any important step of the process you described? Perhaps the equivalence construct?

We have added a command to fpt to change COMMON blocks to modules automatically. This was not trivial. You can see documentation at:
http://simconglobal.com/fpt_ref_change_common_to_module.html
This works with the current fpt Linux release, fpt_4.2-l, but we are making a small revision in the handling of implied DO loop iterators which should be out in the next few days.

The code changes are:

  1. The COMMON statements are deleted;

  2. The declaration statements of all variables in the COMMON block are moved to the target module;

  3. All data specifications (DATA statements and data in declarations) for the variables in the COMMON are moved to the target module;

  4. All parameters, type and structure declarations used in the COMMON block variable declarations and which are already in other modules are imported to the target module by USE ONLY constructs;

  5. The declarations of all iterators used in implied DO loops in DATA statements in the target module are copied to it;

  6. All parameter, type and structure declarations used in the COMMON block variable declarations which were local to the sub-programs which referenced the COMMON block are moved to the target module;

  7. A USE statement for the target module is inserted into every sub-program which referenced the COMMON block;

  8. Where parameter, type or structure declarations have been moved out of sub-programs which did not reference the COMMON block they are imported from the target module by USE ONLY constructs;

  9. The declarations written to the target module are ordered such that no object is referenced before it is declared;

  10. Name clashes which could be caused by these changes are resolved by rename constructs in the USE statements or, as a last resort, by renaming the objects in the modules.

  11. Where variables in two different COMMON blocks are initialised in the same implied DO loop in a DATA statement the COMMON blocks affected are combined into a single module, irrespective of the specifications made in the commands;

There is a test program for all of this in the examples directory of the fpt release. By all means download fpt (http://simconglobal.com) and try it. fpt is free for academic and non-industrial personal use.

If there are any more awkward cases which we have missed in the test program please tell us - I doubt that we have thought of everything!

3 Likes

As you said, g95 has not been maintained for some time, and in general people should use currently developed compilers such as gfortran. However, I don’t think that gfortran has all the diagnostics for Fortran 95 code that g95 does, with the options I gave, some of which gfortran does not recognize. So as I wrote previously, if you are trying to modernize Fortran 77 (or F90 or F95 for that matter), a reasonable first step is to get it to compile with g95 “strict” options. Then you can add features g95 does not support.

Here is an example. John Burkardt has many Fortran 90 codes, and his https://people.sc.fsu.edu/~jburkardt/f_src/asa058/asa058.f90 compiles with gfortran -c -std=f2018 -Wall -Wextra (once you move the declaration of j before that of k), but g95 -c -Wall -Wextra -Werror=163 fails because of missing argument INTENTs.

2 Likes

I earlier linked to a presentation by Reinhold Bader. His topics for modernization are

  • Compiler support for flagging non-standard, standard-level, obsolescent, or removed features (tools)

  • Fixed source form (6.3.3, B.3.7) and conversion tools

  • Non-standard notations for intrinsic types and type promotion by the compiler

  • CHARACTER* declaration (7.4.4.2, B.3.8)

  • Legacy notation for operators (10.1.5)

  • Legacy execution control:

    • Branching (11.2)

    • arithmetic IF (deleted)

    • computed GOTO (11.2.3)

    • assigned GOTO and ASSIGN (deleted)

    • non-block DO loop (deleted) and labeled DO loop (B.3.10)

    • non-integer loop control variable (deleted)

  • Legacy type concepts: SEQUENCE types (7.5.2.3) and (non-standard) record types

  • Procedures:

    • Implicit interfaces (15.4.2, 15.4.3.8) and external procedures

    • Arguments declared without INTENT (8.5.10)

    • Statement functions (15.6.4, B.3.4)

    • Alternate return arguments (15.6.2.7, B.3.2)

    • Assumed character length function result (B.3.6)

    • ENTRY statement (B.3.9)

  • Specific names for intrinsic functions (B.3.12)

  • COMMON blocks and their initialization with BLOCK DATA (B.3.11)

  • Enforcing storage association with EQUIVALENCE (B.3.11); replacement by appropriate POINTER entities, ALLOCATABLE entities, or the TRANSFER intrinsic function (16.9.193)

  • Non-standard dynamic memory with Cray Pointers and its replacement by either C interoperability features or dynamic Fortran objects

  • I/O

    • Hollerith edit descriptor (deleted)

    • vertical format control (deleted)

    • PAUSE statement (deleted)

  • Array assignments with FORALL (B.3.13)
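To make one of the legacy execution-control items concrete, here is a small sketch of replacing a computed GOTO with a `select case` construct. The function `describe` and the mode values are invented for illustration and are not taken from the presentation.

```fortran
program select_demo
   implicit none
   integer :: m
   do m = 1, 4
      print *, m, trim(describe(m))
   end do
contains
   ! Legacy form being replaced (computed GO TO, fixed source form):
   !       GO TO (10, 20, 30), MODE
   !       ... execution falls through here when MODE is out of range ...
   function describe(mode) result(text)
      integer, intent(in) :: mode
      character(len=16) :: text
      select case (mode)
      case (1)
         text = 'initialize'
      case (2)
         text = 'iterate'
      case (3)
         text = 'finalize'
      case default
         ! The silent fall-through of the computed GO TO becomes explicit
         text = 'unknown mode'
      end select
   end function describe
end program select_demo
```

The main gain is that the out-of-range case has to be written down, rather than being an implicit consequence of where the statement labels happen to sit.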

1 Like

Hello from the Codee team!

First, thank you for your feedback on our “Top 10 Recommendations for Fortran Modernization” initiative.

We’ve been working on synthesizing the collective wisdom shared through your feedback (comments, NAG slides, Reinhold’s slides, Fortran Wiki), our discussions with other Fortran developers, and an exploration of other insightful discussions on Fortran Discourse and various resources such as:

With all this information, we’ve tried to summarize the most recurring and critical challenges and actions of Fortran developers regarding the modernization of Fortran code, which we would like to share with you. Some of these scenarios are already documented in the Open Catalog that we presented a few weeks ago (those with a PWR link), and naturally, we would also like to further document all other scenarios listed below in the catalog!

Our proposal of “top recommendations” for Fortran modernization would be:

  • Use modules instead of common blocks to share data.

  • Prefer real(kind=kind_value) for declaring consistent floating types.

  • Consider using allocatable instead of a pointer.

  • Declare array procedure arguments as assumed-shape arrays.

  • Add a contiguous attribute to applicable assumed-shape arrays.

  • Use pointer or derived types rather than the equivalence statement.

  • Use case or if-then-else constructs instead of go to statements.

  • Replace arithmetic if statements with block if constructs.

  • Use do, cycle, and exit constructs instead of go to statements for loops.

  • Convert labeled do loops to non-labeled do loops.

  • Use only integer control variables in do loops.

  • Convert explicit do loops that build arrays into array notation.

  • Avoid alternate return statements.

  • Avoid using data or block data statements to initialize variables.

  • Replace forall statements with do concurrent.
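A compact sketch touching several of the array-related items in this list (allocatable instead of pointer, assumed-shape dummy arguments, array notation, and `do concurrent` in place of `forall`). The module `array_demo` and the routine `scale_rows` are hypothetical names chosen for the example.

```fortran
module array_demo
   implicit none
   private
   public :: scale_rows
contains
   ! Assumed-shape dummies: the bounds travel with the arguments,
   ! so no separate size arguments are needed.
   subroutine scale_rows(a, factor)
      real, intent(inout) :: a(:,:)
      real, intent(in)    :: factor(:)
      integer :: i
      ! Legacy: FORALL (i = 1:size(a,1)) a(i,:) = factor(i)*a(i,:)
      do concurrent (i = 1:size(a, 1))
         a(i,:) = factor(i)*a(i,:)
      end do
   end subroutine scale_rows
end module array_demo

program demo_arrays
   use array_demo, only: scale_rows
   implicit none
   ! allocatable rather than pointer: no aliasing, automatic cleanup
   real, allocatable :: a(:,:), f(:)
   integer :: i
   allocate(a(2,3), f(2))
   a = 1.0                        ! array notation instead of a nested loop
   f = [(real(i), i = 1, 2)]      ! array constructor instead of a fill loop
   call scale_rows(a, f)
   print *, a
end program demo_arrays
```

Note that `do concurrent` is a promise by the programmer that the iterations are independent; it is not checked by the compiler in general, so the conversion from `forall` still deserves a review.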

Note: An initial entry (PWR063) in the Open Catalog already outlines several of the legacy Fortran features above. As discussed, we intend to further document all these new scenarios, offering concrete and precise information on how to address each of the legacy features.

  • PWR001: Use the keyword only to explicitly state what to import from a module.

  • PWR002: Declare scalar variables in the smallest possible scope.

  • PWR008: Declare the intent for each procedure argument.

  • Consider grouping a set of global variables into a module for controlling access interfaces.

  • PWR003: Explicitly declare pure functions.

  • PWR007: Always use implicit none to disable implicit declarations.

  • Explicitly declare elemental functions.

  • Add a parameter attribute to constant variables.

  • Add an explicit save attribute when initializing variables in their declaration.

  • Prefer Fortran intrinsics like MAXVAL or MATMUL over user code.

  • Encapsulate external procedures into modules to avoid implicit interfaces.
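The items about `save` and `parameter` are easy to get wrong, so here is a short sketch, with made-up names (`counters`, `next_id`, `sum_to`), of the behaviour they guard against: initializing a local variable in its declaration implicitly gives it the `save` attribute.

```fortran
module counters
   implicit none
   private
   public :: next_id, sum_to
contains
   function next_id() result(id)
      integer :: id
      ! Initialization in the declaration implies SAVE; writing it out
      ! makes clear that the value persists between calls.
      integer, save :: last = 0
      last = last + 1
      id = last
   end function next_id

   function sum_to(n) result(total)
      integer, intent(in) :: n
      integer :: total, i
      ! Assigned on every call. Writing "integer :: total = 0" instead
      ! would zero it only once, before the first call.
      total = 0
      do i = 1, n
         total = total + i
      end do
   end function sum_to
end module counters

program demo_save
   use counters, only: next_id, sum_to
   implicit none
   integer, parameter :: n = 4    ! named constant rather than a variable
   integer :: first, second
   first  = next_id()
   second = next_id()
   print *, first, second
   print *, sum_to(n), sum_to(n)
end program demo_save
```

The explicit `save` does not change what the compiler does; it documents intent, so a reader can tell persistent state apart from a forgotten re-initialization.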

What are your thoughts on this list? Is there a particular item you feel is missing, or do some not seem that relevant? Which ones would you prioritize over others? Your insights and suggestions are invaluable to us, so we’re eager to hear your thoughts!

@Beliavsky We’ve also taken note of your latest reply to update our proposal accordingly! Do you think that any of the topics not already covered in our list should be covered no matter what due to their relevance?

3 Likes

While I agree that it is better to rely on the intrinsics whenever possible, I’ve had surprises with maxloc not returning the correct value for some limit cases (it was a long time ago, with ifort16), which led me to craft a substitute function to get the correct value and behavior consistent with gfortran… And with matmul, if one does not link against mkl and overload the procedure, it can actually be less performant than a small hand-written procedure if, say, you have to repeat multiplications of small matrices (3x3 or so) many times. So I would say to take this advice with a grain of salt and always cross-check.

2 Likes

The other issue is that because matmul() is written as a function, it typically requires some temporary array workspace. That allocation/deallocation requires some extra effort beyond what would be required with a subroutine interface (e.g. dgemm()), even if it can be done from the stack rather than the heap. And that allocation can sometimes overrun the available stack space, so even with perfectly legitimate and correct fortran code, the program can abort.

Some of this is simply a quality of implementation issue. If fortran compilers did a better job, then we should not have to worry about the performance or the stack overrun issues. But if you compare timings for the intrinsic matmul() vs. a tuned dgemm(), you will typically see a large factor.
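A minimal sketch of the kind of cross-check suggested above, for the small-matrix case: a hand-written 3x3 product (the hypothetical routine `mat3x3`) is compared against the `matmul` intrinsic. It checks agreement only; timing is left out because it depends heavily on compiler, flags and linked libraries.

```fortran
program matmul_check
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   real(dp) :: a(3,3), b(3,3), c_ref(3,3), c_loop(3,3)

   call random_number(a)
   call random_number(b)

   c_ref = matmul(a, b)          ! intrinsic, function-style result
   call mat3x3(a, b, c_loop)     ! hand-written, subroutine-style result

   ! Rounding may differ slightly, so compare with a tolerance in real code
   print *, 'max abs difference:', maxval(abs(c_ref - c_loop))
contains
   ! Plain triple loop; the result is written straight into c,
   ! so no temporary array is required.
   subroutine mat3x3(a, b, c)
      real(dp), intent(in)  :: a(3,3), b(3,3)
      real(dp), intent(out) :: c(3,3)
      integer :: i, j, k
      c = 0.0_dp
      do j = 1, 3
         do k = 1, 3
            do i = 1, 3
               c(i,j) = c(i,j) + a(i,k)*b(k,j)
            end do
         end do
      end do
   end subroutine mat3x3
end program matmul_check
```

Keeping such a check in the test suite makes it cheap to switch between the intrinsic, a hand-written kernel and a library call as compilers change.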

1 Like

Thanks @hkvzjal @RonShepard for the heads up on the performance of intrinsics! Even if we focus the list on code “modernization”, it’s certainly worth pointing out these caveats, just to be safe :)

Funnily enough, the same day I replied here with the “warning” about intrinsics, a colleague of mine came to see me about a weird bug he was having in parallel, where 2 out of 4 processes were crashing when building some arrays. It came down to a problem with maxval on a 2D integer array not finding the true maximum; it was close but not enough. So the subsequent array access crashed because the size was smaller than the address being requested (using ifort19.1)… We replaced the intrinsic with a hand-crafted replacement and the problem was gone…

I’m all in for code modernization, to promote readability and reusability, but given the plethora of compilers out there, the intrinsics end up being convenience functions and good starting points, rather than strictly the best option.

2 Likes

Very often, this is forgotten. Upgrading the syntax is only part of the process, but to make the code reusable, often more changes are needed.

For instance, some older Fortran libraries have hardwired “callback” functions (which really aren’t callbacks in the proper sense). This type of rigid program structure goes back to the punch-card era, when substituting a user-provided function meant switching a deck of cards. To make such programs truly reusable, the callback function needs to be passed as a procedure argument instead.

Legacy example:

C ODE solver routine
      subroutine odeslv(n,y,x,xend,info)
      dimension y(n)
C ...
C evaluate user-provided ODE function (name is hardwired!)
      call odefun(x,y,yp)
C ...
      end subroutine

Modern equivalent (with details omitted):

abstract interface
   subroutine p_odefun(x,y,yp)
     import :: dp   ! dp: working-precision kind, assumed to be defined in the enclosing scope
     real(dp), intent(in) :: x, y(:)
     real(dp), intent(out) :: yp(:)
   end subroutine
end interface

subroutine odesolve(self,x,xend,y,odefun)
   class(ode_solver), intent(inout) :: self
   real(dp), intent(inout) :: x, y(:)
   real(dp), intent(in) :: xend
   procedure(p_odefun) :: odefun
! ...
   call odefun(x,y,yp)
! ...
end subroutine
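
For readers who want something they can compile, here is a self-contained variant of the same idea. It is a simplified sketch, not the solver above: the names `ode_mod`, `rhs_interface`, `euler_step` and `decay` are invented, and a single forward Euler step stands in for the real integration logic.

```fortran
module ode_mod
   implicit none
   private
   public :: dp, rhs_interface, euler_step
   integer, parameter :: dp = kind(1.0d0)

   abstract interface
      subroutine rhs_interface(x, y, yp)
         import :: dp
         real(dp), intent(in)  :: x, y(:)
         real(dp), intent(out) :: yp(:)
      end subroutine rhs_interface
   end interface
contains
   ! One forward Euler step; the right-hand side is a procedure argument
   subroutine euler_step(odefun, x, h, y)
      procedure(rhs_interface) :: odefun
      real(dp), intent(inout)  :: x, y(:)
      real(dp), intent(in)     :: h
      real(dp) :: yp(size(y))
      call odefun(x, y, yp)
      y = y + h*yp
      x = x + h
   end subroutine euler_step
end module ode_mod

module my_problem
   use ode_mod, only: dp
   implicit none
   private
   public :: decay
contains
   ! User-supplied right-hand side: y' = -y (x is unused here)
   subroutine decay(x, y, yp)
      real(dp), intent(in)  :: x, y(:)
      real(dp), intent(out) :: yp(:)
      yp = -y
   end subroutine decay
end module my_problem

program demo_callback
   use ode_mod, only: dp, euler_step
   use my_problem, only: decay
   implicit none
   real(dp) :: x, y(1)
   integer :: i
   x = 0.0_dp
   y = 1.0_dp
   do i = 1, 10
      call euler_step(decay, x, 0.1_dp, y)
   end do
   print *, x, y(1)
end program demo_callback
```

Swapping in a different problem now means passing a different procedure, with no change to (or recompilation of) the solver module.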
2 Likes

I have always wondered why there isn’t a better name for this process. This technique dates back to before f77 in fortran, in which case the “back” part of the terminology is misleading. With modern fortran, with contained procedures, one might argue that the term “back” has some relevance, but even now that is still probably only a small minority of cases compared to the normal case.

Top few for me:

1.) Always use Standard conforming code. Turn on all warnings (e.g., -std=f2018 -Wall with gfortran) and fix any issues by using Standard conforming code. There are really very few compiler extensions from the Olden Days that do not have modern, Standard conforming, replacements.

2.) Always use IMPLICIT NONE everywhere. It is amazing how many bugs this can find and avoid compared to the default typing rules.

3.) All subprograms should be CONTAINed. Generally in modules, but also in the main program unit. If the subprograms are in individual files, use INCLUDEs in a module to compile them together. Again, amazing how many interface bugs show up when this is enforced.

4.) All old COMMON blocks should be in modules, as with their respective DATA initializations from any BLOCK DATA. Note that COMMON is allowed inside modules, so the conversion process can be gradual. This can be important when dealing with name changes.

5.) Use free-form source instead of fixed form. On most source files, it only takes a couple well-chosen text editor commands to do the transformation. Though sometimes there are significant blank issues.
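
A sketch of the gradual conversion mentioned in item 4, assuming a hypothetical common block `/limits/` with two variables. The module carries the COMMON statement, converted code uses the module, and a routine that has not been converted yet still sees the same storage.

```fortran
module limits_mod
   implicit none
   integer :: maxit
   real    :: tol
   ! COMMON is allowed in a module, so new code can USE the module
   ! while old code keeps declaring the block itself.
   common /limits/ maxit, tol
end module limits_mod

! Not yet converted: still declares the common block locally
subroutine legacy_report()
   implicit none
   integer :: maxit
   real    :: tol
   common /limits/ maxit, tol
   print *, 'legacy routine sees:', maxit, tol
end subroutine legacy_report

program demo_common
   use limits_mod, only: maxit, tol
   implicit none
   external :: legacy_report
   maxit = 50
   tol   = 1.0e-6
   call legacy_report()
end program demo_common
```

Once every routine uses the module, the COMMON statement can be deleted from it and the variables become ordinary module variables.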

Many many more could be suggested. Here are a few in no specific order that help compilers find more bugs at compile time, and help programs scale better:

  • Always specify intent attributes for dummy arguments.

  • Always use assumed shape for array dummy arguments. Perhaps with the CONTIGUOUS attribute.

  • Eliminate GOTOs as much as practical. Remember that, unlike C, Fortran has multi-level CYCLE and EXIT for loops. Also since F2008, you can use BLOCK constructs and EXIT out of them in a structured fashion. Goal is no numeric statement labels (except possibly alternate returns for exception handling, and FORMAT statements for I/O.)

  • Use ALLOCATABLEs whenever practical, instead of fixed-size arrays that are “big enough”. Same with character string lengths.

  • Use F2008 submodules to separate interface from implementation. Allows for faster compilation by users of the module, and avoidance of ‘compilation cascades’.
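
As an illustration of the GOTO item in this list, the sketch below uses named loops with multi-level `cycle` and `exit`, plus an `exit` out of a named `block` (F2008). The array contents and construct names are arbitrary.

```fortran
program exit_demo
   implicit none
   integer :: i, j, grid(3,4)
   logical :: found

   grid = 0
   grid(2,3) = 7
   found = .false.

   rows: do i = 1, size(grid, 1)
      cols: do j = 1, size(grid, 2)
         if (grid(i,j) < 0) cycle rows    ! skip the rest of this row
         if (grid(i,j) == 7) then
            found = .true.
            exit rows                     ! leave both loops at once
         end if
      end do cols
   end do rows

   check: block
      if (.not. found) exit check         ! F2008: structured early exit
      ! After EXIT the loop variables keep their current values
      print *, 'found 7 at', i, j
   end block check
end program exit_demo
```

Each of these replaces a `go to` to a numbered label, and the construct names document where control is going.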

5 Likes

I know this has been somewhat of a sensitive topic in the past, but is there consensus on setting up the build system to use the compiler flag -fimplicit-none by default? That is a gfortran compiler flag and I presume other compilers have a similar feature. The use of implicit none is similar to the recommendation to always have the lines use strict and use warnings in Perl code.

I don’t see any reason why this would be a problem if you were starting a new project that had no legacy F77 code.

@wspector co-authored the 2011 book Modern Fortran: Style and Usage.

[It] is a book for anyone who uses Fortran, from the novice learner to the advanced expert. It describes best practices for programmers, scientists, engineers, computer scientists, and researchers who want to apply good style and incorporate rigorous usage in their own Fortran code or to establish guidelines for a team project. The presentation concentrates primarily on the characteristics of Fortran 2003, while also describing methods in Fortran 90/95 and valuable new features in Fortran 2008. The authors draw on more than a half century of experience writing production Fortran code to present clear succinct guidelines on formatting, naming, documenting, programming, and packaging conventions and various programming paradigms such as parallel processing (including OpenMP, MPI, and coarrays), OOP, generic programming, and C language interoperability. Programmers working with legacy code will especially appreciate the section on updating old programs.

4 Likes