Legacy Code Modernization: global common, equivalence out; modules, namelists in?

Machalot · April 18, 2024, 6:50am

I’ve got a code modernization puzzle. I’m working with a large legacy code and trying to modernize it incrementally. The code has a custom command line interface to set the value of any global variable by name. For example, a command line input to set the value of mass to 60 looks like this:

> mass 60

Since Fortran doesn’t have introspection, this is achieved via a bunch of claptrap, namely a huge global array, an external data dictionary that lists all global variable names and their corresponding indices in the global array, and thousands of equivalence statements pairing global array elements with variable names within procedures. For example, in every procedure where mass appears, it has equivalence( global_array(658), mass ), along with a declaration real mass (and a dimension mass(n) statement in the case of arrays). Since each of these is written manually in hundreds of different files, it’s very error prone and fragile, and it doesn’t protect common data that shouldn’t be changed, and it’s a huge task to maintain the data dictionary

I had the idea to replace this with a design using modules to share the global variables with use, only:, and a namelist to provide the command line capability. (Assume for this conversation the command line must be preserved.)

Move all global variable names into a module and delete the global arrays.
Replace all equivalence statements with use, only:
Provide command line access using a namelist and internal read

With these changes, all global variables will need be declared and dimensioned just once, and added to a namelist once, hopefully in the same module file and even on the same line. Anywhere they need to be used, they are brought in explicitly by name with use, only:.

I got this design concept working in a small example with all the globals and namelist in a single module. But it won’t scale up, because this code has hundreds of files, and I don’t want to rebuild the entire code every time I touch the global module. Instead I want to split it into a smaller topical modules, whose namelists are public and in turn are used by the command line module. Since multiple namelist statements referring to the same group just continue adding to the same group, ideally I would like to build up a single namelist group from all these separate modules. But gfortran doesn’t combine namelist groups with the same name when they are accessed by use-association.

The following code is my proof of concept, but it doesn’t compile because the globals namelist group is an ambiguous reference brought in from both topical_module_1 and topical_module_2 separately, rather than combining to a single group. The code compiles if you uncomment the lines that rename the global namelists to g1 and g2, uncomment the corresponding read statements, while commenting out the references to globals. (As an aside, it’s interesting that a renamed namelist group still reads in using its original name, i.e. a read statement with the renamed group nml=g1 reads a namelist starting with &globals.)

Is it impossible to create a single namelist group with statements distributed across program units? The documentation I could find it not really clear on this.

Is there a better way to modernize the old code?

module topical_module_1
implicit none
private

real, public :: x1
public :: globals

namelist / globals / x1

end module topical_module_1

module topical_module_2
implicit none
private

real, public :: x2
public :: globals

namelist / globals / x2

end module topical_module_2

module command_line_module
use topical_module_1, only: globals
use topical_module_2, only: globals
!use topical_module_1, only: g1 => globals
!use topical_module_2, only: g2 => globals

implicit none
private

public :: read_user_input
public :: print_values

contains

subroutine read_user_input
use iso_fortran_env, only: error_unit
implicit none

character(256) :: input
character(:), allocatable :: morsel
character(:), allocatable :: nmlstring
character(256) :: iomessage
integer :: iostatus

write(*,'(a)',advance='no') '> '
read(*,'(a)') input
morsel = adjustl(trim(input))
write(*,'(2a)') '>> ',morsel

nmlstring = '&globals '//trim(morsel)//' /'

read(nmlstring,nml=globals,iostat=iostatus,iomsg=iomessage)
if (iostatus /= 0) then
  write(error_unit,'("iostatus ",g0,": ",2(a,/))') iostatus, trim(iomessage), trim(nmlstring)
end if

!read(nmlstring,nml=g1,iostat=iostatus,iomsg=iomessage)
!if (iostatus /= 0) then
!  write(error_unit,'("g1 iostatus ",g0,": ",2(a,/))') iostatus, trim(iomessage), trim(nmlstring)
!end if
!
!read(nmlstring,nml=g2,iostat=iostatus,iomsg=iomessage)
!if (iostatus /= 0) then
!  write(error_unit,'("g2 iostatus ",g0,": ",2(a,/))') iostatus, trim(iomessage), trim(nmlstring)
!end if

deallocate(morsel)
deallocate(nmlstring)

end subroutine read_user_input

subroutine print_values
  use topical_module_1, only: x1
  use topical_module_2, only: x2
  implicit none

  write(*,*) 'x1 = ',x1
  write(*,*) 'x2 = ',x2
end subroutine print_values

end module command_line_module

! main program
use command_line_module, only: read_user_input
use command_line_module, only: print_values
implicit none

call print_values

do
  call read_user_input
  call print_values
end do

end

Arjen · April 18, 2024, 4:56pm

You might want to look at my keyvars module at flibs - a collection of Fortran modules / SVN / [r429] /trunk/src/strings/keyvars.f90 and the test/demo program at flibs - a collection of Fortran modules / SVN / [r429] /trunk/tests/strings/test_keyvars.f90.

It does not use namelists, but instead it registers the variables that are to be managed via a string and a pointer to the variable. The code is meant for reading INI-files, but it should be easy to adapt it for reading from command prompts.

Here is an excerpt of the demo program:

    integer :: x, x2, y2, x3
    real    :: y
    character(len=20) :: string

    x = -1
    y = -1.0

    call get_values( 'test_keyvars.inp', [keyvar("int",  x, "Integer value"), &
                                          keyvar("real", y, "Real value"), &
                                          keyvar("char", string, "Some text"), &
                                          keyvar("Section", "int2", x2, "Alternative parameter for x"), &
                                          keyvar("Section3", "int3", x3, "Yet another one"), &
                                          keyvar("Section", "inty2", y2, "Alternative parameter for y")] )

    write(*,*) 'x = ', x
    write(*,*) 'y = ', y
    write(*,*) 'string = ', string

ivanpribec · April 18, 2024, 6:11pm

Wouldn’t the variables need to have the target attribute in that case?

Machalot · April 18, 2024, 9:23pm

Thanks for the suggestion. I didn’t find test_keyvars.inp in the repository. Can you point me to it?

urbanjost · April 18, 2024, 10:35pm

There might be some ideas here, showing using user-defined types to group related variables; additionally it shows NAMELIST groups to provide an interactive input mode including reading and writing values to a file, temporarily entering a command subshell … .

Portable Interactive NAMELIST group

module topical_properties
   implicit none
   private
   type props
      real :: mass=0.0
      real :: volume=0.0
   end type props

   public :: props

end module topical_properties

module topical_environment
   implicit none
   private
   type env
      real :: air_temp=0.0
      real :: pressure=0.0
   end type env

   public :: env

end module topical_environment

module aggregate
   use topical_properties, only: props
   use topical_environment, only: env
   implicit none
   private
   type(props),public :: p
   type(env),public   :: e
   namelist /globals/ p
   namelist /globals/ e
   public globals
end module aggregate

module command_line_module
   use aggregate, only: globals
   implicit none
   private
   public :: read_user_input

contains

subroutine read_user_input(status)
   character(len=:), intent(out), allocatable :: status
   character(len=256) :: line
   character(len=256) :: answer
   integer            :: ios
   integer            :: lun
   status = ''
   write (*, '(a)') [character(len=80) :: &
      'READ MODE (examine and change values):               ', &
      '  "."           return to main program               ', &
      '  NAME=VALUE    change values                        ', &
      '  show          show current globals values          ', &
      '  read          read globals from a NAMELIST file    ', &
      '  write         write globals to a NAMELIST file     ', &
      '  sh            start subshell. use "exit" to return.', &
      '  stop          tell caller to stop                  ', &
      '']
   do
      write (*, '(a)', advance='no') 'read_mode>>'
      read (*, '(a)') line
      if (line(1:1) .eq. '!') cycle
      select case (line)
      case ('.')
         exit
      case ('show')
         write (*, *) 'SO FAR'
         write (*, globals)
      case ('stop','quit','adios', 'adieu', 'arrivederci', 'au revior', 'so long', 'sayonara', 'auf wiedersehen','cheerio')
         status = 'stop'
         exit
      case ('sh')
         call execute_command_line('bash')
      case ('read')
         write (*, '(a)', advance='no') 'filename:'
         read (*, '(a)', iostat=ios) answer
         if (ios .ne. 0) exit
         open (file=answer, iostat=ios, newunit=lun)
         if (ios .ne. 0) exit
         read (lun, globals, iostat=ios)
         close (unit=lun, iostat=ios)
      case ('write')
         write (*, '(a)', advance='no') 'filename:'
         read (*, '(a)', iostat=ios) answer
         if (ios .ne. 0) exit
         open (file=answer, iostat=ios, newunit=lun)
         if (ios .ne. 0) exit
         write (lun, globals, iostat=ios)
         close (unit=lun, iostat=ios)

      case default
         UPDATE: block
            character(len=:), allocatable :: intmp
            character(len=256)  :: message
            integer :: ios
            intmp = '&globals '//trim(line)//'/'
            read (intmp, nml=globals, iostat=ios, iomsg=message)
            if (ios .ne. 0) then
               write (*, *) 'ERROR:', trim(message)
            end if
         end block UPDATE
      end select
   end do
   end subroutine read_user_input


end module command_line_module

program main
use command_line_module, only: read_user_input
implicit none
character(len=:), allocatable :: status
do
   call read_user_input(status) ! interactively change NAMELIST group
   if (status .eq. 'stop') exit
   call print_values
end do
contains
   subroutine print_values
      use aggregate, only : p,e, globals
      write (*, globals)
      write (*,*)'PROPERTIES:',p
      write (*,*)'ENVIRONMENT:',e
      write (*,*)P%MASS, P%VOLUME, E%AIR_TEMP, E%PRESSURE
   end subroutine print_values

end program main

Now that NAMELIST supports reading from internal files a decent interactive mode is possible using a relatively few lines. Without the demand for subgrouping the variables it is even simple, as the Fortran Wiki example on a NAMELIST-based prompter shows, which this is based on.

Using the user-defined types eliminates some of the dusty corners about the NAMELIST usage you were trying. There are several nice interactive mode extensions provided by several compilers like the “?” input line in gfortran, but this shows creating an interactive mode using only standard-conforming NAMELIST features.

JohnCampbell · April 19, 2024, 4:04am

The more you change legacy code, the more you have to derive a test suite to cover the range of cases to be checked.
You need to investigate what test cases and documented solutions are available before making any changes.
A minimal set of changes should first be attempted.

Arjen · April 19, 2024, 5:25am

Ah, it is indeed not in the repository. I have added a copy to this message (rename it, as Discourse does not allow an extension .inp).
test_keyvars.inp.txt (232 Bytes)

Arjen · April 19, 2024, 5:27am

Yes, but that is arranged in the keyvar_* functions (target attribute on the argument). I am not sure it is completely standard-compliant, but I have never had any problems with the variables not being target themselves.

Machalot · April 19, 2024, 5:30am

Certainly.

There are costs and benefits to modernizing old code. In a similar piece of software from the same era, we successfully automated the conversion of common blocks into modules and achieved identical output at double precision in our test suite. That modernization has allowed us to extend the useful lifetime of that software by at least two decades and add a whole range of new capabilities.

RonShepard · April 19, 2024, 5:59am

Regarding the TARGET attribute on the actual argument:

If you do not have the TARGET attribute, then the compiler is allowed to make a local copy to use as the actual argument, in which case the final pointer assignment to the dummy argument would not be correct. The standardese way of saying that is that the pointer would be undefined upon return from the subroutine.

As for whether it works without the TARGET attribute, it will work until it doesn’t. If you, the programmer, want to guarantee that it works for all combinations of compilers and compiler options and optimization levels, then you must use the TARGET attribute for the actual argument.

Arjen · April 19, 2024, 7:07am

Ah, that sounds quite reasonable. I will have to update the code and the documentation. Thanks!

Jcollins · April 19, 2024, 9:45am

@Machalot

We implemented an automatic system to generate the tables for access to variables by name about 25 years ago. It is still supported, and you will find documentation at
http://simconglobal.com/fpt_ref_build_access_database.html
The system doesn’t use EQUIVALENCE statements to access the variables and array elements. It uses the COMMON block addresses (taking into account the compiler alignment rules). There is a command environment to read or write variables, array elements and to examine array sections. However, we never implemented access to derived types or structures. The system is in current use in several aerospace simulations and instrument control systems, mostly under gfortran or ifort/ifx.

This is one of the few situations where we can see a practical use for COMMON blocks, but it is NOT modern Fortran. We have considered replacing the COMMON block addresses by pointers, and I don’t think that this would require a great deal of work in fpt. We can make fpt add the TARGET attributes very easily. The problem here is in accessing array elements. We would need to do C-style pointer arithmetic, and I haven’t tested this in Fortran. I would welcome advice on this issue.

Converting the COMMON blocks to modules is done automatically by fpt - see
http://simconglobal.com/fpt_ref_change_common_to_module.html
But note that you will see EQUIVALENCE statements in the output if a COMMON block is laid out differently in different routines.

septc · April 19, 2024, 4:42pm

Though not related to namelist, how about generating a symbol table for module variables of interest via some external tool (eg, suggested above) and making a helper script (eg, in Python) for auto-generating a Fortran routine that tries to read in the module variables from the command line input (in the interactive session of your program…?) The “-fdump-fortran-original” option in Gfortran might be handy for generating such a symbol table, for example…

!! mytest.f90
module mymod
    implicit none
    integer :: num, narr(3)
    real :: val, farr(3)
end module

$ gfortran -c -fdump-fortran-original mytest.f90

Namespace: A-Z: (UNKNOWN 0)
procedure name = mymod
  symtree: 'farr'        || symbol: 'farr'         
    type spec : (REAL 4)
    attributes: (VARIABLE IMPLICIT-SAVE DIMENSION)
    Array spec:(1 [0] AS_EXPLICIT 1 3 )
  symtree: 'mymod'       || symbol: 'mymod'        
    type spec : (UNKNOWN 0)
    attributes: (MODULE )
  symtree: 'narr'        || symbol: 'narr'         
    type spec : (INTEGER 4)
    attributes: (VARIABLE IMPLICIT-SAVE DIMENSION)
    Array spec:(1 [0] AS_EXPLICIT 1 3 )
  symtree: 'num'         || symbol: 'num'          
    type spec : (INTEGER 4)
    attributes: (VARIABLE IMPLICIT-SAVE)
  symtree: 'val'         || symbol: 'val'          
    type spec : (REAL 4)
    attributes: (VARIABLE IMPLICIT-SAVE)

Machalot · April 19, 2024, 9:14pm

I appreciate the suggestion. I’m really trying to get away from maintaining separate code inspection tools, or depending on extra compiler diagnostics outside the language standard. That is a big pain and source of errors in the current legacy workflow. People write creative code that tricks the tool and we get a bad dictionary. Or people just forget to update the dictionary in the midst of making changes, and we get errors that are hard to track down.

wspector · April 19, 2024, 11:32pm

Dispersion is the remedy for contention.

Perhaps rather than one large module with hundreds of variables in it, you could split the module into multiple modules - with related variables together in each module. You can then have a module which USEes all the above modules with a namelist statement and I/O routine for reading the desired variables. Other program units could USE just the modules they need.

rwmsu · April 20, 2024, 12:08pm

To me this is an example of why Fortran should have intrinsic dictionaries, lists etc. I find it hard to believe that these have not been an integral part of the language for at least a couple of decades now. The use cases for things like dictionaries far exceeds the cases used to justify a lot of what has been introduced in the language since F95. Just my humble opinion.

Machalot · April 22, 2024, 4:34pm

I do intend to use many separate modules that are all “used” by the command line module. The only obstacle is the inability to declare a single namelist group with statements across multiple modules. I think a centralized module namelist is workable, but it’s less than ideal because each variable still has to appear in two places.

wspector · April 22, 2024, 5:05pm

Each module scope variable is declared in only one spot. The namelist group would merely reference those variables. (Of course if not using implicit none, the presence of a variable name in a namelist statement could implicitly declare the variable - which may not be what was intended…)

Machalot · April 22, 2024, 5:45pm

I’m just saying the requirement to appear in two places (declare and add to namelist group in separate files) creates more maintenance and likelihood for error.

RonShepard · April 22, 2024, 9:27pm

If the programming convention is to always specify the ONLY clause on USE statements, then the variables would be declared in their home module, then referenced again, possibly with renaming, on the USE statement, and then again in the namelist statement. When I add a new module variable this way, there are then three places that I need to modify. At least the way that I use namelist, I personally don’t really see those three references as a maintenance problem.

Topic		Replies	Views
Resolving name conflicts of input variables with Fortran instrinsics Help	12	784	June 28, 2022
Cannot remove COMMON block, despite setting global values Help	3	492	December 28, 2021
Two modules with the same name in one program	15	1545	August 8, 2022
IMPLICIT usage and equivalence	3	368	October 21, 2021
C-interoperable global variables	1	526	September 3, 2021

Legacy Code Modernization: global common, equivalence out; modules, namelists in?

Related Topics