Can we reinstate EQUIVALENCE?

We have a code base of tens of millions of lines.
Just over 70% of the packages contain EQUIVALENCE statements.

The uses we identified are:

  • Organising data for communication between processes. One package used in financial control and analysis contained over 20,000 EQUIVALENCE statements mapping named variables into arrays (I was surprised so I checked!)
  • Organising COMMON blocks. Several aircraft simulations have between 2000 and 4000 EQUIVALENCE statements for this purpose.
  • Providing short local names for components of derived types and structures. For example EQUIVALENCE (pitch,attitude%pitch)
  • Building data structures for attached hardware, essentially re-arranging bytes. This could be done by other means, but as pointed out by @RonShepard and others this can be awkward.
  • Implementing paged data structures. This is important in our work. By way of explanation, fpt analyses code and stores data in multiple tables (currently there are 27). All these tables are overlaid into the same large array, which is divided into pages. When, for example, a new symbol table record is needed, a counter is incremented. When the counter reaches the end of the current page, a new page is used (not allocated) to continue the table. This process has several advantages:
      i. We don’t specify a maximum size for any table, so no table can run out of space until we eventually run out of memory in the shared array.
      ii. As the data from a large program grows, the elements of the tables which describe it remain fairly close in memory. This probably helps cache handling.
      iii. Memory is not allocated by ALLOCATE constructs, so we avoid a double reference to find, for example, a symbol table, cross-reference table or sub-program table reference.
    The 27 tables, and the token stream of the code, are overlaid by EQUIVALENCE statements.
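
To make the paging idea concrete, here is a minimal sketch, not fpt's actual code. It assumes only two tables (an INTEGER one and a REAL one) overlaid on one shared array, a tiny page size, and hypothetical names (paged_tables, table_t, new_record, pool, symtab, xref) chosen for illustration. Each table records which pages of the shared array it has taken, and a new page is claimed, not allocated, when the current one fills up.

```fortran
module paged_tables
  implicit none
  integer, parameter :: page_size = 4, n_pages = 8   ! tiny, for illustration only
  integer, parameter :: pool_size = page_size*n_pages

  ! Two "tables" of different type overlaid on one shared array.
  integer :: pool(pool_size)
  integer :: symtab(pool_size)
  real    :: xref(pool_size)
  equivalence (pool, symtab, xref)

  integer :: next_free_page = 1

  type :: table_t
    integer :: pages(n_pages) = 0   ! pages of the shared array used so far, in order
    integer :: n_records = 0
  end type table_t

contains

  ! Claim the next record of table t and return its index in the shared array.
  function new_record(t) result(idx)
    type(table_t), intent(inout) :: t
    integer :: idx, slot, offset
    slot   = t%n_records/page_size + 1      ! which of the table's pages
    offset = mod(t%n_records, page_size)    ! position within that page
    if (offset == 0) then                   ! current page is full: take a new one
      if (next_free_page > n_pages) error stop 'shared array exhausted'
      t%pages(slot) = next_free_page
      next_free_page = next_free_page + 1
    end if
    t%n_records = t%n_records + 1
    idx = (t%pages(slot) - 1)*page_size + offset + 1
  end function new_record

end module paged_tables

program demo
  use paged_tables
  implicit none
  type(table_t) :: sym, xr
  integer :: i, k

  do i = 1, 5
    k = new_record(sym)
    symtab(k) = 100 + i
    if (i <= 3) then
      k = new_record(xr)
      xref(k) = 200.0 + i
    end if
  end do

  print *, 'symbol table pages:   ', sym%pages(1:2)   ! pages 1 and 3
  print *, 'cross-reference pages:', xr%pages(1:1)    ! page 2
end program demo
```

Neither table has a declared maximum size here. The only limit is the size of the shared array, which matches advantage i in the list above.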

There must be a very good reason to remove EQUIVALENCE from the standard. However, I doubt that it will disappear from compilers or from widespread use. Can we reinstate it?

BTW my little search code is:

#!/bin/bash

for F in $(cat codes.txt);
do
  echo -n $F >> equiv.txt
  echo -n "   " >> equiv.txt
  cd $F
  grep -ir equivalence | wc -l >> ../equiv.txt
  cd ../
done

I certainly still use it. Much easier for emulating byte operations.
Which compilers do not compile EQUIVALENCE correctly?

Most of my codes still contain EQUIVALENCE and all build the last time I tried.

EQUIVALENCE is tagged as obsolescent (since Fortran 2018), which means it is discouraged in new code but is still part of the standard. I am pretty sure it won’t be deleted soon.

I never use EQUIVALENCE because I find it extremely confusing and an artifact of a bygone age when memory was small and hard to come by. It makes debugging code a nightmare (at least for me). Also, isn’t it a form of “aliasing”, like pointers, that can inhibit vectorization? I also don’t find Fortran’s intrinsic bit-manipulation routines awkward to use. As with Fortran’s support for CHARACTER data and strings, it just takes a little more thought about how you use them than the equivalent functionality in other languages. That doesn’t mean you can’t do things that other languages can do. You just might have to write your own, usually small, functions to provide the missing features.

I am finding that ASSOCIATE can be used to replace these local name aliases. Of course, if there are 20,000 of these in a code, one would need an automatic tool to do the conversion.
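
As a minimal sketch of that replacement, using the pitch/attitude%pitch alias from the first post as the example: the type attitude_t, the module name and the routine nose_up are hypothetical, invented here for illustration.

```fortran
module attitude_mod
  implicit none
  type :: attitude_t
    real :: pitch = 0.0, roll = 0.0, yaw = 0.0
  end type attitude_t
contains
  subroutine nose_up(att, delta)
    type(attitude_t), intent(inout) :: att
    real, intent(in) :: delta
    ! pitch is a short local name for att%pitch, valid only inside the construct
    associate (pitch => att%pitch)
      pitch = pitch + delta
    end associate
  end subroutine nose_up
end module attitude_mod

program demo
  use attitude_mod
  implicit none
  type(attitude_t) :: attitude
  call nose_up(attitude, 2.5)
  ! The assignment through the alias changed the component itself
  if (attitude%pitch /= 2.5) error stop 'alias did not write through'
  print *, attitude%pitch
end program demo
```

Unlike EQUIVALENCE, the alias only exists between ASSOCIATE and END ASSOCIATE, so every routine that wants the short name has to open its own construct.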

I would like to know the answer to this too. In some sense, ASSOCIATE is tamer than EQUIVALENCE, or POINTER, so maybe the compiler can produce better code or be more aggressive with optimizations?

My memory is not what it once was, but wasn’t it mentioned in a recent thread here that some compilers appear to implement ASSOCIATE as if the associate names were pointers? My issue with ASSOCIATE is that it can get very unwieldy and cumbersome in a hurry if you have more than a handful of items you want to rename.

If you have many items to rename, equivalences will be cumbersome too

This is where I’ve seen the most usage of equivalence as well. One of the codes I occasionally work on has three massive arrays (>10,000 elements) that live in a common block. To “import” a variable to a local scope, each element of the arrays is locally equivalenced to a local variable name. This is a maintenance nightmare because:

  • The array index is a magic number:
equivalence( xa(4522), local_var_name1) ! how do I keep all these indices straight?
equivalence( xa(967),  local_var_name2) 
equivalence( xa(1884), local_var_name3(1)) ! this is an array of dimension 100
! ... 
equivalence( xa(8875), local_var_name4)
  • There’s no way to guarantee the same array index refers to the same variable in different scopes:
! in subroutine aerodynamics.f
equivalence( xa(4523), airspeed)
! in subroutine propulsion.f
equivalence( xa(4532), airspeed)  ! oops, transposed digits in the index
  • It’s very susceptible to array collisions, e.g. where local_array has more than one element, this is valid Fortran but wrong semantics:
equivalence( xa(4522), local_array(1,1)) ! local_array has dimension 3x3
equivalence( xa(4523), local_scalar) ! oops, clobbered the elements of local_array
  • It’s hard to make changes without breaking things. For example, if I need to add a new array with 200 elements (50 timepoints of a 4-element vector), how do I easily find an unused 200-element section of the global array?
double precision local_array(50,4)
equivalence( xa(9015), local_array(1,1)) ! xa(9015) through xa(9214) must not be used anywhere else

A little over a decade ago we used some clever Python scripts on another (similar) codebase to replace all common blocks with modules, and got the output to match to full double precision. There were no equivalence statements to deal with in that other codebase, but I imagine a similar update could be made where all equivalence statements are replaced by

use module_name, only: local_var_name

And then you have no more magic numbers or array collisions to worry about.
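
A minimal sketch of what that could look like, assuming a hypothetical module flight_state that owns the variable and two hypothetical routines standing in for aerodynamics.f and propulsion.f:

```fortran
module flight_state
  implicit none
  ! One named declaration replaces xa(4523) and every equivalence onto it
  double precision :: airspeed = 0.0d0
end module flight_state

module subsystems
  implicit none
contains
  subroutine aerodynamics()
    use flight_state, only: airspeed
    airspeed = 120.0d0
  end subroutine aerodynamics

  function propulsion_airspeed() result(v)
    use flight_state, only: airspeed
    double precision :: v
    v = airspeed      ! same variable by name, so no index to mistype
  end function propulsion_airspeed
end module subsystems

program demo
  use subsystems
  implicit none
  call aerodynamics()
  if (propulsion_airspeed() /= 120.0d0) error stop 'routines disagree on airspeed'
  print *, propulsion_airspeed()
end program demo
```

A misspelt name in the only: list is a compile-time error, whereas a transposed index in an equivalence compiles silently.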

Excellent analysis, thank you! Obvious missing functionality for me is:

  • equivalencing dummy arguments and local procedure arguments
  • equivalencing derived types (at least bind(C) or sequence types)

Plenty of tricks are available for dynamic casting, none for static casting like equivalence (in a very limited manner) allows.

EQUIVALENCE is not a static cast; its semantics are closer to those of the union facility in the C standard.

C interoperable types with int32_t and int64_t, and the use of memcpy from the C standard library, are what Fortran practitioners may want to consider first for any actual use cases in non-legacy codes.

I remember one usage in code I had to maintain on the CDC systems at Imperial College. The code came from CERN. I had to make changes to the code to make it work on the versions of the CDC operating systems we used. It was a big penny-dropping moment when I realised that the variable names in the equivalence statements and common blocks were related to words in a variety of European languages, e.g. clef is the French for key. One of our technical writers spoke several European languages, and she helped me create a translation sheet with the English, French, Italian and German words for the same thing. Groups of subroutines and functions were obviously developed by people from several European countries.

Compiled three f77 programs this weekend using gfortran 15.2.0, all using EQUIVALENCE, and all compiled clean. These are the flags I use: gfortran -O2 -ffixed-line-length-256 -std=legacy -finit-local-zero -I. -o

I have found that people use the term cast in two different ways. One meaning is to convert a value from one type to another, like real(j) or int(x). The other meaning is to take the bits of one type and treat it as another type, like transfer(j,x). I try to avoid using that term for this reason.
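
A small sketch of the two meanings side by side. The module and function names are made up for illustration, and the bit pattern in the check assumes IEEE 754 single precision for real32.

```fortran
module cast_demo
  use, intrinsic :: iso_fortran_env, only: int32, real32
  implicit none
contains
  ! Meaning 1: convert the VALUE, as real(j) does
  pure function value_as_real(j) result(x)
    integer(int32), intent(in) :: j
    real(real32) :: x
    x = real(j, kind=real32)
  end function value_as_real

  ! Meaning 2: reinterpret the BITS, as transfer(j, x) does
  pure function bits_as_real(j) result(x)
    integer(int32), intent(in) :: j
    real(real32) :: x
    x = transfer(j, 1.0_real32)
  end function bits_as_real
end module cast_demo

program demo
  use cast_demo
  use, intrinsic :: iso_fortran_env, only: int32, real32
  implicit none
  integer(int32), parameter :: j = 1065353216_int32   ! hexadecimal 3F800000

  print *, value_as_real(j)   ! about 1.07E+09: same value, different bits
  print *, bits_as_real(j)    ! 1.0 under IEEE 754: same bits, different value
  if (bits_as_real(j) /= 1.0_real32) error stop 'real32 is not IEEE 754 single here'
end program demo
```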

My reasons for starting this thread are:

  • As @RonShepard and others have found, bit and byte manipulation without EQUIVALENCE can be fiddly and inelegant
  • We move COMMON blocks to include files or convert them to modules. Where the variables in a COMMON block are mapped differently in different routines we build a backbone array of INTEGER(int8) and nail the variables to it with EQUIVALENCE statements. Magic numbers, but they are all in one file and sequential so easy to maintain. BUT I find that we are generating code automatically which uses an obsolete feature. I don’t see another way, at least without incurring a maintenance problem and a performance hit.
  • We use paged data structures in our own code. Without paging, to analyse a two million line code and populate 27 different tables, we would have to set the bounds of the tables high enough that we won’t run out of anything. This would waste a great deal of memory that we may not have. Again, I can’t see how to do this without EQUIVALENCE (or re-writing the whole thing in C :kissing_face: ).

I do the same kind of thing in my legacy codes, but I have always avoided using integer*1 (the modern equivalent is integer(int8)), integer*2, and so on as the underlying backbone because many machines, both legacy and modern, have address restrictions. For example, if you happen to equivalence a real64 entity to an odd hardware address, then chaos can occur. The machine code either causes a run time error, or the compiler adds extra instructions to copy the bytes to a temporary location, do the operation, and then copy the bytes back. In the latter case, you suddenly see your run times increase by a factor of 2x or worse, but you might not know why. Modern machines have these restrictions too, e.g. an int64 or real64 entity must sometimes have a hardware address that is a multiple of 4 or 8.

My workaround for this has always been to declare/allocate the underlying base array as integer*8 or real*8. That way, the equivalence of the real*8 local to an element of that array will always satisfy any alignment restrictions of the hardware. The downside, of course, is that when you equivalence integer*2, integer*4, real*4, etc. local entities, you sometimes waste some bytes because of the now unnecessary alignment. Still, this requires a lot of magic numbers, and counting by the programmer to get things lined up correctly with equivalence.
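
A minimal sketch of that workaround, with hypothetical names (backbone_mod, mass, drag, n_steps). It uses double precision, default real and default integer rather than real*8 and friends, because those types may be mixed in one equivalence set under the standard's storage association rules.

```fortran
module backbone_mod
  implicit none
  ! Every element is one double precision slot (typically 8 bytes), so anything
  ! pinned to the start of an element inherits the alignment of the backbone.
  double precision :: backbone(4)

  double precision :: mass      ! fills slot 1 exactly
  real             :: drag      ! first half of slot 2, second half is wasted
  integer          :: n_steps   ! first half of slot 3, second half is wasted
  equivalence (backbone(1), mass)
  equivalence (backbone(2), drag)
  equivalence (backbone(3), n_steps)
end module backbone_mod

program demo
  use backbone_mod
  implicit none
  backbone = 0.0d0
  mass     = 1.5d0
  drag     = 0.25
  n_steps  = 7

  ! mass and backbone(1) are the same storage and the same type
  if (backbone(1) /= 1.5d0) error stop 'mass is not overlaid on backbone(1)'
  ! slot 4 has nothing pinned to it, so it is untouched
  if (backbone(4) /= 0.0d0) error stop 'slot 4 was clobbered'
  print *, mass, drag, n_steps
end program demo
```

The slot indices 1, 2 and 3 are still magic numbers that have to be counted by hand, which is the cost described above.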

These are all low-level hardware details that a high-level language should not need to worry about. So I can understand the desire within the language to eliminate equivalence. But at the same time, Fortran programmers are always trying to solve tomorrow’s problems on yesterday’s hardware (a Numerical Recipes reference), so we are always faced with the practical problems that presents.

On the other hand, typical memory capacities have increased from KBs, to MBs, to GBs, and we are almost at TBs now even for personal computers. Of course, there are exceptions to this, such as embedded systems, but the general programming trend now is toward freely wasting memory resources that at one time were precious. So maybe the Fortran trend, which includes the idea of eliminating equivalence, is just a forward-looking feature of that evolution. I can easily remember when text editors ran in 16KB or 32KB of memory; now MS Word requires some 2.5GB of memory, and just barely has more functionality. That is what they call progress these days.

@RonShepard
Your use of real*8 as a base reference works well, and I have used it too.
That was until I included real*10 arrays: Gfortran’s implementation of real*10 breaks this approach in my memory manager. A bit short-sighted of Gfortran.

I think the gfortran real*10 on Intel/AMD hardware has padding bytes, right? The storage_size() intrinsic reports 128 bits (rather than the 80 bits that are actually used). But I do not know what the alignment restrictions are. And are they the same on all generations of these CPUs?

There is a really good programmer, mecej4, who taught me how to use equivalence on an interesting little problem in 1969 code. He knows far more tricks than I do. Removing it would be like saying: let us take the word “but” out of the English language, because we can use the equivalent “however”. Of course, I only have to maintain old code, so I would like it left alone.
If you try C#, it changes every few months and it is a pain, but a necessary pain. Some of us remember Fortran 3.31 from Microsoft, when one had very limited memory. So just leave the old to the old and let the young forget it exists.

The reason we use INTEGER*1 is exactly to replicate the organisation of the original COMMON block, in all of its incarnations in different routines. In the codes where this matters, the COMMON blocks are shared between different programs, sometimes in different languages, and we have to preserve the addresses.