Aliasing issue with zero-length array

ivanpribec · December 29, 2024, 8:23pm

I am refactoring a F77 code for solving a system of second order PDEs along with coupled ODEs (if any). While testing, I encountered an aliasing bug that goes unnoticed even when warning and instrumentation flags are turned on.

Assume the global vector of unknowns is stored in an array U(NEQN) where

NEQN = NPDE*NPTS + NV, and
NPDE is the number of PDE variables
NPTS is the number of grid points for the PDE semi-discretization
NV is the number of ODE variables (a non-negative number)
the elements U(1:NPDE*NPTS) contain the values of the semi-discretized PDE variables
the elements U(NPDE*NPTS+1:NEQN) contain the values of the ODE variables

Deep in the solver, a procedure CSET is invoked as follows

      NVST = NPDE*NPTS + 1
C ... omitted
      IV = NPDE*NPTS
      IF (NV.GT.0) THEN
         IV = NVST
C ...    omitted
      END IF
      CALL CSET(NPDE,NPTS,U,WK(I6),WK,WK(I2),WK(I5),NEL,NPTL,WK(I4),
     *          WK(I12),XBK,IBK,WK(I3),U(IV),NV)

The procedure CSET is defined as follows:

      SUBROUTINE CSET(NPDE,NPTS,U,X,OMEGA,DU,XBK,NEL,NPTL,XC,CCR,XBH,
     *                IBK,DUTEM,V,NV)
C     .. Scalar Arguments ..
      INTEGER         IBK, NEL, NPDE, NPTL, NPTS, NV
C     .. Array Arguments ..
      DOUBLE PRECISION CCR(NPTL), DU(NPTL,NPTL), DUTEM(NPTL,NPTL),
     *                OMEGA(NPTL,NPTL), U(NPDE,NPTS), V(1), X(NPTS),
     *                XBH(IBK), XBK(IBK), XC(NPTL)

Within CSET, the PDE variables are associated to dummy argument U(NPDE,NPTS) for 2-d indexing convenience. The ODE variables are passed via the penultimate argument, declared as V(1), and are associated to the actual argument U(IV).

(In codes of this era, it was common to declare an array as length 1, even if the actual array argument was longer. I never understood why programmers didn’t always use the assumed-size (*) syntax in these cases. Was this just a bad habit?)

I spotted two unwanted issues here:

The dummy argument V(1) should be declared either as V(*) or V(NV).
Due to how IV is assigned, when NV = 0, there will be aliasing of V(1) with the element U(NPDE,NPTS).

I’d like to amend this code in way which allows automatic bound-checking by changing the dummy argument declaration to

       DOUBLE PRECISION ..., V(NV), ...

However to get rid of the aliasing for the NV = 0 case I will also need to modify the calling code to pass a dummy variable as actual argument:

      DOUBLE PRECISION VDUM(1)
C ... 
      IF (NV > 0) THEN
         CALL CSET(...,U(IV),NV)
      ELSE
         CALL CSET(...,VDUM,NV)
      END IF

Does this sound like the correct approach to fix both issues?

This also made me wonder, what (if anything) did Fortran 77 say about array dummy arguments, when the size is zero? Would it be legal to reference an array of length-zero in any kind of context? Obviously, there is not much one can do with such an array.

      subroutine zerolen(nv,v)
      integer nv
      double precision v(nv)
      print *, nv
      if (nv > 0) print *, v
      end

C main program
      external zerolen
      double precision v(5)
      nv = 0
      call zerolen(nv,v) ! is nv = 0 allowed?
      end

urbanjost · December 29, 2024, 9:23pm

No arrays with less than two elements, no character strings of zero length. The arr(1) convention was an extension that existed before Fortran 77. It could have a special meaning because you could not have an array of size-1. As it was typically just a starting address and still worked as an extension till f90 it was rarely changed to asterisk and is still in a lot of older codes

Now has the confusing issue as to whether to treat it as arr(*) or liternally, as you can now have an array with one element

. Trivia : in Fortran 66 you used to often see ARR=VALUE, which meant (on platforms that supported it) to set the first element to the value. Now with array syntax it means, of course, to set all elements to the value.

The fact you could not have a zero or one-element array helped contribute widely to the now-troublesome places where code is called with a scalar value when the dummy argument is actually an array. I know of no old compiler that did not work with that, but it was never “standard” although in reality it was more “standard” than the standard was at the time

. Newer compilers in particular that do not descend from a pre-f90 ancestor often do not implement that in a compatible manner with old code, leading to a lot of effort when the argument is not intent(in).

ivanpribec · December 29, 2024, 9:42pm

In this case, both u and v are used as intent(inout). I happened to notice the bug by accident, because in another subroutine (not shown or mentioned above) the statements,

u(1:neqn) = 1.0
v(1) = 0.0          ! oops, aliasing...

got executed (with nv = 0), and the surprising result was that u(neqn) became 0.

I realized that, especially in connection to bounds checking, because v(1) now gives a false sense of “okayness” - literally one element, which goes unnoticed if v is only ever used as a dummy argument, and the array dimensions are declared inconsistently across procedures.

Thanks for the explanation. I didn’t know there was a two-element array rule. It does seem to fit with the old F66 trip count rules (a topic of this discussion: Poll: refactoring a chunk of legacy code).

ivanpribec · December 30, 2024, 1:18am

Do you know which compilers/vendors had this rule? Or was this an array descriptor mechanism?

I looked into the ANSI USA Standard Fortran specification (pdf, 10.4 MB), colloquially known as FORTRAN 66, but I couldn’t find this rule. What it does say,

7.2.1.1 Array-Declarator. An array declarator specifies an array used in a program unit.

The array declarator indicates the symbolic name, the number of dimensions (one, two, three), adn the size of each of the dimensions. The array declarator form may be in a type-statement, DIMENSION, or COMMON statement.

An array declarator has the form v(i) where:

v called the declarator name, is a symbolic name.

(i), called the declarator subscript, is composed of 1, 2, or 3 expressions ,each of which may be an integer constant or an integer variables name. Each expression is separated by a comma from its successor if there are more than one of them. In the case where i contains no integer variable, i is called the constant declarator subscript.

The appearence of a declarator subscript in a declarator statement serves to inform the processor that the declarator name is an array name. The number of subscript expressions specified for the array indicates its dimensionality. The magnitude of the values given for the subscript expressions indicates the maximum value that the subscript may attain in any array element name.

No array element name may contain a subscript that, during the execution of the executable program, assumes a value less than one or larger than the maximum length [emphasis added] specified in the array declarator.

An array can have adjustable dimensions in a procedure subprogram, meaning that both the array and the integer variable name representing the dimension appear in the dummy argument list.

I also looked at a FORTRAN IV manual from DEC (pdf, 5.3 mb), and it says:

Each array specification gives the array identifier and the minimum and maximum values which each of its subscripts may assume in the following form:
      identifier(min/max,min/max,...,min/max)
The minima and maximum must be integers. The minimum must not exceed the maximum. For example , the statement
     DIMENSION EDGE(-1/1,4/8)
specifies EDGE to be a two-dimensional array whose first subscript may vary from -1 to 1 include, and the second from 4 to 8 inclusive.

Minimum values of 1 may be omitted. […]

I didn’t know that custom array bounds were already a thing, even before the ANSI Standard Fortran. Neither of the manuals mention the asterisk, so I’m guessing that was a F77 feature? Wikipedia mentions an anecdote about a paper from Walt Brainerd titled “Astrology versus Gastroenterology” about the choice of assumed-length symbol, (*) or (:). Ultimately, we ended up with both, and nowadays we even have the nostrils or snout, (..), for assumed rank.

RonShepard · December 30, 2024, 4:28am

I think this was a bug just waiting to happen in the original code. When aliasing occurs (when NV<=0), neither of the aliased dummy arguments can be modified within the subroutine, a condition that was certainly violated in the code shown.

F77 did not allow zero-length arrays, but it did allow lower bounds other than 1 using the colon notation that persists until now in modern fortran. In pre-f77 code, this sometimes resulted in redundant looking code to assign two integer values, one for the dimension (required to be a positive value) and the other for the “actual” array dimension used in the algorithm. F77 introduced the (*) notation to eliminate the need for that first integer entity. As far as I know, arrays of length one were allowed in f77 and even before f77. F90 allowed zero-length arrays, both with declared bounds and with assumed shape arrays (:); in the latter case the bounds are always mapped to (1:0) regardless of the original bounds declarations.

urbanjost · December 30, 2024, 10:34am

I never remember a dimension of one not being an early attempt at assumed size arrays, which should be replaced with an asterisk at a minimum, and scalars were passed as arguments where arrays were required with no regard everywhere so maybe an actual dimension a(1) was not illegal. Since there was no standard till f77 things varied a lot between platforms. Zero dimensions was definitely not allowed as noted above. Whether it was standard or not values were passed with little regard to being an array or scalar and assumed to act as an address to a starting point and if anything did bounds checking I do not remember it. Many codes assumed all values were saved, some codes assumed all variables were initialized to zero, some codes assumed order declared was order stored and overindexing to get a good vector length on a Cray for things like initialing array was common etc. So maybe a dimension of one was legal, after all; but I never remember seeing it used accept to indication an unknown upper bound, and I have seen a lot of old codes. A lot of old code does not have an explicit interface so replacing (1) with (L) will often introduce new bugs, so (*) is safer although technically obsolete itself.

ivanpribec · December 30, 2024, 1:16pm

Indeed. Many codes on Netlib share this problem and inspection of the documentation (if any) or the procedure statements to infer the expected subscript range is needed to judge what is (L)ength supposed to be.

I think the typical example is in the ODE integrators, where you can provide either a scalar absolute tolerance, or a vector. Since these are typically intent(in) you can usually fix this today with an array expression call odeslv(...,[atol],iopt). An example is shown here: Has anyone used DVODE and is it good? - #27 by CRquantum

This code was published in 1991, but probably it originated a decade earlier, around 1980 or so. In the accompanying articles, the authors mention running it on an Amdahl 5850. I found a digital copy of an Amdahl FORTRAN 77 Reference Manual, and in Section 3.4 it describes assumed-size semantics. In Section 10.7 it also says,

If an actual argument is an array element name, that element is associated with the initial dummy array element. In this case, the size declared for the dummy array argument must satisfy the formula size <= (ds+1-as)*(e1) where size is the storage required by the dummy array argument, ds is the maximum subscript value that an actual array element can attain, as is the specified subscript value for the actual array element, and e1 is the storage size occupied by one actual array element.

The array in the subroutine must not attempt to access a subscript value that beyond the limit of the array in the invoking program (see chapter 3, “Dummy Array and Actual Array” under paragraph 3.4). […]

According to these rules, the actual argument V(IV) would amount to a 1-element array. The authors of the code knew this, because in a manual, they say

Note that in the case when there are no ODEs coupled to the PDEs, V and VDOT will be dummy vectors of length one.

but they stop short of saying that the element V(1) should not be referenced when there are no ODEs (NV = 0).

Does that mean with Fortran 90, I can correct the code like shown below?

      IV = NPDE*NPTS
      CALL CSET(NPDE,NPTS,U,WK(I6),WK,WK(I2),WK(I5),NEL,NPTL,WK(I4),
     *          WK(I12),XBK,IBK,WK(I3),U(IV+1:IV+NV),NV)

This way when NV=0, the slice U(IV+1:IV+NV) would be a zero-length array?

EDIT: running a quick check on the program below,

      subroutine zerolen(nv,v)
      integer nv
      double precision v(nv)
      print *, nv
      v(1) = 3.0
      end

C main program
      external zerolen
      double precision v(6)

C Fortran 90 Solution
      nv = 0
      call zerolen(nv,v(5+1:5+nv))
      end

compiled with -fcheck=bounds shows the output I wanted to achieve:

At line 5 of file /app/example.f90
Fortran runtime error: Index '1' of dimension 1 of array 'v' above upper bound of 0

RonShepard · December 30, 2024, 6:18pm

Yes, I think that should work with an f90 or later compiler, but that possibility was not allowed in f77 or earlier due to the zero-length restriction. Of course, you are not allowed to reference any elements of the dummy argument of that zero-length array, something that might be occurring in the posted code. I believe that with an f90 compiler the elements of U(:,:) can be modified within the subroutine in this case (because no aliasing is occurring for the dummy arguments). Maybe someone else could comment on this aspect of the problem.

One other possible issue occurred to me besides the zero-length array when NV==0. Does U(IV+1:IV+NV) result in an array bounds violation when its declared bounds are U(1:IV)? I think not, but I’m not certain. As I stated before, the effective bounds of that strided zero-length array section are (1:0), regardless of the triple values in the expression that results in that zero length, and I think that is what counts for the array bounds. Maybe others could comment on this too.

With an f77 compiler, not only should V(1) (or any other element) not be referenced, but the elements of the U(:,:) array should not be modified. The U(:,:) elements could be referenced, but not modified. I believe this restriction was for all the elements of the U(:,:) array, not just the last element of the array which is actually aliased in this situation. This restriction was to allow local copies of those dummy arguments to be made, for example within vector registers, and the result would always be the same regardless of the order that the local copies were returned to the actual arguments. The aliased arguments were effectively intent(in), even though that attribute did not exist in f77, and it is/was the responsibility of the programmer to enforce this within the subroutine with no help from the compiler.

Topic		Replies	Views
Program to numerically solve 3 ODE's Homework	34	2235	December 1, 2021
Problem in initialization of large arrays Help	1	167	August 15, 2024
Simple elemental sequence association?	4	718	May 7, 2022
Surprising results with aliasing	2	331	April 3, 2024
Fortran Best Practice Minibook Announcements	84	4474	September 20, 2021

Aliasing issue with zero-length array

Related topics