-fallow-argument-mismatch

You’re correct. I haven’t done the leg work to demonstrate that a platform or compiler actually does do it differently. However, if we’re just talking about standards conformance, I don’t need to show things actually being done differently, just explain how the standard allows for it to be done differently, making it undefined behavior and thus not standards conforming. If you’re fine with accepting the risk of going outside the standard, by all means do what works. I happily accept there are times this is the case and we must just do what we can, hopefully minimising the risk of things breaking later or being non-portable.

My main point was to demonstrate that in order to interpret memory in two different ways, you are necessarily going outside the standard. Given you don’t really have any other practical choice here, just be aware of the potential risks. IMO, the c_loc and c_f_pointer approach is the modern way of doing it, and you’re probably least likely to catch any grief from compilers for doing it, even with strong standards conformance checking on.

1 Like

The answer is of course “no”, and we know exactly why that is the case. On big-endian machines a(1) will be the high-order bits of b(1), and a(2) will be the low-order bits. On little-endian machines, a(1) will be the low-order bits and a(2) will be the high-order bits.

TRANSFER() between a(:) and b(:) result in the same ambiguity, and as far as I can see in practice, behaves the same way as the old-fashioned equivalence behaves regarding byte order.

So yes, this is “undefined” behavior as far as the fortran standard is concerned, but in practice there are only going to be two possible outcomes resulting from the two possible byte ordering conventions. In principle the standard also allows for other byte ordering conventions or for implementations on machines that are word addressable and do not have bytes at all.

Before f90, the standard explicitly allowed different types to be equivalenced together, but it did not define exactly how the bits within those types aligned. This was a practical choice because in those days there were plenty of machines that used both byte addressing conventions, perhaps even with big-endian machines being the majority. Over time, that switched, and today, little-endian machines are the majority. There were also word addressable machines with 36-bit words, 60-bit words, and 64-bit words that did not address bytes at all. Instead, characters (6-bit, 7-bit, and 8-bit) were packed into those words.

Since f90, the standard seems to be moving away from allowing equivalence, to the current point where it is obsolete. However, TRANSFER() seems to be its replacement, which has the exact same ambiguities regarding bit alignments and byte orders. Some code was recently posted comparing the use of equivalence to TRANSFER(), and the exact same machine instructions resulted from both sets of code.

@kargl
Very selective !
How about “If the equivalenced objects have different types or kinds, the EQUIVALENCE statement
does not cause any type conversion or imply mathematical equivalence.”
Surely this is a concession that “different types or kinds” can be expected in an EQUIVALENCE.

I use this often, especially to easily use INTEGER*1 as an unsigned byte, for convenience of UTF-8 character manipulation.

It is ridiculous that the modern Fortran committee can ban long used coding approaches, claiming to protect us from error prone approaches, with no conclusive proof. All Fortran coding requires care, rather than treating us as fools.

For this particular case, we have never been provided with an unsigned byte intrinsic type.

@JohnCampbell ,

Your complaint, “we have never been provided with an unsigned byte intrinsic type”, is most valid. The practitioners of Fortran have a genuine need for such an intrinsic type. The standard committee for Fortran for decades have been remiss on certain aspects and this is one such.

But the rest of your post makes no sense.

F2018:C8110(R871) If an equivalence-object is default integer, default real, double precision real, default complex, default logical, or of numeric sequence type, all of the objects in the equivalence set shall be of these types and kinds.

F2018:C8113(R871) If an equivalence-object is of an intrinsic type but is not default integer, default real, double precision real, default complex, default logical, or default character, all of the objects in the equivalence set shall be of the same type with the same kind type parameter value.

To me, it is a bit inconsistent. double precision has not been a separate type since F90, now treated as a kind of real. So now we have:

  use iso_fortran_env, only : int32, int64, real32, real64
  integer(int32) :: i(2)
  integer(int64) :: l
  real(real32) :: r(2)
  real(real64) :: d
  equivalence (l, i(1))   ! non-conforming
  equivalence (d, r(1))   ! conforming

I can hardly understand why equivalencing two 32bit reals with one 64bit real is OK but not two 32bit integers with one 64bit integer. If the reason for C8113 constraint are possible problems with memory alignment, they should be the same in both cases.

It is allowed to equivalence the real32 array to the real64 variable only when real32 is the default real and when real64 is the same as double precision. Those were the cases specifically allowed before f90, so they remain allowed now because of backwards compatibility. If, for example, real64 were the default real (and real128 the same as double precision), then the real32/real64 equivalence would not be covered by the backwards compatibility exception.

Before f90, there was only one integer kind specified by the standard, so that default integer type is the only one allowed to be equivalenced to the other types. But still, the fortran standard does not require that default integer kind to be int32, it could be something else, as long as it is storage-unit consistent with the default real, logical, complex, and double precision.

Not really. In Fortran 77, the only restriction regarding the types in EQUIVALENCE statement was that CHARACTER entities were allowed to appear only with other CHARACTER entities. So, default real, complex, logical, integer and double precision types were not specifically allowed. They were the only non-CHARACTER types available. So it would be equally true to say that backwards compatibility requires that all non-CHARACTER type are allowed in equivalence.

Relevant excerpts from F77 standard:

8.2 EQUIVALENCE_Statement

An EQUIVALENCE statement is used to specify the sharing of storage units by two or more entities in a program unit. This causes association of the entities that share the storage units.

If the equivalenced entities are of different data types, the EQUIVALENCE statement does not cause type conversion or imply mathematical equivalence.
[…]
8.2.3 Equivalence of Character Entities. An entity of type character may be equivalenced only with other entities of type character. The lengths of the equivalenced entities are not required to be the same.

The standard committee could have done that, but they didn’t. They could have extended equivalence in a very general and open-ended way to all kinds of all variables (perhaps excluding mixing characters and non characters), but they instead chose to allow only those types that were defined by f77 and earlier. As a practical matter, most pre-f90 compilers did allow a wider variety of types as extensions (integer*1, integer*2, integer*8, real*16, complex*16, logical*1, logical*2, logical*8 just to list a few of the common ones). The fortran standard could have specified how those entities could have been equivalenced too, but they didn’t. They instead chose a restrictive and limited specification within the standard, and then when f90 was specified, which introduced the general KIND approach, they continued on with that restrictive specification rather than an open and general specification.

There was one important generalization that occurred, but I forget exactly when. It might have been f90, or maybe f95. That was allowing character entities to be storage associated with noncharacter entities. In f77 for example, it was not allowed to put both character and noncharacter entities into a common block. That restriction was removed. However, I don’t think it included also equivalencing of character and non characters, I think it was only for common block storage association. Even at the time that seemed inconsistent, and it still does in hindsight. I have no idea why they did that.

I fully agree. And the only reason why they did it was IMHO trying to prove that equivalence is useless and should be removed form the language. For intrinsic types of all kinds (possible exception being mixing character and non-character entities) I cannot see other reason. The storage association (which actually is created by equivalence) is still valid in Fortran and the memory alignment requirements could be fulfilled by placing all objects appearing in equivalence statements starting at memory addresses suitable for the most restricted type.

For the last 40 years, I have not been aware of these restrictions the standard has placed on the use of EQUIVALENCE.
To me it has always been a way for either:
a) recasting a memory address to review in a different way, or
b) enabling management of COMMON variables for “nefarious” purposes.

Equivalencing a 1-byte integer to a default integer or a default integer memory address (derived from non-standard LOC) to an 8-byte integer, to overcome the sign problem is a common approach I use.
The simplicity of equivalence is it is defined at the declaration stage, as apposed to TRANSFER which must be continually applied.

That this should be deemed excluded from Fortran demonstrates a lack of consideration of existing usage.
I have never seen a Fortran compiler report this restriction. Does NAG Fortran report this ?
Another example of unnecessary exclusion of typical usage.
I wonder how many other Fortran users are unaware of these unnecessary rules for equivalence.

Interesting problem. I have often wanted an option to have arrays (both allocate and automatic) to be aligned at the start of a memory page. This could greatly assist to minimise memory coherence problems between threads.
Imagine how J3 might address this issue. Not only one byte, but a Roman legion of bytes to ignore !

I’m not sure how much different those two things are these days. In the old days, the EQUIVALENCE variables were either local or in a common block. In particular dummy arguments were not allowed. The local and common variables were in static memory, allocated at compile time. f77 changed that a little by allowing them to be allocated on the stack, and it introduced SAVE for those cases where static allocation was required. But modern fortran now defaults to RECURSIVE for all subprograms, which more or less requires that local variables be allocated on the stack, of course with the exception of explicit and implicit SAVEd variables. So now the language will allocate anew the declared variables on the stack at runtime. TRANSFER is a little more general than EQUIVALENCE in that it allows any kind of argument, local, dummy, module, common block, derived type element, etc. But if it is using a local variable, then that would typically be a local variable on the stack, the same as if that local variable were being used in an EQUIVALENCE. In a different thread recently machine code from gfortran was posted for two cases (https://godbolt.org), one with equivalence and the other with transfer, and the instructions were exactly the same. Of course, that was just one compiler, but I expect that to be a general trend among compilers now that local variables are stack allocated. Also, for both EQUIVALENCE and TRANSFER, the memory locations on the stack (or wherever) may never actually be referenced. Depending on how the variables are used in the surrounding code, the optimization level, and so on, the transfer of bits may only occur within registers. Certainly in that case, there would be no effective difference between EQUIVALENCE and TRANSFER.

program equiv
   use iso_fortran_env, only: int8, int16, int32, int64
   integer( int8) ::  i8
   integer(int16) :: i16
   integer(int32) :: i32
   integer(int64) :: i64
   equivalence(i8,i16,i32,i64)
   character(*), parameter :: cfmt='(i18,b65.64)'
   i64 = -1
   i8  = 1
   write(*,cfmt)  i8,  i8
   write(*,cfmt) i16, i16
   write(*,cfmt) i32, i32
   write(*,cfmt) i64, i64
end program equiv

This compiles with gfortran with the default options.

$ gfortran equiv.f90 && a.out
                 1 0000000000000000000000000000000000000000000000000000000000000001
              -255 0000000000000000000000000000000000000000000000001111111100000001
              -255 0000000000000000000000000000000011111111111111111111111100000001
              -255 1111111111111111111111111111111111111111111111111111111100000001

With NAG, it results in

$ nagfor equiv.f90 && a.out
NAG Fortran Compiler Release 7.1(Hanzomon) Build 7114
Error: equiv.f90, line 7: EQUIVALENCE of two different non-default intrinsic types
Error: equiv.f90, line 7: EQUIVALENCE of default numeric with non-default intrinsic type
Error: equiv.f90, line 7: EQUIVALENCE of two different non-default intrinsic types
Errors in declarations, no further processing for EQUIV

This is on MacOS with arm64 hardware, which has little-endian addressing and twos-complement integers.

@RonShepard

Thanks for this example.
NAG does diligently follow the standard, but what a stupid rule for EQUIVALENCE.

As discussed previously, the only reason for this rule can be to encourage TRANSFER to be used, where for most of us EQUIVALENCE performs adequately.

We have new users of Fortran complaining about EQUIVALENCE, but they do not bother to understand what it is doing. This is no reason to remove something that functions as always intended.

I do not have access right now to a big-endian machine, but if I did, then this is what I think the output would be for that program:

                 1 0000000000000000000000000000000000000000000000000000000000000001
               511 0000000000000000000000000000000000000000000000000000000111111111
          33554431 0000000000000000000000000000000000000001111111111111111111111111
144115188075855871 0000000111111111111111111111111111111111111111111111111111111111

As you can see, except for the trivial first line, both the bit patterns and the integer values are all different from before. So what exactly should the language standard say about the result of the equivalence? And on a sign-magnitude or a ones-complement integer machine, at least the little-endian bit patterns would have different values too.

I have always thought that the right answer to all of this uncertainty is to stop using signed integers to store bit strings. The language should have just a bit string type, with well defined operators to work on those bits. The conversion between those bit strings and integer (or real, or whatever) values would of course be machine dependent, but at least the bit strings themselves would be on a solid foundation and would be portable among all compilers.

I would say there are several differences, but whether those differences are advantages or disadvantages depends on the specific case. As you noted, contained procedures always have an explicit interface, so that prevents the programmer from making many mistakes, such as the wrong number, wrong rank, wrong type, or wrong kind of arguments. Except there are those cases where the programmer wants to type pun an argument, and then it is s nuisance feature that needs some possibly nonstandard workaround.

Other differences involve host association. A contained subprogram has access to variables from its host. An external subprogram has access only through its dummy argument list. Sometimes that is a really nice feature. I have one program that comes to mind where I have nested do-loops. The work done within each do loop has some structure, but when they are written as nested do loops, that structure is not clear because of the amount of code in between the beginning and the end of the do loops. So I ended up putting each loop within its own contained procedure. That reshuffles the code around so that the statements at the beginning of each loop are close to those at the end, with a single call statement in the middle to the next inner loop. There must be dozens of other situations like this that benefit from this feature of contained procedures.

Before f90, the only way to simulate the situation where different routines share local data was to use entry points. Now, they can all be separate module routines that share module data. That is much nicer than the old way, in many different ways. For example, one entry point cannot call another entry point, but one contained procedure can call another contained procedure. Also dummy argument intent() can be specified more clearly. A dummy argument that was effectively an intent(in) argument for one entry point and intent(out) for another could not be declared so; now with contained procedures, it can.

gfortran applies the GNU extension by default.

Users can try with, say, -std=f2018 option to get the compiler to enforce standard compliance and thereby get it to detect and report the nonstandard usage with the EQUIVALENCE statement.

It was as recently as Fortran 2018 that the standard labeled COMMON and EQUIVALENCE as obsolescent. If someone knows how to get the standards body to reverse the decision with compelling arguments, then they can try. But otherwise, it is done with and the practitioners can move on.

In the context of this thread, to reiterate OP has options within the standard language itself to achieve the use case stated in the original post without having to resort to any nonstandard extensions with EQUIVALENCE and cut out further discourse.

OP can even consider something like the following which tries to achieve the need expressed in the original post, “To merge the sub records, I am converting the buffer “rec_data”, which is provided as 4-byte (default) integers to 1-byte integer vector so the sub-records can be easily merged.”:

module data_m
   use, intrinsic :: iso_c_binding, only : B1 => c_int8_t, B4 => c_int32_t, c_loc,      &
                                           c_f_pointer
   private
   integer(B4), allocatable, target, save :: buffer(:)  
   integer(B1), pointer, save :: sub_records(:)
   public :: load_data, consume_data
contains
   subroutine load_data()
      ! Simulate here the reading of the data from source, file or a database, etc. as
      ! a naive allocation
      buffer = [ int( b"10101010101010101010101010101010", kind=kind(buffer) ), &
                 int( b"10101010101010101010101010101010", kind=kind(buffer) ), &
                 int( b"10101010101010101010101010101010", kind=kind(buffer) ) ]
      call c_f_pointer( cptr=c_loc(buffer), fptr=sub_records, &
         shape=[ size(buffer)*storage_size(1_b4)/storage_size(1_b1) ] )
   end subroutine
   subroutine consume_data()
      print "(*(b8:,'; '))", sub_records
   end subroutine 
end module
   use data_m
   call load_data()
   call consume_data()
end 
C:\temp>gfortran p.f90 -o p.exe

C:\temp>p.exe
10101010; 10101010; 10101010; 10101010;
10101010; 10101010; 10101010; 10101010;
10101010; 10101010; 10101010; 10101010

C:\temp>

If EQUIVALENCE is replaced with TRANSFER, then everything is standard and it compiles on nagfor and gfortran and any other compiler. Some compilers will warn that there are undefined bits in the output of the function. The output is different from the nonstandard EQUIVALENCE case because in the transfer, only the “leading” bits are set, and the standard sidesteps the meaning of “leading” in this context. On a little-endian computer, the “leading” bits are the low-order bits, and on a big-endian computer they will be the high-order bits. Thus TRANSFER has the same kind of unspecified behavior as EQUIVALENCE in this case, but without setting the other bits in the longer output that EQUIVALENCE does set in its nonstandard way.

program equiv
   use iso_fortran_env, only: int8, int16, int32, int64
   integer( int8) ::  i8
   integer(int16) :: i16
   integer(int32) :: i32
   integer(int64) :: i64
!   equivalence(i8,i16,i32,i64)
   character(*), parameter :: cfmt='(i18,b65.64)'
   i8  = 1
   write(*,cfmt)  i8,  i8
   i16 = -1
   i16 = transfer( i8, i16 )
   write(*,cfmt) i16, i16
   i32 = -1
   i32 = transfer( i8, i32 )
   write(*,cfmt) i32, i32
   i64 = -1
   i64 = transfer( i8, i64 )
   write(*,cfmt) i64, i64
end program equiv

The gfortran output is:

$ gfortran equiv.f90 && a.out
                 1 0000000000000000000000000000000000000000000000000000000000000001
                 1 0000000000000000000000000000000000000000000000000000000000000001
                 1 0000000000000000000000000000000000000000000000000000000000000001
        4376387585 0000000000000000000000000000000100000100110110100110000000000001

Only the low-order 8 bits are set by TRANSFER(). The higher-order bits are modified, but undefined by the standard. On a big-endian machine, it would be the high-order bits that are set, and the low-order bits would be undefined. The only thing that is really defined by the standard is that if TRANSFER() is used to extract the bits with an int8 mold, it would result in the same bits that were originally defined.

Now in the context of working with large data like with OP or the use cases of yore with the legacy nonstandard codebases that used dialects of FORTRAN 77 (and 66) which usually then lead to shared data (MODULE entitys or COMMON blocks), note the message implied in some posts here as TRANSFER being equivalent (bad pun intended) to EQUIVALENCE is inaccurate.

This is because EQUIVALENCE semantics pointed Fortran processors toward shared memory and a rough equivalent of union data type kind of semantics in C and its offshoots. I don’t think this is possible with TRANSFER. Consider a silly example:

module m
   use, intrinsic :: iso_c_binding, only : c_intptr_t
   private
   integer, parameter :: n = 2
   integer, save :: a(2*n) = 0
   double precision, save :: x(n)
   equivalence ( a, x )
   public :: set_x, consume_x
contains
   subroutine set_x( vals )
      double precision, intent(in) :: vals(:)
      x = vals
   end subroutine 
   subroutine consume_x()
      integer(c_intptr_t) :: add_a, add_x
      add_a = transfer( source=loc(a), mold=add_a )
      add_x = transfer( source=loc(x), mold=add_x )
      print *, "m::x = ", x
      print "(g0,z0)", "Address of data store 'a' (in hex): ", add_a
      print "(g0,z0)", "Address of data store 'x' (in hex): ", add_x
   end subroutine 
end module
   use m
   call set_x( [ 1D0, 2D0 ] )
   call consume_x()
end 

With gfortran, see how the EQUIVALENCE data objects share the same memory address:

C:\temp>gfortran p.f90 -o p.exe

C:\temp>p.exe
 m::x =    1.0000000000000000        2.0000000000000000
Address of data store 'a' (in hex): 7FF702EAD040
Address of data store 'x' (in hex): 7FF702EAD040

C:\temp>

The equivalent of the above with TRANSFER shall be:

module m
   use, intrinsic :: iso_c_binding, only : c_intptr_t
   private
   integer, parameter :: n = 2
   integer, save :: a(2*n) = 0
   double precision, save :: x(n)
   public :: init_x, set_x, consume_x
contains
   subroutine init_x()
      x = transfer( source=a, mold=x )
   end subroutine 
   subroutine set_x( vals )
      double precision, intent(in) :: vals(:)
      x = vals
   end subroutine 
   subroutine consume_x()
      integer(c_intptr_t) :: add_a, add_x
      add_a = transfer( source=loc(a), mold=add_a )
      add_x = transfer( source=loc(x), mold=add_x )
      print *, "m::x = ", x
      print "(g0,z0)", "Address of data store 'a' (in hex): ", add_a
      print "(g0,z0)", "Address of data store 'x' (in hex): ", add_x
   end subroutine 
end module
   use m
   call init_x()
   call set_x( [ 1D0, 2D0 ] )
   call consume_x()
end 

which with gfortran will give:

C:\temp>gfortran p.f90 -o p.exe

C:\temp>p.exe
 m::x =    1.0000000000000000        2.0000000000000000
Address of data store 'a' (in hex): 7FF692D4E040
Address of data store 'x' (in hex): 7FF692D4E050

C:\temp>