Assigning a derived type variable to itself

Dear all,

I would like to ask two questions:

  • How should we handle the assignment of a derived type variable to itself when the assignment operator for the derived type is overloaded by a subroutine?
  • How does the Fortran Standard address such a situation?

Such a situation arises when extending stdlib_sorting to support sorting arrays of bitset_large type. The procedure assign_large, which overloads the assignment operator for the bitset_large type, is as follows:

    pure module subroutine assign_large( set1, set2 )
!     Used to define assignment for bitset_large
        type(bitset_large), intent(out) :: set1
        type(bitset_large), intent(in)  :: set2

        set1 % num_bits = set2 % num_bits
        allocate( set1 % blocks( size( set2 % blocks, kind=bits_kind ) ) )
        set1 % blocks(:) = set2 % blocks(:)

    end subroutine assign_large

and when the same array element is put on both sides of the assignment operator, an error occurs with the message Fortran runtime error: Allocatable argument 'set2' is not allocated. Here, the stdlib is built with gfortran 11.2.0 on Windows 11 and set2 is defined as the variable on the right-hand side of the assignment operator. The error may be due to the deallocation of the component blocks of set2 caused by the intent(out) attribute for set1 since set1 and set2 are the same variable.
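The suspected mechanism can be observed on its own, without any aliasing. The following is a minimal sketch, not stdlib code: the type `box_t`, the subroutine `reset`, and the module name are hypothetical names chosen for illustration. It shows that an allocatable component of an intent(out) dummy argument is already deallocated when the procedure body starts executing.

```fortran
module intent_out_demo
    implicit none
    ! Hypothetical stand-in for bitset_large: one allocatable component.
    type :: box_t
        integer, allocatable :: blocks(:)
    end type
contains
    subroutine reset(x, was_allocated)
        type(box_t), intent(out) :: x
        logical, intent(out)     :: was_allocated
        ! Allocatable components of an intent(out) dummy are deallocated
        ! when the procedure is invoked, so this reports .false. here.
        was_allocated = allocated(x%blocks)
    end subroutine
end module

program main
    use intent_out_demo
    implicit none
    type(box_t) :: a
    logical :: on_entry

    allocate(a%blocks(3), source=0)
    call reset(a, on_entry)
    print *, "allocated on entry:  ", on_entry
    print *, "allocated after call:", allocated(a%blocks)
    if (on_entry .or. allocated(a%blocks)) error stop "unexpected allocation status"
end program
```

If set1 and set2 refer to the same object, the same deallocation removes the data that the procedure then tries to read through set2, which is consistent with the reported runtime error.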

I am trying to resolve this problem by removing assign_large and using Fortran’s intrinsic assignment operation.
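As a minimal sketch of this approach, assume a hypothetical type `bits_t` that only mimics the layout described above (an integer plus an allocatable array, no pointer components); it is not the stdlib definition. With no `assignment(=)` interface, intrinsic assignment applies, and because it behaves as if the right-hand side were evaluated before any part of the left-hand side is defined, assigning an array element to itself should leave it unchanged.

```fortran
module intrinsic_assign_demo
    implicit none
    ! Hypothetical type for illustration only.
    type :: bits_t
        integer :: num_bits = 0
        integer, allocatable :: blocks(:)
    end type
    ! No interface assignment(=) here: intrinsic assignment is used.
end module

program main
    use intrinsic_assign_demo
    implicit none
    type(bits_t) :: sets(2)

    sets(1)%num_bits = 3
    allocate(sets(1)%blocks(3))
    sets(1)%blocks = [1, 2, 3]

    sets(1) = sets(1)      ! self-assignment of an array element
    sets(2) = sets(1)      ! ordinary copy; allocates sets(2)%blocks

    print *, "sets(1)%blocks =", sets(1)%blocks
    print *, "sets(2)%blocks =", sets(2)%blocks
    if (any(sets(1)%blocks /= [1, 2, 3])) error stop "self-assignment changed the data"
    if (any(sets(2)%blocks /= [1, 2, 3])) error stop "copy is wrong"
end program
```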

For more details, please see issue #726 and pull requests #723 and #727.

Thank you.


Using the intrinsic Fortran assignment without overloading should do, as the derived type doesn’t contain any pointer. However the question still holds for cases where overloading is required (whatever the reason why it is required).

a = a with overloading means that the same variable is passed as both actual arguments of the subroutine; in other words, the two dummy arguments are aliased. Argument aliasing is normally not permitted: are there some exceptions?


@tomohirodegawa ,

As mentioned in the stdlib issue thread, argument aliasing is something the program author is not allowed to do: it makes the program nonconforming, although the processor is not required to detect and report the nonconformance. Nonetheless, in this case the processor is kind enough to do so. Is that warning being overruled?

module m
   type :: t
      integer :: n = 0
   end type
   interface assignment(=)
      module procedure assign_t
   end interface
contains
   subroutine assign_t( lhs, rhs )
      type(t), intent(out) :: lhs
      type(t), intent(in)  :: rhs
      lhs%n = rhs%n
   end subroutine 
end module
   use m
   type(t) :: a
   a%n = 42
   a = a
end

C:\temp>gfortran -c -ffree-form -Wall -Wextra -Wimplicit-interface -fPIC -g -fcheck=all -fbacktrace p.f
p.f:18:3:

   18 |    a = a
      |   1
Warning: Same actual argument associated with INTENT(OUT) argument 'lhs' and INTENT(IN) argument 'rhs' at (1)
p.f:18:3:

   18 |    a = a
      |   1
Warning: Same actual argument associated with INTENT(OUT) argument 'lhs' and INTENT(IN) argument 'rhs' at (1)

There are a couple of options you may want to consider with your colleagues working on stdlib, if that suits you:

  1. Consider making the specific procedure of the generic interface accessible and invoke it directly:
module m
   type :: t
      integer :: n = 0
   end type
   interface assignment(=)
      module procedure assign_t
   end interface
contains
   subroutine assign_t( lhs, rhs )
      type(t), intent(out) :: lhs
      type(t), intent(in)  :: rhs
      lhs%n = rhs%n
   end subroutine 
end module
   use m
   type(t) :: a
   a%n = 42
   call assign_t( a, (a) ) !<-- invoke the specific procedure directly
   print *, "a%n = ", a%n, "; expected is ", 42
end

C:\temp>gfortran -ffree-form -Wall -Wextra -Wimplicit-interface -fPIC -g -fcheck=all -fbacktrace p.f -o p.exe

C:\temp>p.exe
 a%n =           42 ; expected is           42

I was preparing an example similar to @FortranFan's:

module foo
implicit none
    type bar
        integer :: n
    end type
    interface assignment(=)
        module procedure :: assign
    end interface
contains
    subroutine assign(a,b)
    type(bar), intent(out) :: a
    type(bar), intent(in ) :: b
        print*, "       addresses of a and b:", loc(a), loc(b)
        a%n = 0
        a%n = b%n
    end subroutine
end module

program main
use foo
implicit none
type(bar) :: a
    a%n = 999
    print*, "a%n before call assign(a,a):", a%n
    call assign(a,a)
    print*, "a%n after  call assign(a,a):", a%n
    print*
    a%n = 999
    print*, "a%n before a = a:", a%n
    a = a
    print*, "a%n after  a = a:", a%n
end

The component %n is set to 999 before either calling the assignment routine directly or using the overloaded assignment. In the routine it is first set to zero before being set to what it should be. On output we would like the component to be 999.

With gfortran 13.1, the argument aliasing results in the component being zero in both cases. We can check that the arguments are aliased in the routine.

 a%n before call assign(a,a):         999
        addresses of a and b:      140735562059468      140735562059468
 a%n after  call assign(a,a):           0

 a%n before a = a:         999
        addresses of a and b:      140735562059468      140735562059468
 a%n after  a = a:           0

ifx 2023.1, however, sorts it out when the overloaded assignment is used, and we can see that it makes a copy-in of b before calling the routine:

 a%n before call assign(a,a):         999
        addresses of a and b:               4785592               4785592
 a%n after  call assign(a,a):           0
 
 a%n before a = a:         999
        addresses of a and b:               4785592       140734996493684
 a%n after  a = a:         999

I don’t know if ifx is going beyond the standard or if the standard requires this behavior.

EDIT: I can even check that with a = b (b being different from a), ifx is doing a copy-in of b before calling the assign routine. It seems that ifx always considers the RHS as an expression that has to be evaluated with a temporary allocation for the result.

Another option is to implement a setter method for the type that allows you to do what you seek: shown below is a clone procedure.

module m
   type :: t
      integer :: n = 0
   contains
      procedure :: clone
   end type
   interface assignment(=)
      module procedure assign_t
   end interface
contains
   subroutine clone( lhs, rhs )
      class(t), intent(inout) :: lhs
      type(t), intent(in)     :: rhs
      call assign_t( lhs, rhs )
   end subroutine
   subroutine assign_t( lhs, rhs )
      type(t), intent(out) :: lhs
      type(t), intent(in)  :: rhs
      lhs%n = rhs%n
   end subroutine
end module
   use m
   type(t) :: a
   a%n = 42
   call a%clone( (a) )
   print *, "a%n = ", a%n, "; expected is ", 42
end

C:\temp>gfortran -ffree-form -Wall p.f -o p.exe

C:\temp>p.exe
 a%n =           42 ; expected is           42

This seems like a general aliasing problem involving assignment. What does a compiler do when it sees a statement like a=a with intrinsic assignment? Does it treat it as a no-op, or does it make a copy of the RHS and go through the memory copy process?

With a defined assignment, the programmer could test for the alias (e.g. with c_loc()), and then do a quick return if necessary. Is that what the Fortran standard expects programmers to do?
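A minimal sketch of such a test, using only the standard ISO_C_BINDING functions c_loc and c_associated, might look like the following. The names `vec_t`, `assign_vec`, and `copies` are hypothetical, and the sketch assumes the left-hand argument is declared intent(inout) so that nothing is reset on entry; both dummies carry the target attribute because c_loc requires it.

```fortran
module guard_demo
    use, intrinsic :: iso_c_binding, only: c_loc, c_associated
    implicit none
    ! Hypothetical type for illustration only.
    type :: vec_t
        integer, allocatable :: n(:)
    end type
    interface assignment(=)
        module procedure assign_vec
    end interface
    integer :: copies = 0      ! counts real copies, for illustration only
contains
    subroutine assign_vec(lhs, rhs)
        ! intent(inout) keeps lhs intact on entry; target is needed for c_loc.
        type(vec_t), intent(inout), target :: lhs
        type(vec_t), intent(in),    target :: rhs

        ! Quick return when both dummies refer to the same object.
        if (c_associated(c_loc(lhs), c_loc(rhs))) return
        copies = copies + 1
        lhs%n = rhs%n
    end subroutine
end module

program main
    use guard_demo
    implicit none
    type(vec_t) :: a, b

    allocate(a%n(3))
    a%n = [1, 2, 3]
    a = a      ! a must be unchanged, whether or not the compiler passes a copy
    b = a      ! distinct objects: a real copy
    print *, "a%n =", a%n
    print *, "b%n =", b%n
    print *, "copies performed:", copies   ! depends on how the RHS is passed
    if (any(a%n /= [1, 2, 3])) error stop "a was modified"
    if (any(b%n /= [1, 2, 3])) error stop "b is wrong"
end program
```

The number of copies reported is deliberately not checked: a compiler that passes the right-hand side as a temporary will not trigger the guard for a = a, but the result is the same either way.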

Or a compiler could always make a copy of the RHS before invoking the assignment function, which could basically double (or more) the effort for complicated derived types – that doesn’t seem optimal either.

Is it the programmer’s responsibility to avoid the a=a situation, or is it the programmer’s responsibility to account for the alias when necessary, or should the compiler avoid the alias by making a RHS copy?


I tend to think the latter is the right one (possibly optimized by doing nothing in the case of a = a?), as the first two look odd. But I don’t really know.

@tomohirodegawa ,

Upon a bit of further thought, my read is you can change the assignment to explicitly reference an expression on the right-hand side to make the program conform:

bitsetl(0) = ( bitsetl(0) )

And then file another report with GCC Bugzilla for that processor.

This might satisfy the principle of least change for stdlib while enhancing that one processor if the support request is worked on promptly.


The only exceptions are when the aliased arguments are not modified. That is, it is allowed to reference the same actual argument through multiple dummy arguments, so long as none of those dummy arguments are modified. Aliases can also occur for module entities and dummy arguments, but I don’t think that applies to this defined assignment situation.

By the time the assign_large( set1, set2 ) subroutine is invoked, the programmer has already effectively told the compiler that set1 and set2 are not aliased. If this subroutine were invoked in the normal way, then the programmer could place parentheses around the actual argument, making it an expression rather than a variable, and the compiler would then make a copy. That would avoid the aliasing problem, but it would be inefficient when, for example, the argument is a complicated derived type with many levels of allocatable arrays. Then the subsequent assignments within the subroutine might be inefficient when a large amount of memory is copied back to the original derived type variable. The VALUE attribute could also be added to the set2 declaration, and that would also force the compiler to make a copy and avoid the alias problem, but it would have the same efficiency problems for complicated derived types. I don’t think making copies is really the right approach to solve this problem.
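The VALUE variant can be sketched as follows. This is only an illustration with a hypothetical type `t` holding a single default-initialized integer, so the copy is cheap here; for a type with nested allocatable components the same copy would be the costly part discussed above. INTENT(IN) is combined with VALUE, which the standard permits, so that the dummy cannot be modified.

```fortran
module value_demo
    implicit none
    ! Hypothetical type for illustration only.
    type :: t
        integer :: n = 0       ! default initialization, reapplied by intent(out)
    end type
    interface assignment(=)
        module procedure assign_t
    end interface
contains
    subroutine assign_t(lhs, rhs)
        type(t), intent(out)       :: lhs
        type(t), intent(in), value :: rhs   ! the callee receives its own copy
        lhs%n = rhs%n
    end subroutine
end module

program main
    use value_demo
    implicit none
    type(t) :: a

    a%n = 42
    a = a
    print *, "a%n = ", a%n, "; expected is ", 42
    if (a%n /= 42) error stop "self-assignment lost the value"
end program
```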

It seems like the “best” solution is for the standard to be changed so that the compiler is required to identify the a=a special case and treat it as a no-op for defined assignment.

I think defined assignment is the only place where this alias issue arises. All the other cases I can think of are already under the programmer’s control (or all arguments are unmodified, so aliases are irrelevant); this is the only one that isn’t. Is that correct?


Is it not an option to change the intent of the left-hand argument to inout (rather than out), so that no auto-initialization (or deallocation) will occur? (Possibly, the need for “pure” makes it unsuitable?)

module test_mod
implicit none
    type mytype
        integer, allocatable :: n(:)
    end type
    interface assignment(=)
        module procedure :: assign
    end interface
contains
    subroutine assign(left, right)
        type(mytype), intent(inout) :: left
        type(mytype), intent(in )   :: right

        if (loc(left) == loc(right)) return  !! self-assignment guard
        print *, "copying"
        left% n = right% n(:)
    end subroutine
end module

program main
    use test_mod
    implicit none
    type(mytype) :: a, b

    a% n = [1,2,3]
    print*, "before: a% n(:) =", a% n
    a = a
    print*, "after:  a% n(:) =", a% n

    a% n = [4,5]
    b = a
    print*, "after:  b% n(:) =", b% n
end

$ gfortran-12 -fcheck=all -O3 test.f90 && ./a.out
 before: a% n(:) =           1           2           3
 after:  a% n(:) =           1           2           3
 copying
 after:  b% n(:) =           4           5

If intent(out) is needed, we might use a separate routine, like

    subroutine assign(left, right)
        type(mytype), intent(inout) :: left
        type(mytype), intent(in )   :: right

        if (loc(left) == loc(right)) return
        call assign_neq(left, right)
    end subroutine

    subroutine assign_neq(left, right)
        type(mytype), intent(out) :: left    !! reset "left" to the initial state
        type(mytype), intent(in ) :: right

        print *, "copying"
        left% n = right% n(:)
    end subroutine

(I guess this is related to the topic of “self-assignment guard (or protection)”, and these pages seem to recommend such a guard in the case of C++, though other languages may be different.)


There is no aliasing issue with defined assignment. 10.2.1.5 Note 1 helpfully summarizes:

The rules of defined assignment (15.4.3.4.3), procedure references (15.5), subroutine references (15.5.4), and elemental subroutine arguments (15.8.3) ensure that the defined assignment has the same effect as if the evaluation of all operations in x2 and x1 occurs before any portion of x1 is defined. If an elemental assignment is defined by a pure elemental subroutine, the element assignments can be performed simultaneously or in any order.


Actually the standard defines an assignment as:

R1032 assignment-stmt is variable = expr

A possible interpretation is hence that a compiler is required to process the RHS as an expression, i.e. a = a is actually a = (a). If so, ifx gets it right and this is a bug in gfortran.

The standards document (https://j3-fortran.org/doc/year/23/23-007r1.pdf) says:

10.2.1.4 Defined assignment statement (page 191)
A subroutine defines the defined assignment x_1 = x_2 if …

10.2.1.5 Interpretation of defined assignment statements (page 192)
…
NOTE
The rules of defined assignment (15.4.3.4.3), procedure references (15.5), subroutine references (15.5.4), and elemental subroutine arguments (15.9.3) ensure that the defined assignment has the same effect as if the evaluation of all operations in x_2 and x_1 occurs before any portion of x_1 is defined. If an elemental assignment is defined by a pure elemental subroutine, the element assignments can be performed simultaneously or in any order.

15.4.3.4.3 Defined assignments (page 329)
If ASSIGNMENT ( = ) is specified in a generic specification, all the procedures in the generic interface shall be subroutines that can be referenced as defined assignments (10.2.1.4, 10.2.1.5). Defined assignment may, as with generic names, apply to more than one subroutine, in which case it is generic in exact analogy to generic procedure names.
Each of these subroutines shall have exactly two dummy arguments. The dummy arguments shall be nonoptional dummy data objects. The first argument shall have INTENT (OUT) or INTENT (INOUT) and the second argument shall have the INTENT (IN) or VALUE attribute. Either the second argument shall be an array whose rank differs from that of the first argument, the declared types and kind type parameters of the arguments shall not conform as specified in Table 10.8, or the first argument shall be of derived type. A defined assignment is treated as a reference to the subroutine, with the left-hand side as the first argument and the right-hand side enclosed in parentheses as the second argument. All restrictions and constraints that apply to actual arguments in a reference to the subroutine also apply to the left-hand-side and to the right-hand-side enclosed in parentheses as if they were used as actual arguments. The ASSIGNMENT generic specification specifies that assignment is extended or redefined.
(… bold font by me…)

Here, I am wondering:

[Q1] what happens if x_1 (= the first dummy argument corresponding to the left-side of assignment) is a derived type with an allocatable component and has intent(out)? Does the standard mean that all the information necessary for evaluating the subroutine will be retrieved by the compiler from x_1 before any re-initialization caused by intent(out)?

(I imagined that intent(out) causes the re-initialization of x_1 (= resetting to its initial state) by the compiler (like deallocation of allocatable components) immediately after entering the subroutine. It seems to be what is happening in various code snippets above, because segmentation fault happens if intent(out) is used. Is my interpretation above not correct…?)

[Q2] Does the section 15.4.3.4.3 (Defined assignments) above mean that

x1 = x2

should be interpreted as equivalent to a subroutine call like

call assign( x1, (x2) )

?


That is the way I read the above text. However, in the x1=x1 case, or an equivalent situation, it seems like this could result in a lot of redundant and unnecessary effort. Is a compiler allowed to recognize this situation and treat it as a special case? Should a processor be required to recognize this special case and avoid that unnecessary effort?

Even in the case of x1 /= x2, if the compiler sends (x2) as a temporary variable (after making a deep copy of x2), I guess it would need a lot of unnecessary effort… But maybe as an optimization, ifx possibly examines the address of actual arguments on the caller side and determines which of assign( x1, x2 ) and assign( x1, (x2) ) to use internally…?

Apart from that, my concern is that even when a temporary variable like (a) is created, direct translation of a = a to assign( a, (a) ) might give a wrong result (unless it is converted to “no op”). One such case may be when a is a derived-type variable and has a pointer component that points to another allocatable component in the same type. When entering the assign() routine, a temporary variable of RHS is created (e.g. via bitwise copy), while allocatable components of LHS could be deallocated via intent(out), so the original pointer can become ill-defined… (something like the following).

module test_mod
implicit none
    type Foo_t
        integer, allocatable :: dat        !! raw data
        integer, pointer :: val => null()  !! alias to "dat" (self-referencing pointer)
    end type
contains
    subroutine Foo_init( f )
        type(Foo_t), target :: f

        allocate( f% dat, source=0 )
        f% val => f% dat    !! define an alias
    end subroutine

    subroutine assign( left, right )
        type(Foo_t), intent(out) :: left   !! reset to the initial state
        type(Foo_t), intent(in)  :: right

        call Foo_init( left )
        print *, "assign(): loc = ", loc(left), loc(right)
        print *, "assign(): val = ", left% val, right% val
        left% val = right% val
    end subroutine
end module

program main
    use test_mod
    implicit none
    type(Foo_t) :: b, a

    call Foo_init( b )
    call Foo_init( a )

    b% val = -1
    a% val = 123
    print *, "main: val = ", b% val, a% val
    print *, "main: loc = ", loc(b), loc(a)

    !! Assuming that "b = a" is translated to "call assign( b, (a) )".

    call assign( b, (a) )    !! <-> b = a
    print *, "main: val = ", b% val, a% val

    call assign( a, (a) )    !! <-> a = a
    print *, "main: a% val = ", a% val
end

$ gfortran-12 -fcheck=all -Wall test.f90 && ./a.out

 main:     val =           -1         123
 main:     loc =        94011471794320       94011471794304
 assign(): loc =        94011471794320      140726436201392
 assign(): val =            0         123
 main:     val =          123         123
 assign(): loc =        94011471794304      140726436201376
 assign(): val =            0           0
 main:  a% val =            0

(To avoid this kind of trouble, the [Q1] above comes into play…? (like getting the value of right% val first when entering assign()…?))

To me, definitely yes.

Thank you for the information from various perspectives and helpful discussion.

I understand there are four ways to avoid the problem with self-assignment a=a when the assignment operator is overloaded for a derived type:

  1. Use the intrinsic assignment operation for the derived type.
  2. Give the argument on the left-hand side of the assignment operator the intent(inout) attribute and introduce a self-assignment guard (as known from C++) using c_loc.
  3. Give the argument on the right-hand side of the assignment operator the value attribute.
  4. Enclose the variable on the right-hand side of the assignment operator in parentheses, ().

I’m currently trying to extend the sorting procedures in stdlib to support the bitset_large type. Option 4 requires modifications, verifications, and performance measurements of the sorting procedures, even for already supported types. Option 1 is considered the most realistic way for the bitset_large type because it doesn’t have a pointer component.


I have zero experience with the bitset type up to now (so I cannot imagine what is best for the library implementation), but personally, in my code, I stick to intrinsic assignment when a given composite type is planned to have no pointers inside. If some class has a pointer, I explicitly call a custom subroutine, partly because I am still not sure what happens under the hood for defined assignment… (I will ask about this later). Also, if the second argument is internally passed with (...), it may be meaningless to take its address for a self-assignment guard.

One more problem may be that one cannot forbid the inclusion of pointer components into a derived type. So, we may need to make sure that future people will not inadvertently add pointers if intrinsic assignment is used throughout.

(I am sorry for the repeated edits. I should write more carefully, or drink more coffee.)

And also because the allocatable component has a default lower bound. In the case of an allocatable component with a non-default lower bound, the intrinsic assignment would not be able to correctly transfer the bounds to the assigned object.
