Should we avoid assignment of derived types in robust programs?

The thread was here, and there is an issue A shorthand for immutability at j3-fortran/fortran_proposals.

The C programming language doesn’t support operator overloading, and yet is still very useful. As @aradi has stated above,

As long as one works with derived types which contain static and allocatable components only, things should work without any problems.

Languages (and their compilers) do get amended/improved with time, so I wouldn’t be so pessimistic.

Consider the C++ example I showed above, the ability to delete the special member function operator=() only became available in C++11. Before that, you could only hide the operator as a private class member, which would result in a linking error. The behavior of the special member functions in C++ is still a vexing topic for many C++ newcomers. So is initialization (see CppCon 2018: Nicolai Josuttis “The Nightmare of Initialization in C++”), move semantics, and several other language topics. Fortran remains much easier IMO.

Both C++ classes and Fortran derived types are based upon Simula. Given that C++ added classes 18 years earlier than Fortran it’s natural they’ve had more time to fix such issues. And yet you meet programmers who get frustrated with C++, deciding to use Rust, D, or other languages instead.

6 Likes

I agree with the overall conclusion above… I also use pointer components in a derived type (for a different reason (*)), but in that case I avoid using any assignment of objects. If I want to use assignment, I include only primitive types and allocatable things (not pointers) as type components. So separating like “copy-OK” type and “not-OK” type, in a sense…
But there seems no strict/built-in way to prohibit copy or assignment, I guess there is no “safe” or “robust” way to use “not-OK” type (unless relying on local rules or by carefully reading the source code).

(*) In my case, I use pointer components to make an alias to other components of nested derived-type components, such that I can use foo % x rather than foo % baa % baz % x. Indeed, this is one feature request for future standard (i.e., capability of making a component alias, or some kind of component forwarding(?) mechanism). I believe this is useful when using composition (rather than inheritance).

1 Like

Earlier I wrote:

The paper by @rouson and coauthors linked below, discusses a derived type wrapping a C++ object with the help of reference counting:

This Isn't Your Parents' Fortran: Managing C++ Objects with Modern Fortran | IEEE Journals & Magazine | IEEE Xplore

Towards the end of the paper, you learn that the semantics of the wrapper object are,

type(vector) :: A, B   ! wrappers of a C++ vector object

A = B           ! shallow-copy, A is a reference to B
A = vector(B)   ! deep-copy, two separate objects exist

The shallow copy is what you’d expect for a derived type containing pointer components even if it were purely in Fortran. In their implementation the assignment operator was not overloaded.

So my first impression is that even with the smart-pointer/reference counting, you cannot overcome the issue you put succinctly:

That means, you have to be sure about the internals of a type (whether it has pointers or not) before you can use it in an assignment.

I’d love to hear what @rouson or @sblionel have to say on this topic. It seems like a quite fundamental limitation of the Fortran derived type that assignment cannot be robustly overloaded or deleted like in C++.

1 Like

Just for some balance, there is a very popular array domain-specific language with similar semantics to the wrapper object I just mentioned:

x = array([1, 2, 3])

y = x         # shallow copy
z = copy(x)   # deep copy

Modifying x will also modify y, but z will be untouched. In case the comment hasn’t already given it away, it is NumPy.

Given how widespread and popular NumPy is, I suppose consumers can learn to use assignment and copying correctly as long as they are equipped with good documentation and how-to’s.

Now I am always prowling for things to tweet about :slight_smile:. Here are some ways to create shallow and deep copies. If my terminology in the comments is wrong please advise.

character (len=20)    :: fmt = "(a,' =',*(1x,i0))"
integer , target      :: a(2)
integer , pointer     :: b(:)
integer , allocatable :: c(:), f(:)
a = [4,9]
b => a ! shallow copy
allocate (c, source=a) ! deep copy
associate (d => a, e => (a))
! d is a shallow copy, e is a deep copy
f = a ! deep copy
d = d - 1 ! changes a, b, d
print fmt, "a", a
print fmt, "b", b
print fmt, "c", c
print fmt, "d", d
print fmt, "e", e
print fmt, "f", f
end associate
end

Output:

a = 3 8
b = 3 8
c = 4 9
d = 3 8
e = 4 9
f = 4 9

It seems to be a deep-copy, but actually it is not necessarily! You have no clue, which kind of copy it is, unless you know the internals of the type you use as source. If that type contains pointers, it will be a semi-deep copy, with some components (static and allocatable components) deep copied, but pointer components copied without duplicating the data they point to. (So it will be a mess…) And worse than that, even if the source type has a user defined assignment in order to ensure proper copying, the sourced allocation would simply ignore that! So, my advice would be actually not to use sourced allocation for any user defined derived types ever, as it is not robust against changes in the internal details of the type! And to some extent, unfortunately, the same conclusion holds for assignments as well…

3 Likes

It was not overloaded in the actual type, but one of its component (counter) has a user defined assignment.

Thinking more about it, the idea of creating a special type for the “dirty” part (the one which needs a user defined assignment) and add it as a component to a type without overloaded assignment, seems to make the assignment indeed more robust. At least all the cases I had above would work without segfault, as all of them would use intrinsic assignment, which (hopefully) triggers the user defined assignment for the dirty component.

So, I’ve modified my original code, see below. Depending, which compiler I use, I get the user defined assignment invoked either for all cases (intel), or only for the one when a type(ext_t) is assigned to a type(ext_t) (nag, gnu). But, probably this is a valid optimization choice (using a kind of move_alloc instead of create, copy, destroy). So probably, the advice for robust containers can be slightly weakened:

Expose only derived types without overloaded assignment. In case some data needs special treatment on copy (e.g. pointers), pack them into a derived type with overloaded assignment, make this derived type a component of a wrapper derived type, and expose the wrapper type only. (And do not use derived types as source in an allocation.)

Demonstration code with wrapped component
module testmod
  implicit none

  type, abstract :: base_t
  end type base_t

  ! Wrapping everything dirty (needing user defined assignment into a special type)
  type :: dirty_t
    integer :: value = -1
  contains
    procedure :: dirty_assign
    generic :: assignment(=) => dirty_assign
  end type dirty_t

  ! Extends abstract interface
  type, extends(base_t) :: ext_t
    type(dirty_t) :: dirty
  end type ext_t

  interface ext_t
    module procedure ext_t_construct
  end interface ext_t

contains

  subroutine dirty_assign(this, other)
    class(dirty_t), intent(out) :: this
    type(dirty_t), intent(in) :: other

    print *, "Invoking base assignment with rhs value:", other%value
    this%value = other%value

  end subroutine dirty_assign


  function ext_t_construct(val) result(this)
    integer, intent(in) :: val
    type(ext_t) :: this

    this%dirty%value = val

  end function ext_t_construct

end module testmod


program testprog
  use testmod
  implicit none

  print "(/,a)", "*** ASSIGNMENT TO UNALLOCATED TYPE"
  block
    type(ext_t), allocatable :: ext
    ext = ext_t(1)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"


  print "(/,a)", "*** ASSIGNMENT TO ALLOCATED BASE (WITH DYNAMIC TYPE EXT)"
  block
    class(base_t), allocatable :: base
    class(ext_t), allocatable :: ext
    allocate(ext)
    call move_alloc(ext, base)
    print *, "BASE allocated now with dynamic type EXT"
    base = ext_t(1)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"

  print "(/,a)", "*** ASSIGNMENT TO UNALLOCATED BASE"
  block
    class(base_t), allocatable :: base
    base = ext_t(2)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"

  print "(/,a)", "*** ASSIGNMENT TO ALLOCATED EXT"
  block
    class(ext_t), allocatable :: ext
    allocate(ext)
    print *, "BASE allocated now with dynamic type EXT"
    ext = ext_t(3)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"

  print "(/,a)", "*** ASSIGNMENT TO UNALLOCATED EXT"
  block
    class(ext_t), allocatable :: ext
    ext = ext_t(4)
    print *, "EXT allocated now"
    ext = ext_t(5)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"

end program testprog

I was too enthusiastic, the different behavior is not a valid optimization strategy, but apparently a fundamentally different interpretation of the standard by the various compilers, see Intrinsic assigment of derived types containing components with user defined assignment. So, at the end, even if the wrapping strategy may in theory solve many problems, it can not be relied on currently due to implementation differences in various compilers. :disappointed:

1 Like

From the Fortran perspective, the semantics of auto-reallocation on assignment apply only to allocatable components/variables and then only to intrinsic assignment. If a type has user-defined assignment, the assumption is that the user procedure knows what it has to do. Note that assigning to individual components will use either intrinsic or defined assignment, depending on what has been declared for the component type (if any). The implementation differences @aradi mentions are simply bugs. If there was a question about this it should have been raised as an interpretation request.

I don’t quite understand the emphasis on pointers in this thread - Fortran provides no automatic anything for pointers.

@sblionel Thanks a lot for your insights! As for the pointers: I am aware, that Fortran does not offer anything automatic for them. This perfectly makes sense to me, and this is why I started to experiment with defined assignment, where I had a few surprises.

My biggest surprise was (which started the thread), that I could not come up with a defined assignment, which would have worked together with auto-allocation. (Is there any way?) That means, that the consumer of a type has to know, whether the type comes with a user defined assignment or not, as in the former case the consumer can not assume, that the ususal auto-allocation of the LHS will happen. (Is that correct?)

Then, @ivanpribec suggested the approach of making an instance of the type with the defined assignment a component of a wrapper type without defined assignment, and exposing only latter. It seems, that in this case the consumer can indeed always rely on the LHS auto-allocation (so the consumer would not have to know anything about the internal details of the type), and the user defined assignment (of the component) would be triggered nevertheless. (Is that correct?) To me, that seems to be the most robust solution currently possible in Fortran, so I got quite excited. But then, I tried a toy example and hit all the implementation differences in the various compilers…

1 Like

It comes down to what is meant by “automatic” and there will be a fair bit of diversity of views among Fortranners on this. But there are indeed a couple of aspects with an “automatic” nature with pointers that usually affects adversely many a poor, persevering Fortranner:

  1. Fortran standard permits ALLOCATE( foo .. ) where foo is an object with the POINTER attribute with the semantics in the standard such that an anonymous object with the TARGET attribute is allocated by the processor and foo is then automatically pointer-assigned to this TARGET. In the context of this thread involving robust authoring of containers (derived types with certain special features), this is very much relevant because components of said containers or subcomponents thereof need to be TARGETs in actual practice but which is not feasible/convenient with the current standard and therefore library developers tend resort to containers of POINTER attribute and they end up encountering various troubles, some of which are mentioned in this thread.

  2. intrinsic assignment in the standard has for derived type components the rules that “automatically” perform pointer assignment for components of POINTER attribute in the variable to the corresponding component of the expr. This too is relevant to this thread.

The end result is the standard may claim a single interpretation on all its aspects, however the implementations either fail to or take too long to converge to that interpretation. This leads to Fortranners facing “heavy turbulence” with their codes, particularly if object-oriented aspects are in play. The questions asked by @aradi and @ivanpribec is for feedback and guidance from SMEs toward how to navigate through such turbulence. I am afraid the answers are going to be elusive for there simply isn’t enough of “coding in anger” going on yet with the object-oriented approach to formulate definitive best practices.

In a separate comment I will post my own suggestion that is minor variant of what has been suggested upthread with the wrapper approach but with the caveat this suggestion will only have been used in a couple of small codes in industry currently and that too with only one processor.

2 Likes

The user of the type shouldn’t need to be concerned about HOW the assignment is done. The developer of the defined assignment subroutine DOES need to worry about that and can do the appropriate “is the LHS allocated?” test and “do the needful”. I would say that if the developer punted on auto-reallocation, that’s their failure.

2 Likes

I tried a toy example and hit all the implementation differences in the various compilers…

I am still trying to read the entire thread (so not being able to follow the discussions yet), but at least, isn’t the above “implementation difference” already a serious issue? I remember “finalization” also has some implementation (or interpretation) difference discussed in many other places.

I think if there were some online site or something that we can try different Fortran compilers + versions online, such a site would be great to compare the behavior of different compilers very easily (otherwise one has to install them all manually by oneself…). Online compilers (like TryItOnline) usually provides only open-source compilers (like GCC).

1 Like

Unfortunately, the licensed compilers like NAG and Absoft are missing. But I think this fulfills your purpose of testing different compiler versions.

2 Likes

Thanks very much, I didn’t know that this site supports ifort etc and also shows the output of the program (I had imagined that this site is specific for assembler code outputs…)

For commercial compilers like NAG/Cray/IBM etc, it would be great if there are compiler trial online site provided by those companies (with strong restrictions for code size and execution time, of course), like https://run.dlang.io/ and https://play.nim-lang.org/ , particularly NAG compilers which is often considered to be rigorous about the standard.

I absolutely agree, but I don’t see, how I can ensure (with the help of type bound procedures), that both, assignment to a static variable and assignment to an allocatable variable works in combination with user defined assignment. You can see my trial below, which demonstrates the problem:

  • If I overload both, assignment to allocatable variable and assignment to static variable, the interface is claimed to be ambiguous.
  • If I only overload the assignment to a static variable, assignment to an allocatable on the consumer side segfaults.
  • If I only overload the assignment to a dynamic variable, the assignment to a static variable does not compile on the consumer side.

@sblionel Can you show me a way to resolve this conflict (without using an embedding type as discussed above)? Is it possible at all according to the current standard?

Unsuccesful attempt to overload assignment to static and allocatable variables
module testmod
  implicit none

  type :: dirty
    integer :: value = 0
  contains
    procedure :: assign_static_static
    procedure, pass(this) :: assign_alloc_static
    ! if both generic assignments are active, interface is claimed to be ambigous
    ! if only assign_static_static is active, "alloc = static1" below segfaults
    ! if only assign_alloc_static is active, "static2 = static1" below fails to compile
    generic :: assignment(=) => assign_static_static
    generic :: assignment(=) => assign_alloc_static
  end type dirty

contains

  subroutine assign_static_static(this, other)
    class(dirty), intent(out) :: this
    type(dirty), intent(in) :: other

    this%value = other%value + 1
    print "(a,i0,a,i0,a)", "static = static (", other%value, "->", this%value, ")"

  end subroutine assign_static_static

  subroutine assign_alloc_static(lhs, this)
    type(dirty), allocatable, intent(out) :: lhs
    class(dirty), intent(in) :: this

    allocate(lhs)
    lhs%value = this%value + 1
    print "(a,i0,a,i0,a)", "alloc = static (", this%value, "->", lhs%value, ")"

  end subroutine assign_alloc_static

end module testmod

program testprog
  use testmod
  implicit none

  type(dirty) :: static1, static2
  type(dirty), allocatable :: alloc

  print "(a)", "static2 = static1"
  static2 = static1
  print "(a)", "alloc1 = static1"
  alloc = static1

end program testprog
1 Like

@ivanpribec It seems, that indirection/wrapping is a robust solution for hiding implementation details (thanks for bringing up the idea!) when user defined assignment is involved. (And probably the only one, if the ambiguity demonstrated in Should we avoid assignment of derived types in robust programs? - #32 by aradi can not be resolved within the rules of the standard).

The program below demonstrates it: A container type (type(contaminated)) outsources its components requiring user defined assignments into a special type (type(dirty)). If only type(contaminated) is exposed to the consumer, it behaves like an intrinsic type in all assignments:

  • assigning to a static instance works
  • assigning to an allocatable instance works as well
  • as a bonus, even assignment to arrays of arbitrary rank works

And all this only requires one to override the assignment in type(dirty) for the scalar, static case!!!

It seems, that as so often in Fortran, wrapping is the solution. :smile: (As long, as the user does not make a sourced allocation, but that should be probably really avoided for any non-intrinsic type anyway… :wink: )

So the rule for derived types in robust programming would then read

Never expose a derived type with user defined assignment to the users of your library. If a derived type needs user defined assignment, make it a component of a derived type with intrinsic assignment and expose latter.

module testmod
  implicit none

  private
  public :: contaminated

  type :: dirty
    integer :: value = 0
  contains
    procedure :: dirty_assign
    generic :: assignment(=) => dirty_assign
  end type dirty

  type :: contaminated
    type(dirty) :: contamination
  end type contaminated

contains

  subroutine dirty_assign(this, other)
    class(dirty), intent(out) :: this
    type(dirty), intent(in) :: other

    this%value = other%value + 1
    print "(a,i0,a,i0,a)", "dirty assign (", other%value, "->", this%value, ")"

  end subroutine dirty_assign

end module testmod


program testprog
  use testmod
  implicit none

  type(contaminated) :: static1, static2
  type(contaminated), allocatable :: alloc
  type(contaminated) :: static_array1(2), static_array2(2)
  class(contaminated), allocatable :: dynamic_contaminated

  print "(a)", "static2 = static1"
  static2 = static1
  print "(a)", "static_array1 = static_array2"
  static_array2(:) = static_array1
  print "(a)", "alloc1 = static1"
  alloc = static1
  print "(a)", "dynamic_contaminated = static1"
  dynamic_contaminated = static1

end program testprog

Note: due to compiler bugs (see Intrinsic assigment of derived types containing components with user defined assignment), the code above currently only works with the Intel compiler, but not with GNU, NAG or NVidia. (bugs reported)

You’re correct in that it seems not possible to use defined assignment that handles both the case of an allocatable lhs and a static lhs. I was playing with the rule that allows an allocatable dummy to be distinguishable from a pointer dummy (that doesn’t have intent(in)), but I couldn’t make it work.

So the “lesson” to be learned here (as per my hints in earlier posts across a couple of threads on this topic) is

  • Avoid implementing defined assignment in a “container” derived type, rely instead on intrinsic assignment semantics as provided in the standard, and
  • Should the “container” derived type need to have “components” for which the intrinsic assignment might be seen as inadequate e.g., those with POINTER attribute, then wrap such components themselves in a derived type that has a suitably defined assignment.
3 Likes