Should we avoid assignment of derived types in robust programs?

Let’s take an abstract base type which defines an interface and an extending type, which implements this interface and uses internally pointers. In order to handle the assignment correctly, it overrides the assignment operator. Now, in which of the cases below is the processor supposed to invoke the defined assignment:

  1. class(base) [allocated to a type(ext) instance] = type(ext)

  2. class(base) [unallocated] = type(ext)

  3. class(ext) [allocated to a type(ext) instance] = type(ext)

  4. class(ext) [unallocated] = type(ext)

Demonstration code
module test
  implicit none

  type, abstract :: base_t
  end type base_t

  type, extends(base_t) :: ext_t
    integer :: val = 0
  contains
    procedure :: ext_assign
    generic :: assignment(=) => ext_assign
  end type

contains

  subroutine ext_assign(this, other)
    class(ext_t), intent(out) :: this
    type(ext_t), intent(in) :: other
    print *, "ASSIGNMENT ext_assign invoked"
    this%val = other%val
  end subroutine ext_assign

end module test

program testprog
  use test
  implicit none

  print "(/,a)", "*** ASSIGNMENT TO ALLOCATED BASE (WITH DYNAMIC TYPE EXT)"
  block
    class(base_t), allocatable :: base
    class(ext_t), allocatable :: ext
    allocate(ext)
    call move_alloc(ext, base)
    print *, "BASE allocated now with dynamic type EXT"
    base = ext_t(1)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"

  print "(/,a)", "*** ASSIGNMENT TO UNALLOCATED BASE"
  block
    class(base_t), allocatable :: base
    base = ext_t(2)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"

  print "(/,a)", "*** ASSIGNMENT TO ALLOCATED EXT"
  block
    class(ext_t), allocatable :: ext
    allocate(ext)
    print *, "BASE allocated now with dynamic type EXT"
    ext = ext_t(3)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"

  print "(/,a)", "*** ASSIGNMENT TO UNALLOCATED EXT"
  block
    class(ext_t), allocatable :: ext
    ext = ext_t(4)
    print *, "EXT allocated now"
    ext = ext_t(5)
    print *, "ASSIGNMENT DONE"
  end block
  print *, "BLOCK DONE"

end program testprog

As my tests with the self-containing program above suggest, cases 1 and 2 use intrinsic assignment, while cases 3 and 4 use the user defined one. (Case 4 leads to a segfault as user defined assignment does not allocate the LHS.)

  • Is that the expected behavior as defined by the standard?

If yes, it raises several questions to me:

  1. Does this mean, that the consumer of a derived type must be aware, whether the type overrides assignment or not?
    Since in the latter case, it can not rely on the automatic allocation of the LHS, leading to a segfault, as in my
    example.

  2. In case 2, if the derived type on the RHS is copied to the LHS (and then finalized as suggested by the discussion in Finalization/Copy in intrinsic polymorphic assignment) this would lead to catastrophic results: The user defined assignment would be not invoked (so pointed content not duplicated), but due to the finalization of the RHS probably destroyed, so the LHS had broken pointers. (It would also make reference counting techniques impossible…)

So it seems to me, that in a robust programming model, assignment between derived types and classes must be best avoided, also at the consumer side, otherwise lacking knowledge on the implementation details of the type/class can lead to unwanted side effects (segfaults). Is that conclusion correct?

Any comments sheding more light on this are welcome. Please also note, my purpose is not to criticize the language but to understand which techniques for implementing robust containers (where the consumer has not to care about the internal implementational details of the container) are possible with the current features (and current compilers). And maybe also to think about possible language extensions for future standards, which could ease this task.

7 Likes

Thank @aradi for bringing this up. Being a green hand on derived types, I have recently been working on a derived type containing allocatable components (see A derived type containing a callback function as a member). The questions raised by @aradi seems quite important according to my very limited understanding. I look forward to seeing some discussions, even though I cannot input anything for lack of experience.

As long as one works with derived types which contain static and allocatable components only, things should work without any problems. The problem starts, when the derived type has pointer components, as you can make silly mistakes, which the compiler may not able to catch. And, in case you use a derived type from an existing library, you may not even have the information, whether it has pointer components. I think, in those cases the safest thing you can do is to keep away from assignments…

1 Like

Two options which work in the fourth case (assignment to unallocated ext) are:

! typed allocation
allocate(ext_t :: ext)
ext = ext_t(4)

! sourced allocation
allocate(ext,source=ext_t(4))

Looking through the latest edition of MRC (the red book), the dynamic type of a polymorphic allocatable variable can be altered only through

  • a typed allocation using allocate( ... :: ...) statement (if the type specification is omitted, it will be allocated to its declared type)
  • a sourced allocation using allocate(..., source=...)
  • through a move_alloc statement

Finally intrinsic assignment to an allocatable polymorphic variable is allowed. If allocated, the LHS will be deallocated first. Next it will be allocated to have the dynamic type of the RHS expression. Finally, the values are copied just like in normal assignment (shallow copy for pointer components, deep copy for allocatable components). The effect is similar to:

if (allocated(variable)) deallocate(variable)
allocate(variable,source=expression)

If anyone would like to give this a second look, the relevant sections in MRC are 15.3, 15.4, and 15.5.

On the other hand:

class(ext_t), allocatable :: ext
type(ext_t), allocatable :: tmp
tmp = ext_t(4)
call move_alloc(tmp,ext)

also fails with a segmentation fault, but this time in the assignment to tmp. So it seems like the problem is not assignment between derived types and classes (polymorphic variables), but allocation upon assignment of derived types in general.

Concerning a robust programming model (specifically for a container library), what are the remaining options?

  • You can turn assignment into an error, but I’m not sure this is really productive.
    subroutine ext_assign(this, other)
      ! ...
      write(*,*) "Assignment of ext_t is forbidden"
      error stop
    end subroutine
    
    Consumers will need to call subroutines instead even when the objects are not allocatable, which feels unnatural.
  • Given that the container implements dynamic storage internally, it probably doesn’t need the allocatable attribute as a means of saving storage space. So users should be recommended to avoid making their derived type instances allocatable.
  • In case polymorphism is required (and thereby the allocatable attribute), consumers should be encourage to use sourced allocation or follow the rule Allocation Before Assignment (ABA). This requires some discipline on the programmer’s side, and the awareness they are using a derived type. Personally, I think it’s just as good a convention as is avoiding assignment altogether.

I’m still confused with why cases 1 and 2 invoke the intrinsic assignment. I’m wondering if it has to do with the assignment interface (ext_t instead of base_t) for this:

  subroutine ext_assign(this, other)
    class(ext_t), intent(out) :: this
    type(ext_t), intent(in) :: other

Edit: okay, in case 2, the intrinsic assignment to an allocatable polymorphic variable is invoked (section 15.5 in MRC). But in case 4, the overloaded assignment is called since there is an available interface.

Edit 2: it seems that in both cases 1 and 2 the, the intrinsic assignment to an allocatable polymorphic variable gets performed. MRC note that:

… if the variable is already allocated with the correct type, no reallocation is done

I’m guessing that case 1 doesn’t resolve to the overloaded assignment because the LHS instance is class(base_t) (even if the dynamic types match).

Is there a compile-time method to make a derived type non-assignable? E.g. in C++11, one can delete the special member functions such as the operator=:

struct NonCopyable {
    NonCopyable & operator=(const NonCopyable&) = delete;
    NonCopyable(const NonCopyable&) = delete;
    NonCopyable() = default;
};

(Example taken from c++ - Disable compiler-generated copy-assignment operator - Stack Overflow)

I’ve tried making the generic assignment private,

  type, extends(base_t) :: ext_t
    integer :: val = 0
  contains
    procedure :: ext_assign
    generic, private :: assignment(=) => ext_assign
  end type

but the compiler just reverts back to the default assignment.

Yes, exactly that is one of the problems. As far as I see, there is no way in Fortran for a derive type to reliably raise a compile time error, when involved in various types of assignment. That means, you have to be sure about the internals of a type (whether it has pointers or not) before you can use it in an assignment. Which means to me, in a robust program you should not rely on the assignment for the derived type, as you would be doomed, if the internals of the derived type changed.

2 Likes

A huge share of C++ literature is related to explaining the semantics of construction, copying, assignment, moving, and destruction. Two “safe” idioms they have come up with are:

Given that Fortran lacks the concept of object construction entirely, and the result of overloading the structure constructor will always need pass through an assignment, for derived types which contain pointer components initialization by a subroutine seems to be safest option. The same then goes for copying, or moving an object, in a fashion similar to the rule of three.

It seems like the C++ community had just began to learn all the hard-lessons on safe OO programming, roughly at the same time OO features were added to the Fortran standard.

As a slight tangent, I’m wondering if one extra-level of indirection can help, meaning you wrap the pointer in another derived type, nested in the consumer one. This would be akin to the smart pointers like std::unique_ptr or std::shared_ptr. I believe a comment from @rouson posted in a stdlib issue is relevant to this discussion:

A decade ago, when I worked on the Trilinos project, which already surpassed a million lines of code back then, contributors were forbidden from using raw pointers and this was a C++ project so that was a pretty radical position at the time. Contributors were required to use a reference-counted pointer template class instead — effectively what these days would be called a smart pointer. Fortran’s allocatable variables are our smart pointers. Whenever they can be used, I recommend using them over pointers in every case. There are so many subtle ways to get things wrong in hard-to-debug ways with pointers that I’ll avoid using a library that has pointers under the hood if I can possibly avoid it. I recently abandoned a library that was exhibiting strange behaviors that I couldn’t diagnose and was using pointers. You can’t just think about whether the code works now. Think about what happens when a naive new developer comes into a project, not knowing the best practices to keep the code safe from runtime errors. I basically won’t believe the correctness of any non-trivial code that uses pointers if that code doesn’t encapsulate the pointers in a way that guarantees automated reference-counting and if the project doesn’t enforce a policy that only the encapsulated form of reference-counted pointer ever be used in the project.

1 Like

One possible way of getting a non-assignable object while using a form of RAII, is to use associate.

First you need to overload the structure constructor:

  function new_ext_t(val) result(this)
    integer, intent(in), optional :: val
    type(ext_t) :: this

    if (present(val)) then
      this%val = val
    end if
  end function

Next you build a facade module that only exposes the overloaded structure constructor, but keeps the type hidden:

module test_facade
  use test, only: ext_t => new_ext_t
end module

Finally, you instruct consumers to use the container within an associate block:

program testprog
  use test_facade, only: ext_t
  implicit none

  type(ext_t) :: a  ! compile-time error

  associate(ext => ext_t(3))
    ! ... access public members of ext ...
    ext = ext_t(4)  ! compile-time error
  end associate

Inside the associate statement I believe you can still call any type-bound methods (not sure what happens if a method is intent(out)). The objects is also supposed to be finalized upon exiting the associate block (see Should associate trigger automatic finalization?). Unfortunately, you may bump into issues with compiler support.

Their is a nice example of this pattern in the nlopt-f library for defining callbacks that get passed to a calling C routine.

Edit: If you need a copy, you can overload the structure constructor with a second method,

module test_facade

  use test, only: new_ext_t, copy_ext_t
  private

  public :: ext_t

  interface ext_t
    procedure :: new_ext_t, copy_ext_t
  end interface

end module

that can be used with a second associate statement:

  associate(this => ext_t(3))
    ! ...
    associate(copy => ext_t(this))
      ! ...
    end associate
  end associate
1 Like

The trick with the associate block is an interesting approach, indeed! Thanks for bringing it up.

I see, however, two pragmatical problems:

  • You would have associate constructs all over the place, which does not necessary improve readability.

  • Worse than that, how would you pass instances of the type around, or how would you put them inside a derived type? Their lifetime would be always restricted to the scope of associate block. For something general, like a list or a dictionary, I think this restriction would be too severe.

As for the comment of @rouson : I absolutely agree, one should use allocatables, wherever it is feasible. Unfortunately, if you want to create an efficient list, allocatables are sometimes not enough. Think about a list containing large objects (arrays, other containers, etc.). You may wish the list to be able to deliver a pointer to a given list element, so that you can process the elements content without having to make a copy of it. As discussed in Enforcing target attribute for actual argument in TBP calls, it is basically impossible to obtain a reliable pointer to the component of a derived type, if it has no pointer components…

1 Like

Yes, the verbosity of associate can be a nuisance. (But not necessarily worse than restricting yourself to calling subroutines everywhere.) Perhaps the standard could introduce a second form of associate, which implicitly terminated at the end of functions, subroutines, and blocks (I think there was a thread about this).

I don’t have an answer at hand for the second point.

Concerning your follow-up comment, in the past I was playing around with an iterator like idea for a list of integers (strings) and using associate to retrieve a pointer:

I think it has some elements of what you are asking for. (The main idea is to keep the type hidden, but have a function which returns a pointer to a target that can be used in an associate statement.)

Unfortunately, not. Your index_array with the target attribute is a module variable, which is a no-go for containers, as you could only have one instance at a time. If you the place index_array into the type, you have to drop the target attribute.

Yes module storage was the key in that example. But each instance of integer_list points to it’s own index object. Wouldn’t this form the basis of a container, with something like this:

type :: matrix
  integer :: nrows, ncols
  integer, private :: ptr_id = -1
contains
  procedure :: get_ptr
end type

real, allocatable, target :: storage(:)
integer, parameter :: MAXOBJECTS = 1000
integer :: bounds(2,MAXOBJECTS)
contains

function get_ptr(mat) result(p)
type(matrix), intent(in) :: mat
real, pointer :: p(:,:)
! ... check if mat is a valid object ...
p(1:mat%nrows,1:mat%ncols) => &
  storage(bounds(1,mat%ptr_id):bounds(2,mat%ptr_id))
end function

Indeed, it’s a very dangerous path, because you essentially take all memory management into your own hands.

Worse than that, the solution using module variables is subject to race conditions. Imagine two threads running in parallel, each of them creating a separate thread private container. Your storage array is shared between the threads (it is a “global” module variable), so access would have to be guarded by locks somehow to prevent race conditions. I think, robust containers should be “self-contained”, manipulated only via pure subroutines.

To me these discoveries lead to rather grim conclusions. What is a language without assignments? Is Fortran still worth investing? Should we ever start new projects, considering that such key features are fundamentally broken and unreliable.

It is pretty frustrating, because it seems like whoever did design these language features, never attempted to implement a linked list?

I am seriously considering starting migrating my codes to another language.

1 Like

The thread was here, and there is an issue A shorthand for immutability at j3-fortran/fortran_proposals.

The C programming language doesn’t support operator overloading, and yet is still very useful. As @aradi has stated above,

As long as one works with derived types which contain static and allocatable components only, things should work without any problems.

Languages (and their compilers) do get amended/improved with time, so I wouldn’t be so pessimistic.

Consider the C++ example I showed above, the ability to delete the special member function operator=() only became available in C++11. Before that, you could only hide the operator as a private class member, which would result in a linking error. The behavior of the special member functions in C++ is still a vexing topic for many C++ newcomers. So is initialization (see CppCon 2018: Nicolai Josuttis “The Nightmare of Initialization in C++”), move semantics, and several other language topics. Fortran remains much easier IMO.

Both C++ classes and Fortran derived types are based upon Simula. Given that C++ added classes 18 years earlier than Fortran it’s natural they’ve had more time to fix such issues. And yet you meet programmers who get frustrated with C++, deciding to use Rust, D, or other languages instead.

6 Likes

I agree with the overall conclusion above… I also use pointer components in a derived type (for a different reason (*)), but in that case I avoid using any assignment of objects. If I want to use assignment, I include only primitive types and allocatable things (not pointers) as type components. So separating like “copy-OK” type and “not-OK” type, in a sense…
But there seems no strict/built-in way to prohibit copy or assignment, I guess there is no “safe” or “robust” way to use “not-OK” type (unless relying on local rules or by carefully reading the source code).

(*) In my case, I use pointer components to make an alias to other components of nested derived-type components, such that I can use foo % x rather than foo % baa % baz % x. Indeed, this is one feature request for future standard (i.e., capability of making a component alias, or some kind of component forwarding(?) mechanism). I believe this is useful when using composition (rather than inheritance).

1 Like

Earlier I wrote:

The paper by @rouson and coauthors linked below, discusses a derived type wrapping a C++ object with the help of reference counting:

This Isn't Your Parents' Fortran: Managing C++ Objects with Modern Fortran | IEEE Journals & Magazine | IEEE Xplore

Towards the end of the paper, you learn that the semantics of the wrapper object are,

type(vector) :: A, B   ! wrappers of a C++ vector object

A = B           ! shallow-copy, A is a reference to B
A = vector(B)   ! deep-copy, two separate objects exist

The shallow copy is what you’d expect for a derived type containing pointer components even if it were purely in Fortran. In their implementation the assignment operator was not overloaded.

So my first impression is that even with the smart-pointer/reference counting, you cannot overcome the issue you put succinctly:

That means, you have to be sure about the internals of a type (whether it has pointers or not) before you can use it in an assignment.

I’d love to hear what @rouson or @sblionel have to say on this topic. It seems like a quite fundamental limitation of the Fortran derived type that assignment cannot be robustly overloaded or deleted like in C++.

1 Like

Just for some balance, there is a very popular array domain-specific language with similar semantics to the wrapper object I just mentioned:

x = array([1, 2, 3])

y = x         # shallow copy
z = copy(x)   # deep copy

Modifying x will also modify y, but z will be untouched. In case the comment hasn’t already given it away, it is NumPy.

Given how widespread and popular NumPy is, I suppose consumers can learn to use assignment and copying correctly as long as they are equipped with good documentation and how-to’s.