Help working around finalization issue

I posted an example of an issue with the gfortran finalizer I’ve been scratching my head on:

This example has a derived type containing a hashmap:

type :: derived
   type(hash) :: map
end type

likely due to finalization issues (finalization being called onto a wrong memory address), memory gets corrupted and this causes all sorts of problems, when copying an entity:

type(derived) :: a,b
a = b ! CRASH

I’ve already been trying all sorts of workarounds including switching between pointer vs allocatables, and introducing user-defined final and assignment procedures, but so far I haven’t been lucky getting this to work with gfortran .

Because the crash happens accessing deallocated memory, I use magic numbers to manually store the allocation status, i.e.

integer(8) :: initialized = int(z'bbabababa') ! means "initialized"
integer(8) :: uninitialized = int(z'dededededf') means "uninitialized"

so among the 2^64 possible garbage values read by the finalization procedure, only two actually do represent valid allocation states, but this is apparently not enough in this case.

So I was wondering if any of the Fortran gurus here on Discourse has experience implementing practical workarounds for this type of nested derived types, and if there are guidelines that should be followed to avoid this… it would be very appreciated :slight_smile:

Can you cut your example down to something minimal that reproduces the crash? Ideally where nothing else can be deleted. It’s difficult to investigate 1.6k lines :slight_smile:

1 Like

Agreed! this is a significantly reduced version:

Note that I’ve just found a combination that works:

  • use only allocatables
  • do not finalize structures with an allocatable recursive component:
! newly found gfortran ICE
module crash
   type :: t
      type(t), allocatable :: x
      contains
      final :: t_final ! no crash if no final
   end type
   contains
   subroutine t_final(this)
       type(t), intent(inout) :: this
   end subroutine t_final
end module crash
program p
   use crash
end          

… and here is a neat implementation of the <int,int> unordered map with all pure and `elemental procedures :slight_smile:

1 Like

When you have derived type with pointer compoments and a finalizer, you likely need to overload assignment too, depending in which way you initialize the DT. This is related to the so-called “rule of three”: Rule of three (C++ programming) - Wikipedia

The reason I say likely, is because it depends how you create an instance of that DT which could be either using an assignment or a call to a subroutine (could be type-bound).

If you need to use defined assignment and your type has pointer components, try to follow the advice given here: Should we avoid assignment of derived types in robust programs? - #35 by FortranFan


Edit: here is an example of how this issue manifests

module ill_t

use, intrinsic :: iso_c_binding
implicit none
private

public :: t
public :: t_of_size
public :: t_print

type :: t
    real(c_float), pointer :: a(:) => null()
contains
    final :: t_destroy
end type

interface
    function c_malloc(size) bind(c,name="malloc")
        import c_size_t, c_ptr
        integer(c_size_t), value :: size
        type(c_ptr) :: c_malloc
    end function
    subroutine c_free(ptr) bind(c,name="free")
        import c_ptr
        type(c_ptr), value :: ptr
    end subroutine
end interface

contains

    subroutine t_destroy(this)
        type(t), intent(inout) :: this
        if (associated(this%a)) then
            call c_free(c_loc(this%a))
            nullify(this%a)
        end if
    end subroutine

    function t_of_size(n) result(this)
        integer, intent(in) :: n
        type(t) :: this
        
        type(c_ptr) :: p
        integer :: i

        p = c_malloc(n*c_sizeof(1.0_c_float))
        if (c_associated(p)) then
            call c_f_pointer(p,this%a,[n])
        else
            return
        end if

        ! Initialize memory
        do i = 1, n
            this%a(i) = i
        end do

    end function

    subroutine t_print(tt)
        type(t), intent(in) :: tt
        if (associated(tt%a)) print *, tt%a 
    end subroutine

end module

program test

    use ill_t

    type(t) :: my_t

    my_t = t_of_size(5)  ! Seemingly okay 

       ! 1) left-hand side is finalized
       ! 2) right-hand side temporary created
       ! 3) intrinsic assignment (shallow copy pointer component)
       ! 4) right-hand side temporary is finalized
       ! 5) my_t has corrupt data member t%a (use-after-free)

    call t_print(my_t) ! expecting [ 1,2,3,4,5 ], instead garbage

end program

The reason this breaks is because the “lifetime management” of the object is inconsistent. We’ve specified the creation and destruction, but we haven’t specified the copy/assignment (rule of three). Removing the finalizer solves the problem, but then it’s up to the user to destroy the object (prevent a memory leak). The solution is to provide an assignment, but this has some subtle issues (does the assignment create a deep copy or a shallow copy?) making it is easy to break the encapsulation. This question is explored in the thread by @aradi -Should we avoid assignment of derived types in robust programs?

The next solution would be to replace the creator function with a creator subroutine:

call new_t_of_size(5,my_t)

This way we avoid inadvertently triggering the finalizer during creation.

IMO, this issue is addressed poorly in Fortran textbooks, the exception being Scientific Software Design by @rouson, Xia, and Xu. A couple of older references which describe the issue are,

I think we can form one more corollary with C++ core guidelines:

C.20: If you can avoid defining default operations, do

For Fortran I would change this to, “if you can avoid defining a finalizer and/or assignment operator, do”.

I agree with your guidelines on finalization and pointers and whether a non-experienced user should try to avoid it, but it’s not the issue I was trying to state here.

Regardless of how often the finalizer is called, if the final subroutine is correct (such as in this case), no issues should arise. With several of the latest gfortran versions instead, the finalizer is called with a wrong base address of the object, which corrupts the values of its components, such that pointers return associated instead of .not.associated (there are quite a few open issues/on-going work at GNU bugzilla already). When I’m at my desk I can post a version with all allocatables and no pointers, it suffers the same issue.

So my question was more tuned towards production software maintainers: are there (if any) strategies to work around these issues? or, is the only option to just avoid compiling a non-trivial Fortran code with some gfortran versions? There are platforms where this is not such an easy choice.

I do work with a production software and we do not compromise on the features the language offers. This being said, we do not rely on gfortran for the final release but ifort. gfortran is only used for analysis and cross compiler checking. The issues with finalizes and PDT are at the moment the main limitations to its usage.

1 Like

Just out of curiosity: would you like to share about practical use cases for parameterized derived types? I would love to use them more, but to me the showstopper is again the limited compiler support.

But there are PLENTY of cases where they would be so useful, even just a type(string(len=123)). However, I also find the lack of type-bound procedures annoying, although PDTs are not really made for inheritance, so I would be fine with also using a conventional interface / module proedure approach.

Sure,
here is a cases were PDT are used to create dynamic arrays of various types (fortiche/src/collections/src/list/intrinsic_list.f90 at 2be73be10fc76d26d65f6f0459e1954613b5ea38 · davidpfister/fortiche · GitHub). From my experience, they work fine when you use the kind parameter but are very buggy when you use the length parameter. PDT with kind could be useful for refactoring @amasaki203 unsigned integer library for instance.

Things start to get messy when you want to use assumed type PDT as dummy arguments. Every time I tried, gfortran could not resolve the procedure signature.

Even with compilers that correctly implements the length parameter, the performance do not seem to be there: Working with parameterised derived type containing array of parameterised derived type - #18 by nncarlson

I experienced it myself when I tried to write a version of Differentia with PDT rather than allocatables. The performances were disappointing to say the least.

1 Like

Most of the use cases I stumble upon are with the length parameters instead. For example recently I’ve been dealing with multi-precision arithmetic expansions that could be seamlessly defined as PDTs

function expansion_sum(a,b) result(c) 
   type(expansion(len=*)), intent(in) :: a,b
   type(expansion(len=a%n+b%n) :: c
end function 

The performance implications of having stack- rather than heap-allocated derived types, similar to what is done for array entities, would be huge! I don’t know of other languages that could do this so easily.

2 Likes