Manipulating array descriptors from C?

Hello,

For illustration purpose I wanted to change the lower bound of an already allocated array, by using the C interoperabily and by directly modifying the array descriptor from C (I know we are not supposed to do that, but it is also supposed to work).

It works as expected if I pass directly the allocatable array to a C routine.

But for some reason, on the C side I want to encapsulate the array descriptor in another structure, and on the Fortran side manipulate only an opaque handle to this structure. But then it doesn’t work and I don’t get why… On the C side I can check that the address of the array descriptor is the right one, so what goes wrong?

I’ve made a MRE:

The Fortran part:

PROGRAM foo
USE iso_c_binding
implicit none
   
   INTERFACE 
   
      subroutine fcsetlb(a,lb) bind(C)
         import c_float, c_int
         real(c_float), allocatable, intent(inout) :: a(:)
         integer(c_int), intent(in) :: lb
      end subroutine
      
      subroutine fcbind(h,a) bind(C)
         import c_ptr, c_float
         type(c_ptr), intent(inout) :: h
         real(c_float), allocatable, intent(in) :: a(:)
      end subroutine
      
      subroutine fcsetlb_h(h,lb) bind(C)
         import c_ptr, c_int
         type(c_ptr), intent(inout) :: h
         integer(c_int), intent(in) :: lb
      end subroutine
      
   END INTERFACE
   
   real, allocatable :: a(:)
   type(c_ptr) :: h


   allocate( a(10) )
   
   print*, "=== SETTING THE LB DIRECTLY ==="
   call fcsetlb(a,5)
   print*, "F: visible lower bound =", lbound(a)
   
   print*, "=== SETTING THE LB THROUGH A HANDLE ==="
   call fcbind(h,a)   ! h is a handle to a C structure that encapsulates the array
   call fcsetlb_h(h,-5)
   print*, "F: visible lower bound =", lbound(a)

END

The C(++) part:

#include <cstdlib>
#include <cstdio>
#include <ISO_Fortran_binding.h>

// receives a pointer to the array descriptor and updates the LB
extern "C"   
void fcsetlb(CFI_cdesc_t* a, int* lb) {
    a->dim[0].lower_bound = *lb;
    printf(" C: address of the array descriptor = %p\n",a);
    printf(" C: lower bound component set to %d\n", a->dim[0].lower_bound);
}

// structure that encapsulates a pointer to the array descriptor
struct fooh {
    CFI_cdesc_t* a;
};

// allocates the structure, and binds the array descriptor to it 
extern "C"
void fcbind(fooh** h, CFI_cdesc_t* a)  {
    *h = (fooh*)malloc(sizeof(fooh));
    (*h)->a = a;
}
    
// receives a handle to the structure and updates the LB of the binded array descriptor
extern "C"   
void fcsetlb_h(fooh** h, int* lb) {
    (*h)->a->dim[0].lower_bound = *lb;
    printf(" C: address of the array descriptor = %p\n",(*h)->a);
    printf(" C: lower bound component set to %d\n", (*h)->a->dim[0].lower_bound);
}

The output:

$ g++ -c ccc.cpp && gfortran -c fff.f90 && g++ ccc.o fff.o -lgfortran && ./a.out 
 === SETTING THE LB DIRECTLY ===
 C: address of the array descriptor = 0x7ffee2110880
 C: lower bound component set to 5
 F: visible lower bound =           5
 === SETTING THE LB THROUGH A HANDLE ===
 C: address of the array descriptor = 0x7ffee2110880
 C: lower bound component set to -5
 F: visible lower bound =           5
iMac:FCVEC pierre$ 

The descriptor passed to fcbind is only valid for the duration of the call. If you compile your program with -fdump-tree-original and inspect the resulting file, you’ll see the problem:

    {
      void * cfi.7;    // <-- temporary descriptor created

      if ((real(kind=4)[0:] * restrict) a.data == 0B)
        {
          a.dtype = {.elem_len=4, .rank=1, .type=3};
        }
      a.span = (integer(kind=8)) a.dtype.elem_len;
      a.dtype.attribute = 1;
      cfi.7 = 0B;
      _gfortran_gfc_desc_to_cfi_desc (&cfi.7, &a);
      a.dtype.attribute = 1;
      fcbind (&h, cfi.7);
      _gfortran_cfi_desc_to_gfc_desc (&a, &cfi.7);
      __builtin_free (cfi.7);  // <-- destroys the descriptor stored in the C struct,
                               //     making it corrupt
    }
    {
      static integer(kind=4) C.3918 = -5;

      fcsetlb_h (&h, &C.3918);
    }

In the print statement afterward, the lbound() function uses the original gfortran descriptor of a. The handle however is corrupt.

See this answer:

You can store a copy of the base_addr in your struct, but you’re still at risk of your C struct becoming invalid due to a deallocate statement or the end of a scope, or the creation of an array temporary. Essentially you can’t have two sources of truth as one is bound to become corrupt at some point.

The first part triumphs over the second part. You shouldn’t attempt to modify the bounds of the descriptor other than with a call to CFI_allocate.

While your invocation of fsetlb works, it is technically a violation of the API. As a result it has corrupted even the internal gfortran descriptor.

Edit: adding the target attribute on a you can bypass the lifetime issue (on parts of the descriptor at least).

Does it means that the compiler is actually making a copy-in of the descriptor? If yes: 1) why? and 2) why not a copy-out?(ok I get it, if I’m storing the address of the copy, the copy-out won’t solve anything) I’m confused…

And even more confused after doing the same thing with a user-defined structure that is handled similarly to the array descriptor: then evrything works as expected:

PROGRAM foo
USE iso_c_binding
implicit none

   type, bind(c) :: bar
      integer(c_int) :: n=1
   end type
   
   INTERFACE 
   
      subroutine fcsetlb(a,b,lb) bind(C)
         import c_float, c_int, bar
         real(c_float), allocatable, intent(inout) :: a(:)
         type(bar), intent(inout) :: b
         integer(c_int), intent(in) :: lb
      end subroutine
      
      subroutine fcbind(h,a,b) bind(C)
         import c_ptr, c_float, bar
         type(c_ptr), intent(inout) :: h
         real(c_float), allocatable, intent(in), target :: a(:)
         type(bar), intent(in) :: b
      end subroutine
      
      subroutine fcsetlb_h(h,lb) bind(C)
         import c_ptr, c_int
         type(c_ptr), intent(inout) :: h
         integer(c_int), intent(in) :: lb
      end subroutine
      
   END INTERFACE
   
   real, allocatable, target :: a(:)
   type(bar) :: b
   type(c_ptr) :: h


   allocate( a(10) )
   
   print*, "=== SETTING THE LB DIRECTLY ==="
   call fcsetlb(a,b,5)
   print*, "F: visible lower bound =", lbound(a), b%n
   
   print*, "=== SETTING THE LB THROUGH A HANDLE ==="
   call fcbind(h,a,b)   ! h is a handle to a C structure that encapsulates the array
   call fcsetlb_h(h,-5)
   print*, "F: visible lower bound =", lbound(a), b%n

END
#include <cstdlib>
#include <cstdio>
#include <ISO_Fortran_binding.h>

struct bar {
    int n;
};

// receives a pointer to the array descriptor and updates the LB
extern "C"   
void fcsetlb(CFI_cdesc_t* a, bar* b, int* lb) {
    a->dim[0].lower_bound = *lb;
    printf(" C: address of the array descriptor = %p\n",a);
    printf(" C: lower bound component set to %d\n", a->dim[0].lower_bound);
    b->n = *lb;
}

// structure that encapsulates a pointer to the array descriptor
struct fooh {
    CFI_cdesc_t* a;
    bar* b;
};

// allocates the structure, and binds the array descriptor to it 
extern "C"
void fcbind(fooh** h, CFI_cdesc_t* a, bar* b)  {
    *h = (fooh*)malloc(sizeof(fooh));
    (*h)->a = a;
    (*h)->b = b;
}
    
// receives a handle to the structure and updates the LB of the binded array descriptor
extern "C"   
void fcsetlb_h(fooh** h, int* lb) {
    (*h)->a->dim[0].lower_bound = *lb;
    printf(" C: address of the array descriptor = %p\n",(*h)->a);
    printf(" C: lower bound component set to %d\n", (*h)->a->dim[0].lower_bound);
    (*h)->b->n = *lb;
}

Output:

iMac:FCVEC pierre$ g++ -c ccc.cpp && gfortran -c fff.f90 && g++ ccc.o fff.o -lgfortran && ./a.out 
 === SETTING THE LB DIRECTLY ===
 C: address of the array descriptor = 0x7ffee2cc2880
 C: lower bound component set to 5
 F: visible lower bound =           5           5
 === SETTING THE LB THROUGH A HANDLE ===
 C: address of the array descriptor = 0x7ffee2cc2880
 C: lower bound component set to -5
 F: visible lower bound =           5          -5

Does it mean that, in contrast to a bind(C) user defined type, the C array descriptor is NOT identical to the internal array descriptor of gfortran? And that it is created and filled upon call?

It seems that it doesn’t do (I’ve used it in the updated example).

The compiler is passing the descriptor to C by-reference, since that’s what the C function expects. Since the internal gfortran descriptor is different from the F2018 interoperable one, it has to create an instance on the fly. Imagine it more like this:

   print*, "=== SETTING THE LB DIRECTLY ==="
   block:
      type(c_ptr) :: cfi3
      cfi3 = __create_cfi_descriptor(a)
      call fcsetlb(cfi3,5)
      __update_internal_descriptor(a,cfi3)
   end block
   print*, "F: visible lower bound =", lbound(a)
   
   print*, "=== SETTING THE LB THROUGH A HANDLE ==="
   block:
      type(c_ptr) :: cfi7
      cfi7 = __create_cfi_descriptor(a)
      call fcbind(h,a) 
      __update_internal_descriptor(a,cfi7)
     ! cfi7 is destroyed at the end of this scope
   end block
   call fcsetlb_h(h,-5) ! pointer component h->a (the former cfi7) is invalid,
                        ! anything could happen !!!
   print*, "F: visible lower bound =", lbound(a)

The handle is storing an invalid pointer, which shouldn’t be de-referenced.

Short answer is YES and YES.

When you have an array of elements that are C-compatible instances of a derived type, passing the said array to a C routine, you are just passing the raw (virtual) base address of the array elements as C expects:

type, bind(c) :: point
   real(c_float) :: x, y
end type

interface
  ! struct point { float x, y };
  ! int maxpointloc(struct point *p, int n);
  integer(c_int) function maxpointloc(p, n) bind(c)
     type(point), intent(in) :: a(*)
     integer, intent(in), value :: n
  end function
end interface

Since most of the compilers existed before F2018 was a thing, I guess they mainly use their own internal representations for array meta-data. Hence they have to make a copy. Imagine the case where your Fortran processor is implemented in a non C-compatible language, so there is no possibility of re-using part of the representation. Instead you would work directly with the underlying x86 calling convention of your C co-processor, forcing you to repack the meta-data as needed.

I suspect this is different for new flang however, based on this post from one of the flang developers,

This header [ISO_Fortran_binding.h] defines one of the most important data structures in the project!

1 Like

OK, it explains everything, then… And it also explains why adding target doesn’t help, as the compiler has no way to avoid the pseudo copy-in/copy-out.

Maybe something like this could suit your needs?

! foo_descriptor_test.f90
!
subroutine move_alloc_wrap(from,to) bind(c)
use, intrinsic :: iso_c_binding, only: c_float
implicit none
real(c_float), intent(inout), allocatable :: from(:)
real(c_float), intent(out), allocatable :: to(:)
call move_alloc(from,to)
end subroutine

PROGRAM foo
USE iso_c_binding
implicit none
   
   INTERFACE 
      
      function vec_new(n,lb) bind(c)
         import c_int, c_ptr
         integer(c_int), intent(in), value :: n
         integer(c_int), intent(in), optional :: lb
         type(c_ptr) :: vec_new
      end function
      subroutine vec_pointer(v,a,lb) bind(c)
         import c_float, c_ptr, c_int
         type(c_ptr), value :: v
         real(c_float), pointer, intent(out) :: a(:)
         integer(c_int), intent(in), optional :: lb
      end subroutine
      subroutine vec_move(from,to) bind(c)
         import c_float, c_ptr
         type(c_ptr) :: from
         real(c_float), allocatable, intent(out) :: to(:)
      end subroutine

   END INTERFACE
   
   type(c_ptr), target :: h

   real, allocatable, target :: a(:)
   real, pointer :: b(:) => null()

   h = vec_new(5)  ! create handle to a custom C++ vector 

   print*, "=== GETTING A POINTER ARRAY ==="
   ! retrieve a pointer array with custom lower bounds 
   call vec_pointer(h, b, lb=-2)
   print*, "F: size(b) = ", size(b)
   print*, "F: lbound(b) = ", lbound(b)
   print*, "F: b = ", b

   print*, "=== MOVING INTO AN ALLOCATABLE ARRAY ==="
   ! move the vector into an allocatable array
   call vec_move(from=h,to=a)
   !h = c_null_ptr

   print*, "F: c_associated(h) = ", c_associated(h)
   print*, "F: size(a) = ", size(a)
   print*, "F: lbound(a) = ", lbound(a)
   print*, "F: a = ", a
   print*, "F: associated(b, a) = ", associated(b,a)    ! <-- suspicious
   print*, "F: same base address = ", loc(b) == loc(a)

   print*, "=== POINTER REMAPPING ==="
   b(-2:) => a
   print*, "F: associated(b, a) = ", associated(b, a)  ! suspicious
   print*, "F: same base address = ", loc(b) == loc(a)


END PROGRAM
// foo_c.cpp
//
#include <cstdlib>
#include <cstdio>
#include <cassert>
#include <ISO_Fortran_binding.h>
#include <numeric> // std::iota
#include <iostream>

// helper function wrapping Fortran's move_to
extern "C"
void move_alloc_wrap(CFI_cdesc_t *from, CFI_cdesc_t *to);

// A C++ container hosting a Fortran array descriptor
class vec {
public:

    // Construct new vector with size
    vec(int n, int lower_bound = 1) {
        int stat;
        stat = CFI_establish(
            this->desc(),
            NULL,
            CFI_attribute_allocatable,
            CFI_type_float,
            sizeof(float),
            1, NULL);
        assert(stat == CFI_SUCCESS);

        CFI_index_t lb[] = { lower_bound };
        CFI_index_t ub[] = { lower_bound + n - 1};

        stat = CFI_allocate(this->desc(), lb, ub, 
            0 /* ignored */);
        assert(stat == CFI_SUCCESS);

        // Initialize to values 1, 2, 3, ..., n
        float *begin = static_cast<float *>(this->desc()->base_addr);
        std::iota(begin, begin + n, 1);
    }

    // move the vector into an allocatable array b
    // the vector is destroyed in the process
    void move_to(CFI_cdesc_t *b) {
        move_alloc_wrap(this->desc(), b);
        assert(this->desc()->base_addr == nullptr);
    }

    inline CFI_cdesc_t *desc() const { return (CFI_cdesc_t *) &a_; } 

private:
    CFI_CDESC_T(1) a_; // lifetime of a_ is bound to the lifetime of vec
};

//
// The C wrappers
//

// returns a new vector handle
extern "C"
void *vec_new(int n, int *lower_bound) {
    vec *v = new vec(n, lower_bound ? *lower_bound : 1);
    return static_cast<void *>(v);
}

// associates the vector with a pointer array
extern "C"
void vec_pointer(void *v_, CFI_cdesc_t *a, int *lb_) {
    vec *v = static_cast<vec *>(v_);
    if (lb_) {
        // Custom lower bounds
        CFI_index_t lb[] = { *lb_ };
        CFI_setpointer(a,v->desc(),lb);
    } else {
        // Existing bounds
        CFI_setpointer(a,v->desc(),nullptr);
    }
}

// moves the vector into an allocatable array,
// the vector is destroyed in the process
extern "C"
void vec_move(void **from, CFI_cdesc_t *to) {
    vec *v = static_cast<vec *>(*from);
    v->move_to(to);
    delete v;
    *from = NULL;
}

Output:

$ g++-13 -Wall -g -std=c++11 -c foo_c.cpp
$ gfortran-13 -Wall -g foo_descriptor_test.f90 -o test foo_c.o -lstdc++
$ ./test
 === GETTING A POINTER ARRAY ===
 F: size(b) =            5
 F: lbound(b) =           -2
 F: b =    1.00000000       2.00000000       3.00000000       4.00000000       5.00000000    
 === MOVING INTO AN ALLOCATABLE ARRAY ===
 F: c_associated(h) =  F
 F: size(a) =            5
 F: lbound(a) =            1
 F: a =    1.00000000       2.00000000       3.00000000       4.00000000       5.00000000    
 F: associated(b, a) =  F
 F: same base address =  T
 === POINTER REMAPPING ===
 F: associated(b, a) =  T
 F: same base address =  T

I had to use gfortran 13, otherwise it didn’t work (presumably a bug during argument passing, leading to an invalid pointer).

Curiously, b although sharing the same data as a, returns it is is not associated. However, when we remap the array using the Fortran syntax it works. One could argue the round trip from Fortran to C++ just isn’t supported officially.

1 Like

The gfortran documentation mentions the conversion that has to take place:

Note: Currently, GNU Fortran does not use internally the array descriptor (dope vector) as specified in the Technical Specification, but uses an array descriptor with different fields in functions without the BIND(C) attribute. Arguments to functions marked BIND(C) are converted to the specified form.

1 Like