Binding multiple C functions under one Fortran interface

I have a C function coming in different “flavors”:

void form_i2(const char * label, int16_t i2);
void form_i4(const char * label, int32_t i4);
// etc.

I was thinking of defining an interface to dispatch automatically to the correct form:

interface form
  subroutine form_i2(label, i2)
    import :: c_char, c_int16_t
    implicit none
    character(c_char), dimension(*) :: label
    integer(c_int16_t), value :: i2
  end
  subroutine form_i4(label, i4)
    import :: c_char, c_int32_t
    implicit none
    character(c_char), dimension(*) :: label
    integer(c_int32_t), value :: i4   
  end
  ! etc.
end interface

Calling the subroutine directly works well, but calling the interface triggers an error with gfortran 14 and ifx 2024.0.0 as well:

character(len=64) :: label
integer my_int

label = "My Label 1" // c_null_char

call form_i4(label, my_int)
call form(label, my_int) ! There is no specific subroutine for the generic 'form'

Digging a little, I found that:

  • Changing character(len=64) :: label to character, dimension(64) :: label compiles fine, but the assignement label = "My Label 1" doesn’t work anymore;
  • Changing the interface to:
  subroutine form_i4(label, i4)
    import :: c_ptr, c_int32_t
    implicit none
    type(c_ptr), value :: label
    integer(c_int32_t), value :: i4   
  end

And the call to:

character(len=64), target :: label
integer my_int

label = "My Label 1" // c_null_char

call form_i4(c_loc(label), my_int)
call form(c_loc(label), my_int)

works fine, the compilers do find an implementation matching the arguments in the interface defined. For the caller it’s less practical however, with an extra target and c_loc to add.

But why isn’t it working with character(c_char), dimension(*) arguments?

Below is a minimal reproducible example, with the integer argument stripped for simplicity:

Minimal reproducible example
program test_c_string

use iso_c_binding
implicit none

interface form_interface

    subroutine form1(label) bind(c)
      import :: c_char, c_double, c_float, c_int
      implicit none
      character(c_char), dimension(*) :: label        
    end subroutine

    subroutine form2(label) bind(c, name="form1")
      import :: c_ptr
      implicit none
      type(c_ptr), value :: label        
    end subroutine

end interface

character(len=64) :: label
character(len=64), target :: label_t
character, dimension(64) :: label_dim

label = "My Label 1" // c_null_char
label_t = "My Label 1 with target" // c_null_char
label_dim = "My Label 1 with dimension()" // c_null_char

!call form_interface(label) ! no specific subroutine
call form1(label)

call form_interface(c_loc(label_t))
call form2(c_loc(label_t))

call form_interface(label_dim)
call form1(label_dim)

end program

And a C “implementation”:

#include <stdio.h>

void form1(const char * string)
{
    printf("Input C String = <%s>\n", string);
}

Running this should produce something like:

Input C String = <My Label 1>
Input C String = <My Label 1 with target>
Input C String = <My Label 1 with target>
Input C String = <MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMy Label 1>
Input C String = <MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMy Label 1>

Probably because a character, dimension(*) means an array of single characters, while character(len=64) means a scalar made of 64 characters. The representation in memory has great chances to be identical, but the concepts at the language level are different.

1 Like

I think your interface is just OK and there will never be problems, but I noticed that you’re referring to a const char* pointer without an intent, I believe the exact translation would be

character(c_char), dimension(*), intent(in) :: label

Regarding string → C array conversion, stdlib provides a convenience function: you could just do

use stdlib_strings, only: to_c_char

call form(to_c_char(label))

(label won’t even have to be c_null_char-terminated)

1 Like

The Fortran 2023 standard added F_C_STRING which appends the null, but not all compilers support it yet.

I would also note that the standard has a special sequence association rule when it comes to characters (J3/24-007, 15.5.2.12):

Sequence association only applies when the dummy argument is an explicit-shape or assumed-size array. The rest of this subclause only applies in that case.

An actual argument represents an element sequence if it is an array expression, an array element designator, default character scalar, or a scalar of type character with the C character kind (18.2.2). [emphasis added]

[…]

If the dummy argument is of type character with default or C character kind, and has nonzero character length, the storage unit sequence is as follows:

  • if the actual argument is an array expression, the storage units of the array;
  • if the actual argument is an array element or array element substring designator of a simply contiguous array, the storage units starting from the first storage unit of the designator and continuing to the end of the array;
  • otherwise, if the actual argument is scalar, the storage units of the scalar object

What this rule means is you can pass a scalar string (or array of strings) to a character assumed-size array dummy argument. So this would also work:

use, intrinsic :: iso_c_binding, only: c_char, c_null_char

character(kind=c_char,len=40) :: label

call form( label//c_null_char )  ! sequence association

From the output it seems like you are doing an array assignment, specifically assigning the letter M 64 times?

The reason you get the output,

Input C String = <MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMy Label 1>

that is 64 M characters followed by My Label 1 is because it appears the compiler happened to place the variable label_dim in memory before label or label_t, so the printf actually read past the memory of that variable and stopped at the null terminator of the next variable. In other words you have an out-of-bounds violation here.

To assign the array, you need to either do it using one of the following approaches:

label_dim = ['M','y',' ','l', 'a','b','e','l', c_null_char]
label_dim = transfer('My label'//c_null_char, mold=label_dim)
1 Like

Well, if form is an interface, it does not:

program test_c_string

use, intrinsic :: iso_c_binding, only: c_char, c_null_char
implicit none

interface form

    subroutine form1(label) bind(c)
      import :: c_char
      implicit none
      character(c_char), dimension(*) :: label        
    end subroutine

end interface

character(kind=c_char,len=40) :: label

call form( label//c_null_char )  ! sequence association

end program

This gives:

test_c_string_2.f90:18:55:

   18 | call form( label//c_null_char )  ! sequence association
      |                                                       1
Error: There is no specific subroutine for the generic 'form' at (1)

If form is a subroutine within the interface, it’s OK.

1 Like

I see the issues now. It looks like the sequence association doesn’t work within the generic overloading. :confused:

Btw, here is a standalone example that doesn’t require C:


subroutine form1(label) bind(c)
    use, intrinsic :: iso_c_binding
    implicit none
    character(c_char), dimension(*) :: label

    interface
        function c_strlen(str) bind(c,name="strlen")
            import c_char, c_size_t
            character(c_char), intent(in) :: str(*)
            integer(c_size_t) :: c_strlen
        end function
    end interface

    associate(n => c_strlen(label))
        print *, "label = ", label(1:n)
    end associate

end subroutine

program test_c_string

use, intrinsic :: iso_c_binding, only: c_char, c_null_char
implicit none

interface form
    subroutine form1(label) bind(c)
      import :: c_char
      implicit none
      character(kind=c_char,len=1), dimension(*) :: label        
    end subroutine
end interface

character(kind=c_char,len=40) :: label

label = "abracadabra"

!call form(trim(label)//c_null_char )  ! generic resolution fails
call form1(trim(label)//c_null_char)  ! sequence association applies via specific name

end program

If I switch on the commented statement, all compilers I tested give an error similar to:

~/dispatch_test> nagfor test_string.f90 
NAG Fortran Compiler Release 7.2(Shin-Urayasu) Build 7203
Error: test_string.f90, line 38: No specific match for reference to generic FORM
[NAG Fortran Compiler error termination, 1 error]
1 Like

Yes, this is precisely the problem. Thank you for clarifying its statement.

It looks like generic procedure resolution is strict in terms of satisfying the TKR rule:

A dummy argument is type, kind, and rank compatible, or TKR compatible, with another dummy argument if the first is type compatible with the second, the kind type parameters of the first have the same values as the corresponding kind type parameters of the second, and both have the same rank or either is assumed-rank.

This rule is violated when you call the routine via the generic name form instead of the specific name form1.

Besides the solution of @FedericoPerini which creates a copy of the string, you could also do the following:

character(kind=c_char,len=40), target :: label

label = "abracadabra"//c_null_char
call form1(label)  ! sequence association

block
    character(c_char), pointer :: p_label(:) => null()

    ! Fortran len_trim() includes the termination character
    ! C strlen() does not include the termination character

    call c_f_pointer(c_loc(label),p_label,[len_trim(label)])
    call form( p_label )  ! generic resolution to form1
end block
2 Likes

One more idea would be the use of F2018 C descriptors, i.e.

// form.c
#include <stdio.h>
#include <ISO_Fortran_binding.h>

// C routines (n.b. the used of string buffers with an explicit length)
void form_i2(const char * label, int llen, int16_t i2) {
    printf("form_i2: Input C String = <%.*s>\n", llen, label);
}
void form_i4(const char * label, int len, int32_t i4) {
    printf("form_i4: Input C String = <%.*s>\n", llen, label);
}

// Wrapper routines for Fortran callers
void f_form_i2(CFI_cdesc_t* label, int16_t i2) {
    form_i2((void *) label->base_addr, (int) label->elem_len, i2);
}
void f_form_i4(CFI_cdesc_t* label, int16_t i4) {
    form_i4((void *) label->base_addr, (int) label->elem_len, i4);
}
! test.f90
program test
use, intrinsic :: iso_c_binding
implicit none
interface form
    subroutine f_form_i2(label,i) bind(c)
        import c_char, c_int16_t
        character(kind=c_char,len=*), intent(in) :: label
        integer(c_int16_t), value :: i
    end subroutine

    subroutine f_form_i4(label,i) bind(c)
        import c_char, c_int32_t
        character(kind=c_char,len=*), intent(in) :: label
        integer(c_int32_t), value :: i
    end subroutine
end interface

call form("Some label", 16_c_int16_t)
call form("Some longer label", 32_c_int32_t)

end program
$ gcc-14 -Wall -c form.c 
$ gfortran-14 -Wall -o test test.f90 form.o
$ ./test 
form_i2: Input C String = <Some label>
form_i4: Input C String = <Some longer label>

Note that the ISO Fortran binding header is specific to the co-processor (i.e. each Fortran/C compiler pair has its own header), in other words it only provides source-level compatibility.

This explanation makes senses, thanks for pointing to the relevant part of the standard (which I’m not yet as familiar with as I should).

I’m considering using the C descriptors indeed, precisely because the string size is passed, so the caller does not have to concatenate a c_null_char manually (my goal is to make the Fortran side as simple as possible).

In that case you could also just make a Fortran wrapper,

subroutine form2(label,i2)
   character(len=*), intent(in) :: label  ! scalar string
   interface
      ! The actual C routine
      subroutine form_i2(...) bind(c)
          character(c_char) :: label(*)  ! array of chars
      end subroutine
   end interface
   call form_i2(label//c_null_char,i2)  ! sequence association
              ! ^copy due to the string concatenation
end subroutine

and then provide a generic interface to the Fortran wrappers,

interface form
   procedure :: form2, ...
end interface

This way you avoid the clash between sequence association and generic overloading.

Bottom-line is you need to append the null-termination character somewhere (either in C or Fortran), as long as the C library expects it that way.

1 Like