Best practices for passing C strings

I do not often have to do serious character processing or C interop, but today I found myself needing both. I want to wrap a C function which takes a char* argument. I came up with a solution that appears to work, but I am not entirely pleased with it. I wonder if there are features of ISO_C_BINDING that I am missing, or if my approach could be made more robust.

Taking for example the C function

// hello.c
#include <stdio.h>
void hello(char *name) { 
    printf("Hello, %s\n!", name); 
}

I write a Fortran interface using ISO_BIND_C and define a subroutine that wraps the C function in such a way that I can pass in a Fortran character variable.

! wrap_hello.f90
module wrap_hello
  implicit none

  private
  public :: hello

  interface
    subroutine c_hello(name) bind(C, name="hello")
      use iso_c_binding, only: c_char
      character(kind=c_char), dimension(*) :: name
    end subroutine c_hello
  end interface

contains

  subroutine hello(name)
    use iso_c_binding, only: c_char, c_null_char
    character(len=*), intent(in) :: name

    character(kind=c_char), dimension(len(name)+1) :: c_name
    integer :: i
    
    do i = 1, len(name)
      c_name(i) = name(i:i)
    end do
    c_name(len(name)+1) = c_null_char

    call c_hello(c_name)
  end subroutine hello

end module wrap_hello

If I use this module in a test program

! test.f90
program test
  use wrap_hello, only: hello
  implicit none
  character(len=100) :: name
  read *, name
  call hello(trim(name))
end program

It appears to behave as expected. However, there are a few aspects of my implementation that I do not like:

  1. The conversion from Fortran character variable to C character array is clunky. I feel there must be a better way than explicitly looping and then tacking on a NUL.
  2. It assumes that assigning a default-kind character to a character(c_char) “just works”. In GCC (and probably most compilers), the default character kind and c_char coincide. Is that mandated by the standard, or am I asking for trouble?

Toward issue #2, when doing C-interop with numerical codes, I usually satisfy myself by doing explicit type conversions to make sure the kinds are exactly right on both the Fortran and C ends. But as far as I know, there’s no analogue for conversion of one character kind to another. Is this paranoia, or is it a real concern if robustness is a priority?

In playing around, I found that instead of having the local variable c_name, I can get away with just doing call c_hello(name//c_null_char), but that seems really fast and loose. It solves issue #1 at the cost of increasing my paranoia toward issue #2.

For completeness, I compiled the example with GCC 9.3 as

$ gcc-9 -c -Wall -Wextra -fsanitize=address -o hello.o hello.c
$ gfortran-9 -c -Wall -Wextra -std=f2018 -pedantic -fsanitize=address -fcheck=all -o hello_wrap.o hello_wrap.f90
$ gfortran-9 -c -Wall -Wextra -std=f2018 -pedantic -fsanitize=address -fcheck=all -o test.o test.f90
$ gfortran-9 -o test hello.o hello_wrap.o test.o -lasan
2 Likes

I’ve typically just gone with call c_hello(name//c_null_char). For the compilers I use (gfortran and ifort) it seems the default character kind and c_char are the same.

An intrinsic function do to the Fortran to C character conversion routine is on the proposal list for Fortran 202X (see point US09). The actual proposal is here.

4 Likes

As far as I understand, it’s correct language use to pass a Fortran character string as an assumed-size character array actual argument. I do this in tcp-client-server. Walt Breinerd’s Guide to Fortran 2008 Programming also does this.

Out of curiosity I added the warning and conformance flags and it builds without warning. (gcc-9 has an issue with dns.c in libdill, so you need either gcc-8 or clang).

What does -fsanitize=address do? I can’t find any documentation for it.

Welcome @nshaffer,
in the gtk-fortran project, there is some functions to convert strings C<->Fortran. See for example the second half of this file:
https://github.com/vmagnin/gtk-fortran/blob/gtk4/src/gtk-sup.f90
But I am not the author of those functions and I don’t pretend to understand perfectly that code. And perhaps some clean-up should be done…

-fsanitize is explained here:

-fsanitize=address
Enable AddressSanitizer, a fast memory error detector. Memory access instructions are instrumented to detect out-of-bounds and use-after-free bugs.

Note: be careful when searching for options on the web. Don’t forget to remove the minus character or use double quotes (“-fsanitize”), else the search engine will filter out the pages with fsanitize… :woozy_face:

4 Likes

Thank you, this is exactly why I couldn’t find it. In my 20 years of Googling programming solutions I didn’t know about this “trick”. :slight_smile:

1 Like

I never knew you could do this - I usually have the clunky assignment loop.
ifort (19.1) compiles without complaint.

I think this excerpt from the F2018 interpretation document (18.3.5) might be the relevant explanation:

Fortran’s rules of sequence association (15.5.2.11) permit a character scalar actual argument to correspond to a dummy argument array. This makes it possible to argument associate a Fortran character string with a C string.

3 Likes

As noted, you can pass a character value of any length to an interoperable character array - this carve-out in the language is explicitly there for this situation. I would caution that you probably want trim(name)//c_null_char if you don’t want the trailing blanks.

@ivanpribec already pointed you to the proposal for F202X to help going the other way (C to Fortran).

5 Likes

Thanks for the responses, all! It is good to see that sequence association saves the day here for going from F --> C. And thanks to @ivanpribec for the link to the F202x proposal for facilitating C --> F.

In the standard excerpt that @lkedward quoted, I take it that sequence association used in this way still requires that the Fortran character variable be of kind c_char or an equivalent. I wonder if there are any machines out there (running, not in museums) where default character kind and c_char are not the same?

Speaking of f_c_string, it should do the following:

function f_c_string(string,trim)
  use, intrinsic :: ISO_C_BINDING, only: c_char, c_null_char
  implicit none
  character(len=*), intent(in) :: string
  logical, intent(in), optional :: trim

  character(kind=c_char,len=:), allocatable :: f_c_string
  logical :: trim_

  trim_ = .false.
  if (present(trim)) trim_ = trim

  if (trim_) then
    f_c_string = trim(string)//c_null_char
  else
    f_c_string = string//c_null_char
  end if
end function

This however doesn’t work since the optional trim argument is shadowing the intrinsic function trim. Is there any way to overcome this while preserving the trim keyword in the interface?

It would be nice for such situations to be able to rename the intrinsics upon import , e.g. use, intrinsic, only: intrinsic_trim => trim.

1 Like

Steve’s recent blog post might be relevant here regarding the intrinsic keyword.

Would this modification work perhaps?

block
    intrinsic trim

    if (trim_) then
        f_c_string = trim(string)//c_null_char
    else
        f_c_string = string//c_null_char
    end if

end block 
1 Like

Indeed, that works! Great trick.

Edit: in case someone wants to use this, here is the full version:
Edit 2: the default setting is supposed to be trim = .true..

function f_c_string(string,trim)
  use, intrinsic :: iso_c_binding, only: c_char,c_null_char
  implicit none
  character(len=*), intent(in) :: string
  logical, intent(in), optional :: trim

  character(kind=c_char,len=:), allocatable :: f_c_string
  logical :: trim_

  trim_ = .true.
  if (present(trim)) trim_ = trim

  block
    intrinsic trim
    if (trim_) then
      f_c_string = trim(string)//c_null_char
    else
      f_c_string = string//c_null_char
    end if
  end block
end function
2 Likes

What is the reason for a trim argument at all? If I want to trim trailing blanks before passing off to C, why not ask me to just pass in trim(string)?

The way I understand it, it is just convenience. According to the proposal the default behavior is to trim, giving the clean syntax f_c_string(string). If you don’t want to trim, you would do f_c_string(string,trim=.false.). I leave it to @sblionel to correct me if I am wrong.

Edit: corrected the name to f_c_string.

The latest proposal is https://j3-fortran.org/doc/year/19/19-197r3.txt with C_F_STRPOINTER and F_C_STRING (not C_F_STRING).

F_C_STRING (STRING [, ASIS])

Description: Copy a string with appended NUL

Class: Transformational function

STRING shall be a character scalar of default kind or of kind C_CHAR.

ASIS (optional) shall be a logical scalar.

Result characteristics: The result is of type character with
the same kind type parameter as STRING.

Result value:

Case (i): If ASIS is not present or has the value false, the value
of the result is the value of TRIM(STRING)// CHAR(0,KIND(STRING)).
The length type parameter is one greater than the length
of STRING less the number of trailing blanks in STRING.

Case (ii): If ASIS is present with the value true, the
value of the result is the value of STRING with CHAR(0,KIND(STRING))
appended. The length type parameter of the result is one greater
than the length of STRING.

2 Likes

Oops, I’ve corrected my previous post to show the correct name. What does asis stand for?

Edit: nevermind, I got it - “as is”.

Sorry for hijacking this post, but my question seems to be related, rather just in the opposite direction. Specifically, I want to pass a string from C to Fortran without an argument with the string length.

I principle, this should be possible by calling strlen from Fortran. My code is as shown below. I must be doing something really silly because the code stops on call c_f_pointer without throwing any error.

module fmodule
   use, intrinsic :: iso_c_binding, only: c_int, c_char, c_ptr, c_f_pointer
   implicit none
   private

   public :: open_file

   interface
      integer(c_int) function strlen(string) bind(C)
         import :: c_int, c_ptr
         type(c_ptr), intent(in) :: string ! value?
      end function
   end interface


contains

   subroutine open_file(unit, filename)
      integer, intent(in) :: unit
      character(len=*), intent(in) :: filename

      print *, "filename=", filename
      print *, "length  =", len(filename)
      print *, "unit    =", unit

      open (unit=unit, file=filename)
      close (unit)

   end subroutine


   subroutine open_file_c(unit, filename_cptr) bind(C)
      integer(c_int), intent(in), value :: unit
      type(c_ptr) :: filename_cptr

      integer :: i
      integer :: length
      character(kind=c_char), pointer :: filename_fptr(:)
      character(len=:), allocatable :: filename
      
      length = strlen(filename_cptr)
      print *, "length=", length

      allocate(filename_fptr(length))
      print *, "associated(filename_fptr)=", associated(filename_fptr)
      ! this point is reached...
      call c_f_pointer(cptr=filename_cptr, fptr=filename_fptr, shape=[length])
      ! this point is NOT reached...
      print *, "filename_fptr=", associated(filename_fptr), filename_fptr

      ! Convert character array to character string _scalar_
      allocate(character(len=length) :: filename)
      do i = 1, length
         filename(i:i) = filename_fptr(i)
      end do

      call open_file(unit, filename)

   end subroutine

end module fmodule
#include <string.h>
#include "fmodule.h"

int main()
{
    char *filename = "hello_C1.txt";

    int unit = 42;

    open_file_c(unit, filename);

    return 0;
}

Full code here.

Don’t know if this helps but this is what I wrote for my netCDF C-interop interfaces (about 15 years ago now). Its been part of netCDF since about 2013.

C prototype

const char * nc_inq_libvers(void);


! C interop interface

Interface
 Function nc_inq_libvers() BIND(C)

 USE ISO_C_BINDING, ONLY: C_PTR

 Type(C_PTR) :: nc_inq_libvers

 End Function nc_inq_libvers
End Interface

! Fortran function

 Function nf_inq_libvers() RESULT(vermsg)

! Return string with current version of NetCDF library

 USE netcdf_nc_interfaces

 Implicit NONE

 Character(LEN=80) :: vermsg

 Character(LEN=81), Pointer :: fstrptr
 TYPE(C_PTR)                :: cstrptr
 Integer                    :: inull, ilen

 vermsg = REPEAT(" ", LEN(vermsg)) !initialize vermsg to blanks

! Get C character pointer returned by nc_inq_vers and associate it
! Fortran character pointer (fstrptr). Have to do this when the C
! routine allocates space for the pointer and/or knows where it lives
! not Fortran. This is also how you can pass character data back to
! FORTRAN from C using a C function that returns a character pointer
! instead using a void jacket function and passing the string as a hidden 
! argument. At least this is how cfortran.h appears to do it. 


 NULLIFY(fstrptr) ! Nullify fstrptr

 cstrptr = nc_inq_libvers()         ! Get C pointer to version string and

 Call C_F_POINTER(cstrptr, fstrptr) ! associate it with FORTRAN pointer

! Locate first C null character and then set it and remaining characters
! in string to blanks

 ilen  = LEN_TRIM(fstrptr)
 inull = SCAN(fstrptr,C_NULL_CHAR)
 If (inull /= 0) ilen = inull-1
 ilen = MAX(1, MIN(ilen,80)) ! Limit ilen to >=1 and <=80

! Load return value with trimmed fstrptr string

 vermsg(1:ilen) = fstrptr(1:ilen)

 End Function nf_inq_libvers

Edit:

The actual netCDF C function looks like this

/* Tell the user the version of netCDF. */
static const char nc_libvers[] = PACKAGE_VERSION " of "__DATE__" "__TIME__" $";

/**
\defgroup lib_version Library Version
  Functions related to querying the library version.

  Return the library version.

  \returns short string that contains the version information for the
library.
 */
const char *
nc_inq_libvers(void)
{
   return nc_libvers;
}


your pointer argument is being passed by reference not by value. Your current interface is equivalent to

void open_file_c(int unit, char** filename);

which is the reason it does not work. What you want is Fortran to receive the value of that pointer:

void open_file_c(int unit, char* filename);

i.e.

! Option #1: use C pointer
subroutine open_file_c(unit, filename_cptr) bind(C)
      integer(c_int), intent(in), value :: unit
      type(c_ptr), value :: filename_cptr
end subroutine

! Option #2: use datatype directly: your pointer is the reference to it
subroutine open_file_c2(unit, filename) bind(C)
    integer(c_int), intent(in), value :: unit
    character(kind=c_char), intent(in) :: filename(*)
end subroutine
2 Likes

Thank you very much, @FedericoPerini. It was exactly that!