Best practices for passing C strings

I do not often have to do serious character processing or C interop, but today I found myself needing both. I want to wrap a C function which takes a char* argument. I came up with a solution that appears to work, but I am not entirely pleased with it. I wonder if there are features of ISO_C_BINDING that I am missing, or if my approach could be made more robust.

Taking for example the C function

// hello.c
#include <stdio.h>
void hello(char *name) { 
    printf("Hello, %s\n!", name); 
}

I write a Fortran interface using ISO_BIND_C and define a subroutine that wraps the C function in such a way that I can pass in a Fortran character variable.

! wrap_hello.f90
module wrap_hello
  implicit none

  private
  public :: hello

  interface
    subroutine c_hello(name) bind(C, name="hello")
      use iso_c_binding, only: c_char
      character(kind=c_char), dimension(*) :: name
    end subroutine c_hello
  end interface

contains

  subroutine hello(name)
    use iso_c_binding, only: c_char, c_null_char
    character(len=*), intent(in) :: name

    character(kind=c_char), dimension(len(name)+1) :: c_name
    integer :: i
    
    do i = 1, len(name)
      c_name(i) = name(i:i)
    end do
    c_name(len(name)+1) = c_null_char

    call c_hello(c_name)
  end subroutine hello

end module wrap_hello

If I use this module in a test program

! test.f90
program test
  use wrap_hello, only: hello
  implicit none
  character(len=100) :: name
  read *, name
  call hello(trim(name))
end program

It appears to behave as expected. However, there are a few aspects of my implementation that I do not like:

  1. The conversion from Fortran character variable to C character array is clunky. I feel there must be a better way than explicitly looping and then tacking on a NUL.
  2. It assumes that assigning a default-kind character to a character(c_char) “just works”. In GCC (and probably most compilers), the default character kind and c_char coincide. Is that mandated by the standard, or am I asking for trouble?

Toward issue #2, when doing C-interop with numerical codes, I usually satisfy myself by doing explicit type conversions to make sure the kinds are exactly right on both the Fortran and C ends. But as far as I know, there’s no analogue for conversion of one character kind to another. Is this paranoia, or is it a real concern if robustness is a priority?

In playing around, I found that instead of having the local variable c_name, I can get away with just doing call c_hello(name//c_null_char), but that seems really fast and loose. It solves issue #1 at the cost of increasing my paranoia toward issue #2.

For completeness, I compiled the example with GCC 9.3 as

$ gcc-9 -c -Wall -Wextra -fsanitize=address -o hello.o hello.c
$ gfortran-9 -c -Wall -Wextra -std=f2018 -pedantic -fsanitize=address -fcheck=all -o hello_wrap.o hello_wrap.f90
$ gfortran-9 -c -Wall -Wextra -std=f2018 -pedantic -fsanitize=address -fcheck=all -o test.o test.f90
$ gfortran-9 -o test hello.o hello_wrap.o test.o -lasan
2 Likes

I’ve typically just gone with call c_hello(name//c_null_char). For the compilers I use (gfortran and ifort) it seems the default character kind and c_char are the same.

An intrinsic function do to the Fortran to C character conversion routine is on the proposal list for Fortran 202X (see point US09). The actual proposal is here.

4 Likes

As far as I understand, it’s correct language use to pass a Fortran character string as an assumed-size character array actual argument. I do this in tcp-client-server. Walt Breinerd’s Guide to Fortran 2008 Programming also does this.

Out of curiosity I added the warning and conformance flags and it builds without warning. (gcc-9 has an issue with dns.c in libdill, so you need either gcc-8 or clang).

What does -fsanitize=address do? I can’t find any documentation for it.

Welcome @nshaffer,
in the gtk-fortran project, there is some functions to convert strings C<->Fortran. See for example the second half of this file:
https://github.com/vmagnin/gtk-fortran/blob/gtk4/src/gtk-sup.f90
But I am not the author of those functions and I don’t pretend to understand perfectly that code. And perhaps some clean-up should be done…

-fsanitize is explained here:

-fsanitize=address
Enable AddressSanitizer, a fast memory error detector. Memory access instructions are instrumented to detect out-of-bounds and use-after-free bugs.

Note: be careful when searching for options on the web. Don’t forget to remove the minus character or use double quotes (“-fsanitize”), else the search engine will filter out the pages with fsanitize… :woozy_face:

4 Likes

Thank you, this is exactly why I couldn’t find it. In my 20 years of Googling programming solutions I didn’t know about this “trick”. :slight_smile:

1 Like

I never knew you could do this - I usually have the clunky assignment loop.
ifort (19.1) compiles without complaint.

I think this excerpt from the F2018 interpretation document (18.3.5) might be the relevant explanation:

Fortran’s rules of sequence association (15.5.2.11) permit a character scalar actual argument to correspond to a dummy argument array. This makes it possible to argument associate a Fortran character string with a C string.

3 Likes

As noted, you can pass a character value of any length to an interoperable character array - this carve-out in the language is explicitly there for this situation. I would caution that you probably want trim(name)//c_null_char if you don’t want the trailing blanks.

@ivanpribec already pointed you to the proposal for F202X to help going the other way (C to Fortran).

5 Likes

Thanks for the responses, all! It is good to see that sequence association saves the day here for going from F --> C. And thanks to @ivanpribec for the link to the F202x proposal for facilitating C --> F.

In the standard excerpt that @lkedward quoted, I take it that sequence association used in this way still requires that the Fortran character variable be of kind c_char or an equivalent. I wonder if there are any machines out there (running, not in museums) where default character kind and c_char are not the same?

Speaking of f_c_string, it should do the following:

function f_c_string(string,trim)
  use, intrinsic :: ISO_C_BINDING, only: c_char, c_null_char
  implicit none
  character(len=*), intent(in) :: string
  logical, intent(in), optional :: trim

  character(kind=c_char,len=:), allocatable :: f_c_string
  logical :: trim_

  trim_ = .false.
  if (present(trim)) trim_ = trim

  if (trim_) then
    f_c_string = trim(string)//c_null_char
  else
    f_c_string = string//c_null_char
  end if
end function

This however doesn’t work since the optional trim argument is shadowing the intrinsic function trim. Is there any way to overcome this while preserving the trim keyword in the interface?

It would be nice for such situations to be able to rename the intrinsics upon import , e.g. use, intrinsic, only: intrinsic_trim => trim.

1 Like

Steve’s recent blog post might be relevant here regarding the intrinsic keyword.

Would this modification work perhaps?

block
    intrinsic trim

    if (trim_) then
        f_c_string = trim(string)//c_null_char
    else
        f_c_string = string//c_null_char
    end if

end block 
1 Like

Indeed, that works! Great trick.

Edit: in case someone wants to use this, here is the full version:
Edit 2: the default setting is supposed to be trim = .true..

function f_c_string(string,trim)
  use, intrinsic :: iso_c_binding, only: c_char,c_null_char
  implicit none
  character(len=*), intent(in) :: string
  logical, intent(in), optional :: trim

  character(kind=c_char,len=:), allocatable :: f_c_string
  logical :: trim_

  trim_ = .true.
  if (present(trim)) trim_ = trim

  block
    intrinsic trim
    if (trim_) then
      f_c_string = trim(string)//c_null_char
    else
      f_c_string = string//c_null_char
    end if
  end block
end function
2 Likes

What is the reason for a trim argument at all? If I want to trim trailing blanks before passing off to C, why not ask me to just pass in trim(string)?

The way I understand it, it is just convenience. According to the proposal the default behavior is to trim, giving the clean syntax f_c_string(string). If you don’t want to trim, you would do f_c_string(string,trim=.false.). I leave it to @sblionel to correct me if I am wrong.

Edit: corrected the name to f_c_string.

The latest proposal is https://j3-fortran.org/doc/year/19/19-197r3.txt with C_F_STRPOINTER and F_C_STRING (not C_F_STRING).

F_C_STRING (STRING [, ASIS])

Description: Copy a string with appended NUL

Class: Transformational function

STRING shall be a character scalar of default kind or of kind C_CHAR.

ASIS (optional) shall be a logical scalar.

Result characteristics: The result is of type character with
the same kind type parameter as STRING.

Result value:

Case (i): If ASIS is not present or has the value false, the value
of the result is the value of TRIM(STRING)// CHAR(0,KIND(STRING)).
The length type parameter is one greater than the length
of STRING less the number of trailing blanks in STRING.

Case (ii): If ASIS is present with the value true, the
value of the result is the value of STRING with CHAR(0,KIND(STRING))
appended. The length type parameter of the result is one greater
than the length of STRING.

2 Likes

Oops, I’ve corrected my previous post to show the correct name. What does asis stand for?

Edit: nevermind, I got it - “as is”.