Passing strings to C(++) functions without copying

When making bind(c) interfaces that includes strings, the common advice is to make a copy of the string with c_null_char appended to its end. Something like this:

    pure function to_c_chars(fchars) result(cchars)
        use iso_c_binding, only: c_null_char, c_char
        character(len=*), intent(in) :: fchars
        character(len=1, kind=c_char), allocatable :: cchars(:)

        integer :: i
        integer :: n

        n = len(fchars)
        allocate(cchars(n + 1))
        do i = 1, n
            cchars(i) = fchars(i:i)
        end do
        cchars(n + 1) = c_null_char
    end function

For many applications this is probably ok because the amount of data copied usually isn’t that much.

I don’t like copying data for no good reason though so recently I’ve been testing an alternative approach and I think I found a neat little trick that will work in a subset of cases. See example below. The limitations are:

  • It’s mainly for read-only strings (though you can probably modify it if you’re careful)
  • A C++ compiler is required and the std::string_view class must be used. This has many of the same features as a regular std:string except for mutation though so you can still do quite a bit with it.

My question is: Can anyone see any situations where this will be invalid or do you think it is safe to use?

Fortran interface code (echo.f90):

module echo_mod
    use iso_c_binding, only: c_loc
    implicit none

    private
    public echo

    interface
        subroutine echo_c(chars, n) bind(c)
            use iso_c_binding, only: c_ptr, c_int
            type(c_ptr), value, intent(in) :: chars
            integer(c_int), intent(in) :: n
        end subroutine
    end interface

contains

    subroutine echo(chars)
        character(len=*), target, intent(in) :: chars

        write(*,*) 'Printing from C: ' // chars
        call echo_c(c_loc(chars), len(chars))
    end subroutine
end module

C++ code (echo.cpp):

#include <string_view>
#include <iostream>

extern "C" {

    void echo_c(const char* chars, const int* n)
    {
        auto view = std::string_view(chars).substr(0, *n);
        std::cout << view << std::endl;
    }
}

Usage (main.f90):

program main
    use echo_mod, only: echo
    implicit none

    call echo('Hello world')
end program

Output:

 Printing from C: Hello world
Hello world

I do not think it is necessary to use a C++ class. While most C functions presume the strings are terminated by a NUL byte, this is not set in stone. Actually, using counted strings seems to be much safer and as you are dealing with read-only strings, that is easier (no overflows to worry about). The above can be achieved with:

fprintf( "%*s\n", string_without_nul, number_chars );

The “*” informs the function to use the next argument as a length indication.

More or less similar functionality is achieved with functions such as strncpy() and the like.

1 Like

TL;DR

  • Let the “C descriptors” with the 2 processors of Fortran and the companion C one do most of the work for you i.e., seek Fortran 2018 facility to directly interoperate objects in Fortran wherever viable; this includes CHARACTER types with ALLOCATABLE (and POINTER) attributes.
  • Ensure interoperable types and this includes C_CHAR kind in Fortran objects of CHARACTER intrinsic type.

@plevold,

You may want to preview some of the suggestions I have long posted at this site (e.g., here) and/or Intel Fortran forum and comp.lang.fortran where I recommend further employment of the enhanced interoperability with C offered with Fortran starting the 2018 revision. Please note the ALLOCATABLE attribute in Fortran often comes in really handy, especially given Fortran’s rather nice safety features with this attribute when it comes to memory management.

Let us presume for a moment the “strings” you have in Fortran have the ALLOCATABLE attribute, a safer way to work with them given your attention in the original post with the “amount of data”. That is, as opposed to them being literal constants as your example shows.

In that case, you can even work with the deferred-shape option in Fortran and gain considerable brevity and simplicity with the interface:

#include <string_view>
#include <iostream>
#include "ISO_Fortran_binding.h"

extern "C" {
   void echo_c( const CFI_cdesc_t* Fstr )
   {
      auto view = std::string_view((char *)Fstr->base_addr).substr(0, Fstr->elem_len);
      std::cout << view << std::endl;
   }
}
   use, intrinsic :: iso_c_binding, only : c_char
   interface
      subroutine echo_c( str ) bind(C, name="echo_c" )
         import :: c_char
         ! Argument list
         character(kind=c_char, len=:), allocatable, intent(in) :: str
      end subroutine
   end interface
   character(kind=c_char, len=:), allocatable :: s
   s = c_char_"Hello World!"
   call echo_c( s )
end

Program behavior on WIndows using IFORT and Microsoft C/C++ companion processor:

C:\temp>cl /c /std:c++20 /EHsc /W3 c++.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30706 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

c++.cpp
C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\compiler\include\ISO_Fortran_binding.h(156): warning C4200: nonstandard extension used: zero-sized array in struct/union
C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\compiler\include\ISO_Fortran_binding.h(156): note: This member will be ignored by a defaulted constructor or copy/move assignment operator

C:\temp>ifort /c /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.5.0 Build 20211109_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

C:\temp>link p.obj c++.obj /subsystem:console /out:p.exe
Microsoft (R) Incremental Linker Version 14.30.30706.0
Copyright (C) Microsoft Corporation. All rights reserved.

C:\dev\temp>p.exe
Hello World!

1 Like

@FortranFan, that’s very clever! I was not aware of ISO_Fortran_binding.h. For situations where mutation of the data is needed, I think the example in your previous post definatly should be the preferred approach.

One thing I originally was concerned about (forgot to write that though…) was if it’s possible to somehow pass non-contiguous data to the wrapper subroutine. It seems like Fortran does not allow slicing of character(len=:) variables though so that should not be a problem. The following example seems to give a compile error:

    character(len=10) :: chars

    chars = '1234567890'
    write(*,*) chars(1:10:2)

Re: “For situations where mutation of the data is needed” - no, not necessarily.

In the first post, I didn’t want to make it further complicated and get into POINTER attribute and generic interfaces (and bugs with current compiler support) and so forth.

But just note that it’s viable to work with the immutable case as well, however there is no guarantee about “without copying”, meaning a processor may very well generate a temporary in the additional level of indirection (along the lines of what you show in the original post) that is necessary:

   use, intrinsic :: iso_c_binding, only : c_char
   interface
      subroutine echo_c( str ) bind(C, name="echo_c" )
         import :: c_char
         ! Argument list
         character(kind=c_char, len=:), pointer, intent(in) :: str
      end subroutine
   end interface
   call echo( c_char_"Think of this as an extremely long string!" )
contains
   subroutine echo( str )
      character(kind=c_char, len=*), target, intent(in) :: str
      ! Local variables
      character(kind=c_char, len=:), pointer :: ps
      ps => str
      call echo_c( ps )
   end subroutine
end

C:\temp>ifort /c /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.5.0 Build 20211109_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

C:\temp>link p.obj c++.obj /subsystem:console /out:p.exe
Microsoft (R) Incremental Linker Version 14.30.30706.0
Copyright (C) Microsoft Corporation. All rights reserved.

C:>p.exe
Think of this as an extremely long string!

Fortran does not allow “slicing” of any character type, even with fixed length. The substring-range syntax is just (from:to). To achieve “slicing” you would have to arrange an array of len=1 characters, but that’s another story.