Memory alignment for strings ifort vs gfortran

I experienced yet another difference in behavior between ifort (19.0.8.324) and gfortran (8.5.0); I passed a Fortran string whose memory I allocated on the Fortran side via C_PTR to C successfully. But when I attempted to free that memory on the C side, it worked for gfortran but didn’t for ifort. For the latter, the actual beginning of the allocated memory was 32 bytes before the address of the char *. Also I noticed that ifort allocate 40 bytes more than required for the stored characters (45).

For gfortran the char* pointed to the start of the allocated memory and only the number of chars required was allocated (5).

Does anyone have an idea why the char * is shifted 32 bytes forward with respect to the start of the allocated memory and why 40 bytes in excess are allocated when using ifort?

N.B.: The way I solved it was to write a Fortran routine that receives the char* back, converts it to a Fortran CHARACTER and deallocates that (FREE_STRING). This works for both compilers.

main.c:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *f_get_string();

void f_free_string(char *c_string);

void main(int argc, char* argv) {
    char *c_string = f_get_string();
    
    printf("Here's the fetched string: %s len: %d address: %x\n", c_string, strlen(c_string), c_string);
    free(c_string);
}

mod_string.f90

MODULE MOD_STRING
   USE ISO_C_BINDING

CONTAINS
   FUNCTION CONSTRUCT_STRING() RESULT(my_string)
      USE ISO_C_BINDING
      IMPLICIT NONE
      CHARACTER(LEN=:), ALLOCATABLE :: my_string
      my_string = "asdf" // C_NULL_CHAR
   END FUNCTION

   FUNCTION GET_STRING_SUB() RESULT(c_string_ptr) BIND(C, name='f_get_string')
      USE ISO_C_BINDING
      IMPLICIT NONE
      TYPE(C_PTR) :: c_string_ptr

      CHARACTER(LEN=:), POINTER :: c_string
      CHARACTER(LEN=:), ALLOCATABLE :: f_string

      f_string = CONSTRUCT_STRING()
      ALLOCATE(c_string, SOURCE=f_string)
      WRITE(*,'(A,I0)') "f_string_len: ", LEN(f_string)
      WRITE(*,'(A,I0)') "c_string_len: ", LEN(c_string)
      c_string_ptr = C_LOC(c_string)
      WRITE(*,'(A, Z0)') "LOC(C_STRING): ", LOC(c_string)
   END FUNCTION GET_STRING_SUB

   SUBROUTINE FREE_STRING(c_string_ptr) BIND(C, name='f_free_string')
      USE ISO_C_BINDING
      IMPLICIT NONE
      TYPE(C_PTR), VALUE :: c_string_ptr

      CHARACTER, POINTER :: f_string

      CALL C_F_POINTER(c_string_ptr, f_string)
      WRITE(*,*) "FREE ", f_string
      DEALLOCATE(f_string)

   END SUBROUTINE FREE_STRING

END MODULE MOD_STRING

CMakeLists.txt:

cmake_minimum_required (VERSION 3.0)

enable_language(C Fortran )

project(c_calls_fortran C)

set(CMAKE_C_COMPILER "gcc")
set(CMAKE_Fortran_COMPILER "ifort")
set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -w -g")
set(CMAKE_Fortran_FLAGS_DEBUG "-g")
set(CMAKE_C_FLAGS_DEBUG "-g")

add_library(mod_string STATIC mod_string.f90)
set_property(TARGET mod_string PROPERTY LINKER_LANGUAGE Fortran)
set_target_properties(mod_string PROPERTIES LINK_FLAGS -static-intel )
target_link_libraries(mod_string  ifcoremt irc pthread imf dl)

link_directories(.)
link_directories("path/to/fortran/libs")

add_executable(c_calls_fortran main.c)
set_property(TARGET c_calls_fortran PROPERTY LINKER_LANGUAGE C)
target_link_libraries(c_calls_fortran  mod_string ifcoremt irc pthread imf dl)

You need to deallocate it with the same language it was allocated. In other words an allocate needs to have a matching deallocate. The reason it works with gfortran is because the C malloc is used under the hood, but that’s an implementation detail you should not rely on.

Think of Fortran arrays and characters as dope vectors. In addition to the data itself, they store meta-data, such as the size and potentially other things (rank, stride, element length).

That’s the correct way to do it. :+1:

Is it really?

At the beginning there is a character(:), allocatable object, and after the C round-trip we get a character, pointer object. I wonder if it’s valid or if the deallocation works by chance (my opinion is that it works by chance)…

Good observation @PierU.

in the free_string procedure the line above needs to be changed to:

CHARACTER(len=:), POINTER :: f_string

Is anything else wrong?

I don’t know… It’s still an allocatable at the beginning and a pointer at the end. Looks really doubtful to me, and I wouldn’t really want to rely on such code.

I’ve tested, and len(f_string) after the c_f_pointer in FREE_STRING() returns a length of 0. Which is not really surprising, as the compiler has no way to know what was the original length of the character(:) variable. In these conditions, how can we be sure that deallocate does what is expected?

It’s not an allocatable:

      CHARACTER(LEN=:), POINTER :: c_string
      CHARACTER(LEN=:), ALLOCATABLE :: f_string

      f_string = CONSTRUCT_STRING()
      ALLOCATE(c_string, SOURCE=f_string)   ! <-- becomes a pointer
      c_string_ptr = C_LOC(c_string)

Ah yes, my mistake… Still, it doesn’t work:

module foo
use iso_c_binding
implicit none

contains

   SUBROUTINE FREE_STRING(c_string_ptr)
      IMPLICIT NONE
      TYPE(C_PTR), VALUE :: c_string_ptr

      CHARACTER(:), POINTER :: f_string

      CALL C_F_POINTER(c_string_ptr, f_string)
      WRITE(*,*) "FREE ", f_string, len(f_string)
      DEALLOCATE(f_string)

   END SUBROUTINE FREE_STRING

end module

program foofoo
use iso_c_binding
use foo
implicit none

character(:), pointer :: str

allocate(character(128)::str)
print*, associated(str)
call free_string(c_loc(str))
print*, associated(str)

end

gfortran output (ifx one is the same):

 T
 FREE            0
 T

str is still reported as associated after the call to free_string (and the string is seen as zero-length in the routine). I don’t know if the memory is effectively deallocated, but at least there’s no way to be sure, and the state of the pointer is inconsistent.

I think you need to nullify after deallocate to make the state of the pointer consistent. At least that’s the way to do it in C: c - Should one really set pointers to `NULL` after freeing them? - Stack Overflow

Personally, I’d consider it good practice to check if the pointer was associated in the first place:

   SUBROUTINE FREE_STRING(c_string_ptr) bind(c)
      IMPLICIT NONE
      TYPE(C_PTR), VALUE :: c_string_ptr

      CHARACTER(:), POINTER :: f_string

      if (c_associated(c_string_ptr)) then
         CALL C_F_POINTER(c_string_ptr, f_string)
         WRITE(*,*) "FREE ", f_string, len(f_string)
         DEALLOCATE(f_string)
         nullify(f_string)
     end if

   END SUBROUTINE FREE_STRING

This brings it closer to the behavior expected of free in C:

If ptr is a null pointer, the function does nothing.

It’s precisely for these reasons that pointers are frowned upon, because there are so many steps involved for “safe” usage, that the probability of making an error is high. But you can’t avoid it when you are interoperating types which are not interoperable.

Alright, thanks for the hints. I had tried to use some kind of CHARACTER array in the FREE_STRING routine but I probably did something wrong because I couldn’t get it to compile. I’ll try out your suggestions.

OK, it’s actually expected that the original pointer be reported as associated even after the deallocation… Nonetheless something looks dubious to me, as the string is reported as zero-length in the routine.

In addition, I see that a new c_f_strpointer() has been introduced in F2023 to specifically address the case where the Fortran side is a “deferred-length character pointer”. Which makes me think that c_f_pointer() should not be used in this case.

EDIT: and it’s confirmed by the text about c_f_pointer:

1 Like

That’s new to me :thinking:. Thanks for investigating.

Does that mean deferred-length characters can only be passed to C using the F2018 CFI_cdesc_t * type?

It’s not really clear to me, but I don’t think that deferred lenght types can be passed to C at all. But at least a deffered length string can be passed as a normal string, with the c_null_char character present.

So from what I gather, the “clean” solution seems to be to pass the length to `FREE_STRINGand declare thef_pointer`` as an assumed length character variable like:

   SUBROUTINE FREE_STRING(c_string_ptr, len) BIND(C, name='f_free_string')
      USE ISO_C_BINDING
      IMPLICIT NONE
      TYPE(C_PTR), VALUE :: c_string_ptr
      INTEGER(C_INT), VALUE :: len

      CHARACTER(len=len), POINTER :: f_string

      CALL C_F_POINTER(c_string_ptr, f_string)
      WRITE(*,*) "FREE ", f_string
      DEALLOCATE(f_string)

   END SUBROUTINE FREE_STRING

C code snippet:

void f_free_string(char *c_string, int len);

f_free_string(c_string, strlen(c_string));

It’s probably better. Just a detail, you have to account for the extra null char somewhere, either
f_free_string(c_string, strlen(c_string)+1);
or
CHARACTER(len=len+1), POINTER :: f_string

Also, you should use the kind=c_char for the declaration of what is passed to/from C.

1 Like

It works in practice:

! f_get_string.f90
subroutine f_get_string(str) bind(c,name="f_get_string")
    use, intrinsic :: iso_c_binding, only: c_char
    character(len=:,kind=c_char), allocatable, intent(out) :: str
    str = "hello from fortran"
end subroutine
// get_string_main.cpp

#include <iostream>
#include <string_view>
#include <ISO_Fortran_binding.h>

// str is `character(c_char,len=:), allocatable`
extern "C" void f_get_string(CFI_cdesc_t *str);

static auto make_string_view(const CFI_cdesc_t *str) {
    return std::string_view{(char *) str->base_addr, str->elem_len};
}

int main() {
    
    // Create space for the Fortran C descriptor
    CFI_CDESC_T(0) f_str_desc;
    CFI_cdesc_t *f_str = (CFI_cdesc_t *) &f_str_desc;

    // Establish that the descriptor is a deferred-length string
    CFI_establish(f_str,
        nullptr,
        CFI_attribute_allocatable,
        CFI_type_char,
        0, /* elem_len */
        0, /* rank */
        nullptr);

    // Obtain the string from the Fortran routine
    f_get_string(f_str);

    // Create a C++-17 string-view
    auto str = make_string_view(f_str);
    std::cout << str << '\n';

    // Free up memory of the Fortran string
    CFI_deallocate(f_str);
    if (f_str->base_addr) return 1;
    
    return 0;
}
$ gfortran-13 -c f_get_string.f90 
$ g++-13 -std=c++17 get_string_main.cpp f_get_string.o -lgfortran
$ ./a.out
hello from fortran

I feel uncomfortable with this solution. What started as a deferred-length character(kind=char,len=:), pointer in the beginning, is being deallocated as a fixed-length character(kind=c_char,len=len), pointer instead.

What happens if you insert a new-line character somewhere in the string? The string will be truncated, and potentially some memory will be leaked. This doesn’t seem safe. At the very least, the string contents should be const,

const char* str = f_get_string();

so that the length cannot be changed.


A simpler way to handle the issue of deallocation, is to use malloc on the Fortran side, and copy the contents of the Fortran deferred-length string into the C char* buffer:

module mod_string

contains

   FUNCTION CONSTRUCT_STRING() RESULT(my_string)
      USE ISO_C_BINDING
      IMPLICIT NONE
      CHARACTER(LEN=:), ALLOCATABLE :: my_string
      my_string = "asdf" // C_NULL_CHAR
   END FUNCTION

   ! char *f_get_string();
   FUNCTION GET_STRING_SUB() RESULT(c_string_ptr) BIND(C, name='f_get_string')
      USE ISO_C_BINDING
      IMPLICIT NONE

      interface
        function c_malloc(size) bind(c,name="malloc")
            import c_ptr, c_size_t
            integer(c_size_t), value :: size
            type(c_ptr) :: c_malloc
        end function
      end interface

      TYPE(C_PTR) :: c_string_ptr
      CHARACTER(LEN=:), ALLOCATABLE :: f_string

      f_string = CONSTRUCT_STRING() ! already null-terminated
      c_string_ptr = c_malloc(len(f_string,c_size_t))

      block
        character(kind=c_char,len=len(f_string)), pointer :: c_string
        call c_f_pointer(c_string_ptr, c_string)
        c_string = f_string
      end block

   END FUNCTION GET_STRING_SUB

end module

This way you can use free on the C side of the program.

I do either :slight_smile: … And yes, to be safer the string should not be modified while in the C part of the code (or at least the original length should be keep along).

(about the CFI descriptor approach):

Indeed, I’ve checked the “Modern Fortran Explained” (F2018 edition) and they explicitly say that passing an assumed length string actually passes a cfi_cdesc_t*. Although they do not mention the deferred lenght case, it looks logical that this works also with them. In practice the ->elem_len member is the Fortran length of the string and ->rank is 0.

This is probably the cleanest way.

The following routine is accepted by ifx, gfortran, and flang-new,

subroutine hello(msg) bind(c)
use, intrinsic :: iso_c_binding, only: c_char
character(kind=c_char,len=:), allocatable, intent(out) :: msg
msg = "Hello from Fortran"
end subroutine

and would have the matching C prototype: void hello(CFI_cdesc_t *msg).

There is special bullet point (6) for this case in the F2018 interpretation document, section 18.3.6, paragraph 2,