Difference in BIND(C) behavior for assumed-size character arrays

I am trying to call a C function from Fortran, passing in a string. However, I am seeing different generated calling conventions between gfortran and flang-new/ifx. For example, if my C function has the signature

void check_argument(char name[]);

according to Note 4 of section 18.3.7 of the Standard, I should declare an interface

interface
    subroutine c_check_argument_1(name) bind(C, name="check_argument")
        use, intrinsic :: iso_c_binding
        implicit none
        character(c_char), dimension(*) :: name
    end subroutine c_check_argument_1
end interface

where name is an assumed-size array of characters.

However, when I compile with flang-new, the generated IR has an extra argument inserted (presumably the array size):

declare void @check_argument(ptr, i64) local_unnamed_addr

I can also see this appear in the generated assembly, with an extra variable getting pushed onto the stack before the call.

Another way of declaring the interface is to use c_ptr (plus c_loc):

interface
    subroutine c_check_argument_2(name) bind(C, name="check_argument")
        use, intrinsic :: iso_c_binding
        implicit none
        type(c_ptr), value :: name
    end subroutine c_check_argument_2
end interface

I would expect that call c_check_argument_1(char_array) and call c_check_argument_2(c_loc(char_array)) would give the same generated LLVM IR and ASM. However, they do not: Compiler Explorer

On the other hand, gfortran gives the same ASM for both calls.

Which behavior is correct? It seems odd to me that flang silently inserts an extra argument.

It probably doesn’t matter, since c_char is likely to be Fortran’s default character kind, but the correct interface block should read:

interface
    subroutine c_check_argument_1(name) bind(C, name="check_argument")
        use, intrinsic :: iso_c_binding
        implicit none
        character(kind = c_char), dimension(*) :: name
    end subroutine c_check_argument_1
end interface

On the other hand, Fortran is interoperable only with C, not with C++. Maybe you forgot to put your C-like function declarations in an extern "C" block?

I would expect, according to C “array decay” rules, that this

void check_argument(char name[]);

to be equivalent to this

void check_argument(char *name);

So I think you’re right in saying that the “companion processor” shouldn’t generate code with that extra argument, but then again, the “companion processor” is free to do whatever its runtime deems necessary.
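
For reference, here is a minimal C sketch of the array-to-pointer adjustment described above. The names name_length and name_length_ptr are hypothetical and only serve the illustration; they are not part of the original example.

```c
#include <stddef.h>
#include <string.h>

/* Both prototypes declare the same function: C adjusts a parameter of
   array type to a pointer, so char name[] and char *name are compatible. */
size_t name_length(char name[]);
size_t name_length(char *name);

/* Hypothetical helper (not from the thread): inside the body, name is a
   plain char *, and the caller's array extent is not available. */
size_t name_length(char name[])
{
    return strlen(name);
}

/* A pointer to a function taking char * accepts it without a cast. */
size_t (*const name_length_ptr)(char *) = name_length;
```

Since a conforming C compiler accepts both prototypes side by side, the C side of check_argument expects exactly one pointer argument and nothing else.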

With this C declaration, or one of the equivalent ones using an array, who is responsible for the terminal null in the string? Is the fortran programmer responsible for adding it, or does the fortran compiler do that with a copy-in/copy-out kind of operation?

I think character(c_char) is equivalent to something like

character(len=c_char), dimension(*) :: name

so the correction above is more than just an explicit keyword, it changes the meaning.

@jwmwalrus I don’t quite follow what you’re saying. I’m only looking at C, not C++, and the extra argument is being added by flang-new. As far as I know,

void check_argument(char name[]);

is a valid C function declaration.

What exactly do you mean by “companion processor”? So far I have not invoked anything other than the Fortran processor, and the argument is being added by it.

@RonShepard You’re probably right that kind=c_char is the correct meaning, but it does not change the behavior of flang-new.

Also, I’m making sure to handle the null termination in Fortran, so that’s not the issue here. (Anyway, that would be a run-time problem, not a compile-time problem.)

@jwmwalrus Yes, you can see that

void check_argument(char name[]);

is a valid C function declaration.

Okay, so only flang-new is involved here.

Fortran interoperability with C is stated in terms of a “companion processor” —e.g., for gfortran it would be gcc; for ifx it would be icx; and for flang-new it would be clang.

Even though I’ve always thought of clang as a drop-in replacement for gcc, maybe it has some extra rules… or maybe it’s really a bug in flang-new.

I suspect this is a bug in flang-new, since the C ABI is pretty fixed on any given platform. Moreover, this was detected in a whole-program optimization pass in LLVM IR, so it sure looks like flang-new is lowering to IR incorrectly.

In all the C-interop things I’ve done, I’ve always assumed it’s the programmer’s responsibility to add a C_NULL_CHAR at the end of a string being sent to C and, conversely, to check whether a C_NULL_CHAR is at the end of a string returned to Fortran from C and strip it off before proceeding. I think there are additions to F23 to relieve some of that burden but don’t quote me on that.

I am sorry not to provide any useful info, but I’ve been trying to build flang-new to no avail, and you seem to be using it. Do you have a functioning way to install it? Mine fails because it can’t find the headers for standard library related things (for C++) like “atomic.h” and things like that.

Hopefully other people help you solve your problem :slight_smile:

Do you have to build it yourself?
For macOS there’s the homebrew option:

$ brew install flang

On linux, it depends on the distribution you’re using, but (e.g., for debian-testing) it might be as simple as:

# apt-get install flang

Yes, Fortran 2023 added c_f_strpointer and f_c_string. ifx already implements those —although f_c_string is not pure.

I want to :frowning:

I think you’ve spotted the problem, and IMO it does matter. The first positional parameter in a character type specifier is the length, not the kind, so

character(c_char), dimension(*) :: name

declares an array of character strings, each string entry having the length c_char, where c_char is an arbitrary integer the compiler uses to represent that character type.

On the other hand, with

character(kind=c_char), dimension(*) :: name

you declare an array of scalar characters of kind c_char, which is probably the desired behavior.

Unfortunately, Fortran uses integers to represent various kinds of a given data type, so, you get no compiler warning when using them in the wrong context. So, even

use iso_c_binding, only : c_char
print *, c_char + c_char
end program

is valid.

While it was a good guess, changing it to character(kind=c_char) did not change the generated ASM or LLVM IR. So that’s not the underlying issue. (Though it was a good reminder to be careful about character declarations!)

That’s why I started by saying that it probably doesn’t matter (only flang-new and gfortran were mentioned by the OP).

The following code

use ISO_C_BINDING
implicit none
print*, selected_char_kind('default') == c_char, c_char == 1
end

prints T T for every Fortran compiler on this computer:

$ nvfortran sel_char_kind.f90 && ./a.out 
  T  T

$ gfortran sel_char_kind.f90 && ./a.out 
 T T

$ ifx sel_char_kind.f90 && ./a.out 
 T T

$ flang-new sel_char_kind.f90 && ./a.out 
 T T

Here’s an updated live example comparing passing a character variable as type(c_ptr) vs. as character(kind=c_char), dimension(*): Compiler Explorer

The important diff is the call in the LLVM IR (line 152):

-  call void @check_argument(ptr nonnull %18)
+  call void @check_argument(ptr nonnull %18, i64 1)

to a function which is defined as (line 282):

-declare void @check_argument(ptr) local_unnamed_addr
+declare void @check_argument(ptr, i64) local_unnamed_addr

flang-new is adding an additional parameter to the C-linkage function call (relative to passing a type(c_ptr)) and explicitly passing the number 1. In the generated ASM, this results in one extra mov instruction as that argument is pushed onto the stack. By comparison, if you look at the ASM from gfortran, passing by type(c_ptr) and by character(kind=c_char), dimension(*) give identical output.

Either flang-new is wrong or gfortran is wrong; I don’t think they can simultaneously be correct.

I think the method for implicitly passing the length of a character string is compiler-dependent (e.g., at the end of all explicit arguments, or after each character argument), so I think it cannot be compatible with bind(C) procedures. Also, it seems (from the assembly output on Compiler Explorer) that C code with formal char arguments is compiled with no special care about Fortran (as expected), so passing implicit length variables seems problematic… (so I guess gfortran is correct).

(RE “companion processor”, I guess C and Fortran compilers with an LLVM backend are not in 1-to-1 correspondence (e.g., flang-new and LFortran), so bind(C) procedures should not rely on any Fortran-processor-dependent implicit rule of argument passing.)

Being “right” and being “slower” are two different things. You might need a (minimally reproducible) example, with both a *.c file and a *.f90 file, to prove the former.

I mentioned “companion processor” in my initial reply, because if you compile with flang-new on the Fortran side, then the expectation is that you use clang for the C side.

As I see it, flang-new might not need to care about generating the same code as gfortran or being compatible with gcc (but maybe I’m wrong).

True. But by far the most common is to add a list of “hidden” value arguments containing character string lengths after the list of formal ones. This convention goes all the way back to the original unix f77 compiler. (I know that Way Back When, the CDC FTN5 and Cray compilers used a 60- or 64-bit descriptor instead of a simple memory address for each character argument. The descriptor encoded both address and length. But not sure if anyone else has since then.)

Since C only knows about arrays of single characters, a ‘1’ as string length for each C Interoperable character argument would be expected.

Adding a null terminator byte should be the responsibility of the Fortran caller. However if you pass a normal Fortran string (e.g., without constraints of C Interop), and have the C side look at the ‘hidden’ string length argument, the C side then knows where the end of the string is.

One nuance with the hidden string length argument is whether it should be declared on the C side as an int, long, long long, or size_t. Compilers differ…
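
To make the hidden-length convention concrete, here is a rough C sketch of what the C side could do when it receives a blank-padded Fortran string together with its length. The function name fortran_to_c_string is hypothetical, and taking the length as size_t is an assumption; as noted, compilers differ on that type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a blank-padded Fortran buffer of src_len
   characters into dst as a null-terminated C string, dropping trailing
   blanks. Returns the number of characters copied (excluding the null).
   ASSUMPTION: the length arrives as size_t; real compilers differ. */
size_t fortran_to_c_string(const char *src, size_t src_len,
                           char *dst, size_t dst_size)
{
    if (dst_size == 0) {
        return 0;                       /* no room even for the terminator */
    }
    while (src_len > 0 && src[src_len - 1] == ' ') {
        src_len--;                      /* strip Fortran blank padding */
    }
    if (src_len > dst_size - 1) {
        src_len = dst_size - 1;         /* truncate to fit, keep room for null */
    }
    memcpy(dst, src, src_len);
    dst[src_len] = '\0';
    return src_len;
}
```

With an explicit length the C side never has to search for a terminator, which is why this only works when both sides agree on the argument list. It does not apply to a bind(C) interface, where no hidden argument should be expected.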

If this is a compiler error, and not an intentional feature, then it looks like the compiler is ignoring the bind(c) attribute in the interface and is passing the arguments as if it is a Fortran subroutine. That is, a Fortran subroutine would expect the extra integer value as a hidden argument that describes the length of the actual argument (so that the subroutine can use len(dummy) to determine that length). The C function instead must either determine that string length by other means (fixed-length convention, global variables, etc.) or it must search the string for the trailing null character.
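
As a small illustration of the last point, here is a C sketch that combines the two options: it searches for the trailing null, but only within a fixed maximum length. The name bounded_length and the idea of passing an agreed maximum are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a string stored in a buffer of at most
   max_len bytes. Returns max_len if no null terminator is found, so a
   missing terminator cannot cause a read past the agreed buffer size. */
size_t bounded_length(const char *s, size_t max_len)
{
    const char *end = memchr(s, '\0', max_len);
    return end ? (size_t)(end - s) : max_len;
}
```

A plain strlen would do the same job when the Fortran caller is known to append C_NULL_CHAR; the bound only guards against a caller that forgot it.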