Difference in BIND(C) behavior for assumed-size character arrays

I think it should be possible to write an interface in Fortran for C routines that do not care about Fortran, because, for example, one may want to interoperate with an existing shared library that contains the following C function (with no “hidden” argument)…

void check_argument(char name[]);

On the C side, it can simply ignore the fact that the Fortran compiler is adding the second, “hidden”, length argument. Remember that C supports varargs. So the number of arguments potentially passed into any given C function isn’t written in stone.

NOTE 2 in section 18.3.7 of the Fortran standard says, “The C language allows specification of a C function that can take a variable number of arguments (ISO/IEC 9899:2018, 7.16). This document does not provide a mechanism for Fortran procedures to interoperate with such C functions.” So that feature is not supported by Fortran. On the other hand, optional arguments in Fortran are supported within C interoperability: a null pointer is passed for each absent argument, so the argument list is always filled out.
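To make the optional-argument case concrete, here is a minimal sketch of what the C side might look like. The function name `scaled_sum`, its parameters, and the default factor of 1.0 are all made up for illustration; the only thing taken from the standard is that an absent OPTIONAL argument shows up in C as a null pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C function that a Fortran interface along these lines
 * could bind to (all names are illustrative):
 *
 *   function scaled_sum(x, n, factor) bind(C) result(s)
 *     real(c_double), intent(in)           :: x(*)
 *     integer(c_int), value                :: n
 *     real(c_double), intent(in), optional :: factor
 *     real(c_double)                       :: s
 *
 * When the Fortran caller omits `factor`, C receives a null pointer. */
double scaled_sum(const double x[], int n, const double *factor)
{
    assert(n == 0 || x != NULL);

    /* Assumed default when the optional argument is absent. */
    double f = (factor != NULL) ? *factor : 1.0;

    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i];
    return f * s;
}
```

The argument count is fixed at three on both sides; only the pointer value says whether the optional argument was present, so no variadic machinery is involved.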

True. But my point is that on the C side, it doesn’t necessarily know that a Fortran procedure is calling it.

In the following code, I’ve tried to test how different compilers handle hidden argument passing, though I am not sure whether this is valid code (I don’t know how to compile two source files with Compiler Explorer, so I tried to include everything in one source). The result is that (1) gfortran, ifort, and ifx seem to pass no hidden variable, (2) nvfortran passes a hidden length variable, (3) LLVM-flang complains about my code :sweat_smile:, and (4) LFortran also complains about my code (in a more complicated way :laughing:). I guess it would be better to use separate source files and real C code (rather than mimicking it with a single Fortran source)…

module test_m
    use :: iso_c_binding, only: c_char, c_long
    implicit none

    interface
    subroutine sub( arr ) bind(C,name="fsub_")
        import; implicit none
        character(kind=c_char), dimension(*) :: arr
    end
    end interface
end module

subroutine fsub( arr, n )
    use iso_c_binding
    implicit none
    character(kind=c_char), dimension(*) :: arr
    integer(c_long), value :: n
    print *, "arr = ", arr(1:5)
    print *, "n = ", n
end

program main
    use test_m
    implicit none

    call sub( "hello" )
end

Result:

>>> gfortran14.2
arr = hello
n =    140732106695248

>>> ifort 2021.11.0
arr = hello
n =   -1906913433109000205

>>> ifx 2024.0.0
arr = hello
n =   -1906913433109000205

>>> nvfortran 25.1
arr = hello
n =      5

EDIT: I’ve updated the code so that it always prints the first 5 characters of arr, to check whether the address of the first argument is passed. All compilers pass the address (as expected), while the behavior differs for the hidden length variable. To be C interoperable, I think the length of the character string should be passed to C either via a null character at the end of the string or via an explicit integer argument, so as to respect the signature of the C function (which may be compiled independently of Fortran, even with a companion C compiler).
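The two conventions mentioned above could look roughly like this on the C side. This is only a sketch: `name_length_nul` and `copy_name_len` are hypothetical names, and the Fortran declarations in the comments are one possible way to call them, not something taken from an existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convention 1: the caller appends a null character, e.g.
 * 'hello'//C_NULL_CHAR on the Fortran side, so strlen() is safe. */
size_t name_length_nul(const char name[])
{
    assert(name != NULL);
    return strlen(name);
}

/* Convention 2: the length is an explicit, declared argument, e.g.
 * integer(c_size_t), value :: n in the Fortran interface.
 * The input need not be null-terminated. Copies at most
 * dst_size - 1 characters into dst, terminates dst, and returns
 * the number of characters copied. */
size_t copy_name_len(char *dst, size_t dst_size,
                     const char name[], size_t n)
{
    assert(dst != NULL && dst_size > 0 && name != NULL);
    size_t m = (n < dst_size - 1) ? n : dst_size - 1;
    memcpy(dst, name, m);
    dst[m] = '\0';
    return m;
}
```

In both cases the length information is part of the visible C signature (or of the data itself), so nothing depends on whether a particular Fortran compiler adds a hidden argument.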

I tested the following locally (not through Compiler Explorer).

This C code is a slight variation of the one provided by the OP through a link (I just added a new printf and modified the existing one):

#include <stdio.h>

void check_argument(char name[]);

void check_argument(char name[]) {
    printf("arg: %s\n", name);
    if (name[0] != 'x')
        printf("  oops!\n");
}

This would be the code on the Fortran side:

use ISO_C_BINDING

implicit none

interface
    subroutine c_check_argument_1(name) bind(C, name="check_argument")
        import
        character(kind = c_char), dimension(*) :: name
    end subroutine
end interface

call c_check_argument_1('x'//C_NULL_CHAR)
call c_check_argument_1('xx'//C_NULL_CHAR)
call c_check_argument_1('c'//C_NULL_CHAR)
end

And these are the results of compiling with gfortran and flang-new, and their respective “companion processors”:

$ clang -c llvm_maybe_bug_c.c && flang-new -c llvm_maybe_bug_f.f90 && flang-new -o llvm_maybe_bug llvm_maybe_bug_c.o llvm_maybe_bug_f.o

$ ./llvm_maybe_bug
arg: x
arg: xx
arg: c
  oops!

$ gcc -c llvm_maybe_bug_c.c && gfortran -c llvm_maybe_bug_f.f90 && gfortran -o llvm_maybe_bug llvm_maybe_bug_c.o llvm_maybe_bug_f.o

$ ./llvm_maybe_bug
arg: x
arg: xx
arg: c
  oops!

So, the interoperability produces the expected results. The code generated by flang-new might be half-a-microsecond slower (as per the IR diff provided elsewhere), but not wrong.

My gcc/gfortran version is 14.2.0, and my clang/flang-new version is 19.0.6.

@septc You’re correct, and if you cajole Compiler Explorer with a bit more CMake magic, you can see that indeed flang-trunk and nvfortran pass a hidden parameter, while gfortran, ifort, and ifx seem not to.

@jwmwalrus Okay, so I think I see that this is exploiting the fact that the C calling convention allows for additional (ignored) arguments to functions, so once this gets lowered to ASM it is interoperable. However, if you look at the LLVM IR, there is not that flexibility in calling conventions, and thus the generated IR is not interoperable. This means that if you are doing, e.g., whole-program optimization as an LLVM pass, you’ll run into a call mismatch. So yes, this is potentially a bug in flang but it’s entirely internal (to LLVM) in terms of how flang compiles to LLVM IR.

It seems that the C side ignores the implicitly passed length variable (in the case of LLVM-flang), which produces seemingly correct behavior. But my concern is whether such an implicitly passed extra variable can cause issues (when it is not declared in a C function that has no “…”). I would like to learn more about this point (including the treatment of variadic arguments on the C side). FWIW, the result from Perplexity was something like this, though I am not sure to what extent the contents are correct…

(my prompt) In C language, is it legitimate to pass an extra argument to a function that is not declared in the function definition?

(response) In C, passing extra arguments to a function that does not declare them is generally not legitimate in terms of compiler checks, but it may not always result in a compile-time error. Here’s why:

  1. Function Prototypes and Definitions: If a function is defined in the same file where it is called, the compiler will typically check the number and types of arguments against the function definition and throw an error if they do not match. However, if the function is declared in a header file or elsewhere without a prototype, the compiler may not enforce this check at compile time.

  2. Variable Argument Lists: C supports variable argument lists using the ... syntax in function declarations. Functions like printf use this feature, allowing them to accept a variable number of arguments. However, this is explicitly declared in the function definition and is not applicable to functions without such a declaration.

  3. ABI and Calling Conventions: The behavior of passing extra arguments can depend on the Application Binary Interface (ABI) and calling conventions used by the compiler. In many cases, extra arguments are simply ignored if they are not accessed within the function. However, this can lead to undefined behavior if the function attempts to access memory beyond its declared parameters.

  4. Compile-Time vs. Runtime Checks: C compilers typically do not perform runtime checks for argument counts unless explicitly programmed to do so. Therefore, while passing extra arguments might not cause a compile-time error, it can lead to runtime issues if the function accesses memory incorrectly.

In summary, while passing extra arguments to a C function might not always result in a compile-time error, it is generally not recommended and can lead to undefined behavior. It is best to ensure that the number and types of arguments passed to a function match its declaration.

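On the variadic side of the question, my understanding of ISO C is this: a function may only receive extra arguments if its prototype ends in `...`, and it reads them through `<stdarg.h>`. For a fixed prototype such as `void check_argument(char name[])`, a C compiler rejects a call with two arguments, and calling it through a mismatched declaration is formally undefined behavior, even though common ABIs happen to tolerate it. A minimal sketch of the sanctioned variadic form, with a made-up function `sum_longs`:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function. The "..." in the prototype is what
 * permits extra arguments; the callee cannot discover how many were
 * passed, so the caller states it in `count`. Every extra argument
 * must really be a long, otherwise va_arg misreads it. */
long sum_longs(int count, ...)
{
    assert(count >= 0);

    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, long);
    va_end(ap);

    return total;
}
```

A call like `sum_longs(3, 1L, 2L, 3L)` is well defined because both sides agree on the variadic prototype. That agreement is exactly what is missing when a Fortran compiler silently appends a length to a call of a non-variadic C function.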

@septc also interesting: if you don’t declare sub as bind(C), gfortran/ifort/ifx all do pass an extra parameter (i.e. in “Fortran calling convention”).

Or, to turn it around the other way, for flang and nvfortran adding bind(C) to the interface declaration does nothing whereas for gfortran/ifort/ifx adding bind(C) removes the hidden parameter.

Hmm… Maybe the behavior was introduced by pgfortran and kept in flang-new and nvfortran for compatibility?

(That reminds me of that old Lotus bug that MS Excel implemented on purpose :laughing:)

And what if the character array is not the last argument?

My understanding is that the additional length parameters are always at the end of the list, regardless of the position of the parent argument.
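As a sketch of what that looks like from C, assume a gfortran-style convention for a Fortran routine that is not bind(C): a trailing underscore in the symbol name, non-character arguments passed by reference, and one `size_t` length per character argument appended after all declared arguments. The routine `same_text` is hypothetical, and other compilers (or older gfortran versions, which used `int` lengths) may differ in any of these details.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* C-side view of a hypothetical Fortran routine
 *
 *   integer function same_text(a, ignore_case, b)
 *     character(*) :: a, b
 *     integer      :: ignore_case
 *
 * under the ASSUMED convention described above: the integer sits
 * between the two strings, but both hidden lengths come last, in the
 * order of the character arguments. Returns 1 if a and b are equal. */
int same_text_(const char *a, const int *ignore_case, const char *b,
               size_t len_a, size_t len_b)
{
    assert(a != NULL && b != NULL && ignore_case != NULL);

    if (len_a != len_b)
        return 0;

    for (size_t i = 0; i < len_a; ++i) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (*ignore_case) {
            ca = (unsigned char)tolower(ca);
            cb = (unsigned char)tolower(cb);
        }
        if (ca != cb)
            return 0;
    }
    return 1;
}
```

Because everything here depends on one compiler's convention, code like this is not portable; it only illustrates where the lengths end up.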

Ok. In this case the extra length argument doesn’t matter at all, as it is simply ignored on the C side. Attempting to use it on the C side would be possible, but it would not be portable.