Iso_c_binding: Looking for practical example of how it helps with mangling

After reading a wikipedia about mangling, I feel that I don’t understand the need for iso_c_binding.

Question: Can you give me a simple practical example which shows the importance of iso_c_binding.

Basically, I’d like to know what can happen without it.

I’m looking for intuition in an explanation.

I’m looking for a complete example. I want to really really understand it, so give me your best explanation.

Thank you,

I have been tweeting about Fortran interoperability with C. Here are two simple examples.

Arguments corresponding to non-pointer arguments of C functions must have the VALUE attribute in the Fortran interface. A C const argument can be INTENT(IN) in the Fortran interface.

xsum_vec.f90

program test_sum_vec
use iso_c_binding, only: c_double, c_int
implicit none
interface
pure function sum_vec(x, n) result(xsum) bind(c)
import c_double, c_int
integer(kind=c_int), intent(in), value :: n
real(kind=c_double), intent(in)        :: x(n)
real(kind=c_double)                    :: xsum
end function sum_vec
end interface
integer(kind=c_int), parameter :: n = 3
real(kind=c_double) :: x(n) = real([10,20,30], kind=c_double)
print "(*(1x,f0.1))",x,sum_vec(x,n) ! 10.0 20.0 30.0 60.0
end program test_sum_vec

sum_vec.c

double sum_vec(double x[], const int n) {
	double xsum = 0.0;
	for (int i=0; i<n; i++)
		xsum += x[i];
	return xsum;
}

compile with

gcc -c -o sum_vec.o sum_vec.c
gfortran sum_vec.o xsum_vec.f90

Here is an example of Fortran passing a derived type to C.

Use the bind(c) attribute to make a derived type compatible with C, with restrictions listed at Derived Types and struct (The GNU Fortran Compiler)

date_type.f90

program date_type
use iso_c_binding, only: c_int
implicit none
type, bind(c) :: date
   integer (kind=c_int) :: year, month, day
end type date
interface
subroutine print_date(d) bind(c)
import date
type(date), intent(in), value :: d
end subroutine print_date
end interface
call print_date(date(2022,5,4)) ! 2022-05-04
end program date_type

date_type.c

#include <stdio.h>
typedef struct date {
	int year, month, day;
} date;

void print_date(date d)
{
	printf("%d-%02d-%02d\n",d.year,d.month,d.day);
}

Compile with

gcc -c date_type.c
gfortran date_type.o date_type.f90

The iso_c_binding module has nothing at all to do with name mangling. The use of bind(c) is what prevents name mangling from occurring, so that one can rely on the names of procedures between C and Fortran.

For example, two different compilers may mangle procedure names in two different ways, making it impossible to reliably call those procedures from C. I.e.

function add_one(x) result(plus_one)
  integer, intent(in) :: x
  integer :: plus_one

  plus_one = x + 1
end function

When compiled with gfortran may end up with the name add_one_, and thus need to be called from C like

extern int function add_one_(int*);
y = add_one_(*x);

but when compiled with ifort, may end up with the name _add_one and thus the C code no longer works. If you say that the procedure is bind(c), (which it can only be if the kinds of the arguments are C interoperable, hence where the iso_c_binding module comes in), then the compiler is obliged not to mangle the name, or use the one provided. I.e.

function add_one(x) bind(c, name="foo")
  use iso_c_binding, only: c_int
  integer(c_int), intent(in), value :: x
  integer(c_int) :: add_one
  add_one = x + 1
end function

Can then be called on any system with any compiler as

extern int function foo(int);
y = foo(x);
3 Likes

The issue with mangling arises from how different OS’s (Windows, Linux, AIX, etc) and related compilers handled the naming of Fortran objects. In Linux/Unix Fortran names usually had either one or two underscores appended at the end of the name. ie

void some_function_
or
void some_function_

If I remember correctly, IBM AIX assumed all lower case names and Cray T3E, C90 etc. assumed all uppercase with no extra characters (underscores etc). Therefore, if you wanted to support a wide range of OS’s and compilers you had to write #ifdef’s to handle each case. Not sure about Window’s since I haven’t written anything for Windows in over 27 years. The greatest importance of ISO_C_BINDING is that it specifies a standard way of interfacing Fortran with C and this goes beyond just the name mangling. It defines what variables in Fortran can be directly passed to C and vice-versa and a way to handle C pointers etc in Fortran and provides explicit interfaces that can be checked by the compiler to decide if it needs to do a copy in etc. There is also the issue of pass by reference vs pass by value. Many C procedures will pass some of its argument by value etc. Without ISO_C_BINDING, you will have to write a your own C wrappers etc.

If you want complete examples of how ISO_C_BINDING is used in real world applications, Look at the HDF5 interfaces

https://www.hdfgroup.org/solutions/hdf5

my part of the netCDF fortran interfaces

my Fortran interfaces to the SINTEF SISL NURBS library

and @vmagnin’s GTK interfaces

I’m sure several other examples will pop up if you do a web search

1 Like

To add to Brad’s (@everythingfunctional ) answer, you can take a look at the name mangling in action via Godbolt: Compiler Explorer. The mangling problems get worse if you want to use module subprograms, which however bring a lot of advantages including interface checking, encapsulation and data hiding.

If you only ever plan to work with a specific version of one compiler on a single operating system, then you could pretend that bind(c) doesn’t matter, and just account for the name mangling differences manually.

However using bind(c) and interoperable type definitions is a portability enhancement and form of insurance. Just consider how much pain a shift from 32-bit to 64-bit OS is causing you (and us at Discourse). Using bind(c) is essentially protecting yourself against such problems in the future by relying upon established calling conventions.

2 Likes

Note it is not iso_c_binding that is relevant, it is BIND(C, name=..) clause.

Say you are on 64-bit Windows and have a Fortran subroutine like so:

subroutine sub( s )
   character(len=*), intent(in) :: s
   print *, s
end subroutine 

And toward this an unsuspecting user writes a C caller:

extern void sub(char *, int);
int main() {
    char s[] = "Hello World!";
    sub(s, 12);
}

The user will fail to create a program:

C:\temp>gfortran -c c.c

C:\temp>gfortran -c f.f90

C:\temp>gfortran c.o f.o -o c.exe
c:/program files/codeblocks/gfortran/bin/../lib/gcc/x86_64-w64-mingw32/11.0.0/../../../../x86_64-w64-mingw32/bin/ld.exe: c.o:c.c:(.text+0x33): undefined reference to `sub'
collect2.exe: error: ld returned 1 exit status

because the name is “mangled” to SUB, though it is simple enough in this case as all uppercase version of the Fortran subprogram. But that is on 64-bit Windows with more streamlined convention from Microsoft.

Instead if the user were to be on 32-bit Windows where different calling conventions are possible - e.g., STDCALL or C - the name mangling involves further “decorations” e.g., _SUB or _SUB@8, etc. that will be confusing to the user.

Similar issues arise on different platforms each with its own variations.

With standard interoperability with BIND(C, name=..) clause, the coder can choose to christen the subprogram interoperably with the C processor in a portable manner.

2 Likes

Building upon my earlier post, if you have the following function in a module

module m
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none
contains

integer(c_int) function add_one(x)
  integer(c_int), intent(in), value :: x
  integer(c_int) :: add_one
  add_one = x + 1
end function

end module

the output with x86-64 gfortran 11.3 will be named __m_MOD_add_one, however with x86-64 ifort 2021.5 it will be named m_mp_add_one_.

If you add bind(c) it will be called add_one in both cases, just as the C caller would expect. You could interpret this as a friendship gesture towards C programmers who will be using the Fortran routine as a black-box, and they will not need to worry about name mangling, if they will ever need a different Fortran compiler.

From the snippets of code you (@giraffe) have shown before, it appears your project isn’t using modules, but only external subprograms. The Fortran compilers I could test on GodBolt all used the mangling convention of appending a single underscore in this case, i.e. the output would be named add_one_.

In an earlier thread you wrote:

ISO_C_BINDING involves more code which can be considered clutter by some people, not me of course! :woman_factory_worker:

I do believe that people can be prejudiced to what they perceive to be unnecessary clutter if they only want the code to work on a specific operating system and aren’t planning for other contingencies.

You could certainly go down this route, but along the lines of @themos, the technical debt will come to bite those people at some point, just as they have been bitten now when porting from a 32- to a 64-bit operating system (albeit on a different issue of widths). Out of curiosity, in which language are you (or your company) planning to do the rewrite in?

Concerning unneccesary clutter, you can reduce it by packing your subroutines in a module, then you will need only a single use, intrinsic :: iso_c_binding statement. If you are annoyed by the verbose parameter constants such as c_double and c_int, use renaming upon import, i.e.

use, intrinsic :: iso_c_binding, only: dp => c_double, ip => c_int
2 Likes

For the record, you can try to “protect” yourself from name mangling issues by the disciplined use of C preprocessor macros. Take a look at how NLopt solves this: nlopt/f77api.c at 4a0e93ce177dc8bf0f7cf7e3f3e52521b0553958 · stevengj/nlopt · GitHub
The preprocessing is spread over three files in total, but would still fail if a new name-mangling scheme were introduced. Using bind(c) however, you can write a fully compatible and type-safe interface: GitHub - grimme-lab/nlopt-f: Fortran bindings for the NLopt library. (note in this case, the wrapping goes in the “other” direction, Fortran calling C, but the issues are the same)