Shared libraries with bind(C) for dummies

Hello everybody,

i am a postdoc in the general field of computational solid state physics, currently based in Zagreb. I mostly write my algorithms in Fortran, and i’m recently discovering the possibilities of creating bind(c) routines, compiling them into a .so library, and then loading them into other languages.

I’ve never had much proper programming education, so reading high-level code where these things are common routine is not simple for me. More often than not they are not commented, and the descriptions are usually not very understandable. StackOverflow provides some relief, but it’s also hard, and time-consuming to understand some specific problem of somebody, only to conclude their problem is different than yours.

So anyway, i have created a git repo of what i have for now: GitHub - mgoonde/f_so_toy
The idea being to compile the Fortran api into one single .so, that gets interfaced from different codes, and see how to pass different types of data back and forth. My starting point was basically to learn from the LAMMPS code, see how they do it there, and try to emulate myself. For the moment i have interfaces to Fortran, Python, lua, and C (with C having the most issues, i am crap in C :slight_smile: ).

But i have lots of questions regarding the procedure. The main idea is to pack any data that gets passed around into type( c_ptr ) variables, and then carry around their corresponding type and size. That seems to make some procedures quite nice, but not sure if this is safe/efficient/leaks … Also, where is the memory allocated versus where it should be allocated, should c_loc be used, or is it preferred to malloc in the application and then c_f_pointer in the API? Can such interfaces perform safely with openMP, and how does this behave in MPI, etc. What could be some better approaches, or what should be kept in mind when designing such things.
If anybody has some comments/ideas/etc, it would be nice to hear from people who know these things better.

cheers, i like the forum!

2 Likes

Welcome to the forum @mgunde
You are in the right place for such questions. There are many experts in this forum who can help, if you could further refine and itemize your questions, perhaps with some example code.

1 Like

I am curious about the various answers and opinions about this topic too.

My general feeling is that the C interoperability features in fortran are intended to be sufficiently general so that either approach may be used in an efficient and portable way. At least that is the goal, admitting that practice sometimes falls short. That means that the approach chosen can be based on other factors, convenience, robustness, and so on.

An example might be that if the programmer wants to use the fortran feature that local allocatable entities are automatically deallocated by the compiler when the procedure returns to the caller, then it would make sense to do the allocations in fortran and to pass the pointers to the C code with c_loc(). If done the other way, then the programmer would need to worry about invoking the deallocations on the C side, which can lead to inadvertent memory leaks. Another example is giving the C code access to openMP or coarray fortran entities. The C side of the code cannot be expected know anything about fortran’s shared memory or distributed memory programming conventions, so the programmer must handle those on the fortran side. This all allows the programmer to use these nice features of fortran allocatable arrays (think of them as well-behaved pointers) and parallel entities, while still giving full access to the interoperable C code.

Great, i would prefer to stay at the level of a general discussion for now.

The context i have in mind is that my Fortran src is some small algorithm that i want to call from some larger code, without heavy modifications of the latter, so that my Fortran algo is effectively a plugin. In this situation i typically need to pass some data from the main code into the plugin, which modifies it, and returns the same data transformed, or other generated data.

I guess my main question would be: what considerations should be taken for the plugin to not make a mess with the memory, given that the main code could be run in parallel/multithread, and preferably to not make copies of data if not necessary.

Irrespective of whether you use C, C++, Rust, Zig, or Fortran to write the shared library, your biggest decision is how to manage memory.

The easiest way to manage memory is to just not do it. Your API requires everything it needs to get passed into the function call. This is how old school Fortran libraries are written: The “We don’t allocate memory, so you better give us all the scratch space we will need” approach.

This “solves” the multi-threading and memory issues by pushing the problem onto the caller.

An API with this approach will look something like this: (warning,:untested psuedocode-ish)

subroutine my_cool_func_c(n,input_data,output_data) bind(C, name="my_cool_func")
    integer(c_int),value, intent(in) :: n
    type(c_ptr),value, intent(inout) :: input_data, output_data
    real(c_float), pointer :: input_data_f(:), output_data_f(:)
    call c_f_pointer( input_data, input_data_f, [n])
    call c_f_pointer( output_data, output_data_f, [n] )
    call my_cool_func(input_data, output_data)
end subroutine

subroutine my_cool_func(i,o)
    real(real32), intent(in) :: i(:)
    real(real32), intent(out) :: o(:)
    ! Do stuff
end subroutine

Does this suffice for your use case?

2 Likes

This is how old school Fortran libraries are written: The “We don’t allocate memory, so you better give us all the scratch space we will need” approach.

Right, this is understandable.
But what to do when the output_data has an a-priori unknown size? I could pre-allocate it in the caller to some large-enough size, and then have its true size as another integer output of the subroutine which computes it. But what would be a more general, or modern approach?

Does this suffice for your use case?

For many cases it does, but it would be cool to learn about other possibilities/features :slight_smile:

Many C libraries (like MPI, for example) go with the “opaque type” approach. All the function calls operate on pointer to objects, not the objects themselves. The callers have no view into what the object actually contains. In this way, you can use all the nice Fortran features. It would look something like this: (again, untested seat-of-my-pants pseudo-code-ish)

/* c header file */

typedef struct MyOpaqueType;

MyOpaqueType* MOT_Create();

void MOT_Free(MyOpaqueType* mot);

void MOT_DoComputation(MyOpaqueType* mot, /* other params */);

/* This could also return an int instead of using an out parameter, but I'm following the MPI style here */
void MOT_ResultSize(MyOpaqueType* mot, int* n);

void MOT_GetResults(MyOpaqueType* mot, int n, float * fill_array);
module MyOpaqueType_mod
use iso_c_binding
implicit none

type :: MyOpaqueType
! Stuff, doesn't even need to be bind(C)
end type

contains

type(c_ptr) function MOT_Create() bind(C)
    type(MyOpaqueType), pointer :: fptr

    allocate(fptr)
    MOT_Create = c_loc(fptr)
end function

! Remember to specify value, as C passes by-value, Fortran by-reference!
! I think I forgot this in the last post!
subroutine MOT_Free(cmot) bind(C)
    type(c_ptr), value, intent(inout) :: cmot
    type(MyOpaqueType), pointer :: mot

    call c_f_pointer(cmot, mot)

    deallocate(mot)
end subroutine

subroutine MOT_DoComputation(cmot,...) bind(C)
    type(c_ptr), value, intent(inout) :: cmot
    type(MyOpaqueType), pointer :: mot
    call c_f_pointer(cmot, mot)
    ! Now use mot to do your computation
end subroutine

subroutine MOT_ResultSize(cmot, csize) bind(C)
    type(c_ptr), value, intent(inout) :: cmot
    type(c_ptr), value, intent(inout) :: csize
    type(MyOpaqueType), pointer :: mot
    integer(c_int), pointer :: size

    call c_f_pointer(cmot,mot)
    call c_f_pointer(csize,size)

    ! Now extract the result size out of mot, store it in size
end subroutine

subroutine MOT_GetResults(cmot, n, cdata) bind(C)
    type(c_ptr), value, intent(inout) :: cmot, cdata
    integer(c_int), value :: n
    type(MyOpaqueType), pointer :: mot
    real(c_float), pointer :: data(:)
    
    call c_f_pointer(cmot,mot)
    call c_f_pointer(cdata,data,[n])

    ! Now fill in the data
end subroutine
end module

Within the Fortran code you can now do anything you wish, because the C code only ever holds a pointer to the object, not the object itself. It’s a bit cumbersome, but it IS structured and organized and understandable. This is like the other end of the C FFI spectrum: “We do all the memory management using the object you give us, and you better have called all the functions in the right order”

This is thread-safe as long as everything you need is stored in the object itself. That means no module-level variables, no static variables, etc.

2 Likes

The API you define could be a mix of these 2 approaches, as you suggested.

If you want to go further, you can look into the new stuff with ISO_Fortran_binding.h. I believe this is supposed to define types that let C code operate on Fortran allocatable arrays, etc. I’ve never looked into it myself.

If your routines are loaded from a shared library, a.k.a plugin, you probably have a defined interface you need to provide? I would expect the driving code (LAMMPS, OpenMM, etc.) use a template method design, which you then customize using your own “hook operations.”

If the driving code has a memory management strategy, e.g. it provides its own functions for this purpose, then it’s best to use that. Perhaps the applications manages memory centrally, and has a central mechanism to deallocate memory in case of irrecoverable errors.

If there is no such thing then you are free to do whatever you want. The solution @dwwork suggested using an opaque pointer is a good one :+1: if you are interoperating with a C code. The data-hiding property of the opaque pointer enables you to keep your data safe also in a multi-threaded environment vs something like keeping arrays in a module scope which could be problematic.

The opaque pointer is also known as a handle:

In computer programming, a handle is an abstract reference to a resource that is used when application software references blocks of memory or objects that are managed by another system like a database or an operating system.

You mentioned above, that your Fortran subprogram may return data of unknown size. This means it’s almost certainly dynamically allocated on the heap (unless it’s a code like NWChem which does things differently). These allocations must be performed consistently:

  • C: malloc / free
  • Fortran: allocate / deallocate
  • C++: new / delete

This does not exclude the possibility of calling malloc within Fortran, when the size of the data of becomes known. Here is a variation of @dwwork’s routine where the assumption is the caller will perform the deallocation:

! void MOT_GetResults(MyOpaqueType *mot, int n[3], float* output);
!    output is guaranteed to be unallocated on entry
!    caller promises to free it correctly using the `free` function
!
subroutine MOT_GetResults(cmot, n, output) bind(C,name="MOT_GetResults")
    type(c_ptr), value, intent(inout) :: cmot
    type(c_ptr), value, intent(out) :: output
    integer(c_int), intent(out) :: n(3)

    interface 
       function c_malloc(size) bind(c,name="malloc")
          import c_ptr, c_size_t
          integer(c_size_t), intent(in), value :: size
          type(c_ptr) :: malloc
       end function
    end interface

    type(MyOpaqueType), pointer :: mot

    real(c_float), pointer :: f_data(:,:,:)
    integer :: nel

    call c_f_pointer(cmot,mot)  ! Retrieve Fortran objext 
     
    ! Size of output only known at runtime from mot or otherwise
    n(1) = ...
    n(2) = ...
    n(3) = ...

    nel = n(1) * n(2) * n(3)
    output = c_malloc( nel * c_sizeof(1.0_c_float)) ! watch out for integer overflow!
    if (.not. c_associated(output)) then
       ! malloc failed
       return
   end if

    call c_f_pointer(output,f_data,n) 

    ! Now fill in the data
    f_data = ...

end function

I imagine that in the scenario you are talking about, the driving application looks something like this (but much bigger):

// driver.c

// The user extends the program by providing a plugin as a shared library
// $ USER_PLUGIN=libmyforcefield.so ./driver

// The plugin interface (to be implemented in C, Fortran, or something else)
typedef void * (*PluginConstructor)(/* params */);
typedef int (*PluginOperation)(void * /* handle */, /* ... */);
typedef int (*PluginDestructor)(void * /* handle */);

int main() {

    const char *envvar = "USER_PLUGIN";
    if(!getenv(envvar)){
        fprintf(stderr, "The environment variable %s was not found.\n", envvar);
        exit(1);
    }

   char *userlib = getenv(env);

   PluginConstructor p_constructor;
   PluginOperation p_operation;
   PluginDestructor p_destructor;

   // call dlopen, etc...
   int ierr = open_plugin(userlib, p_constructor, p_operation, p_destructor);

   // construct handle with user data structures
   void *handle = p_constructor( /* params */ );

   // call user algorithms
   ierr = p_operation(handle, /* ... */);

   // destroy handle when it's not needed
   ierr = p_destructor(handle);

   return 0;
}

At least for your Fortran and C apps you could also use a static library. But I’m pretty sure your Python and Lua apps require dynamic loading. But your examples look quite advanced already :+1:. In principle you can do dynamic loading in Fortran too.

The R interpreter, has a mechanism for registering native routines, which helps speed up the dynamic loading process. I suppose this is similar to writing a Python extension module specific to the CPython interpreter. I’m not sure if Lua has something like this too.

This concept is sometimes called type erasure and can be used for generic programming. The canonical example is the C qsort function:

void qsort(void *ptr, std::size_t count, std::size_t size, cmp);

Instead of N sorts for each possible type, you provide only one.

It’s also used in other libraries, one that comes to mind is NVIDIA’s cuSPARSE Generic API. Here’s a routine as example:

cusparseStatus_t
cusparseCreateConstSpVec(cusparseConstSpVecDescr_t* spVecDescr,
                         int64_t                    size,
                         int64_t                    nnz,
                         const void*                indices,   // generic type
                         const void*                values,    // generic type
                         cusparseIndexType_t        idxType,   // enumerator for indices type
                         cusparseIndexBase_t        idxBase,
                         cudaDataType               valueType) // enumerator for values type

I assume that internally they cast from void * to the user-specified valueType and then dispatch to the right routine (potentially implemented using C++ templates). This helps minimize the API you expose to consumers; but at the cost of type-safety for your consumers.

Amazing! It will take me a few days to digest all this, thanks alot (i will come back with questions :slight_smile: )!!

I always thought that “driver” routines/API should not be placed in a module, but i have basically zero arguments for that.

Yeah this is sometimes the case, for example LAMMPS provides the plugin load mechanism, or Siesta has the lua hooks through which you can access/modify the internal data, but some other codes like Quantum espresso just simply give you an empty subroutine (at least last time i checked). And the major plus point of dynamic loading is that sometimes you cannot recompile the main code (no access to the src on a cluster for example).
It would be nice to have a kind-of “centralised” API for the plugin, and then make code-specific intermediate routines, which comply with whatever is the template/needed organisation. Does that make sense?

I think i got the hang of the opaque-type idea a bit. But a question regarding the handles:
Is it possible to transfer the same handle from one program to another? In the sense, for example i would open a library in python, put some data into it, and then pass the handle to a fortran program (via printing and reading the pointer address to a file maybe?), which is launched independently, and could directly read the data from this handle? The python remaining open for this whole time of course.

The short answer I believe is no. Two separate processes, despite loading the same shared library, will get their own segment of memory. If you want to share data across processes you need inter-process communication.

See this thread: c - How can I access and modify a global variable across multiple files? - Stack Overflow (the title is a bit misleading) for a partial explanation. Citing another Stackoverflow thread,

… sharing class instances through a shared memory segment is usually a very bad idea . It is much safer to pass simple struct data, and reconstruct your object on the reader side (look up serialization and marshalling on the net for more information).

You can find a description of inter-process communicaton in the context of Fortran here: Inter-Process Communication| Programming in Modern Fortran (courtesy of @interkosmos)

Serialization of data would essentially mean dumping the data somewhere and then reading it back in the correct format, right? Sounds simple enough.

Correct. The data doesn’t need to be dumped to a file, it could be passed using a pipe or a message queue. In some cases you may have data which doesn’t even fit on memory, but by processing it on the fly, you are still able to perform your desired operation. There was a related issue a on inter-process communication a while ago: Fortran called by Python/Vice-Versa

In the linked Reddit post, the user “ush4” showed a nice example using MPI:

ush@luft:~/$ cat pythonmpi.f90  
use mpi  
real :: array(5)=(/1,2,3,4,5/)  
call mpi_init(ierr)  
call mpi_comm_rank(mpi_comm_world, myid, ierr)  
if(myid==0) call mpi_send(array, 5, mpi_real, 1, 999, mpi_comm_world,& ierr)  
end

ush@luft:~/$ cat pythonmpi.py
from mpi4py import MPI
import numpy
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
if rank == 1:    
   data = numpy.empty(5, dtype='f')
   comm.Recv([data, MPI.REAL4], source=0, tag=999)
   print("python got data:",data)

ush@luft:~/$ mpif90 pythonmpi.f90

ush@luft:~/$ mpiexec -quiet -n 1 ./a.out : -n 1 python3 pythonmpi.py

python got data: [1. 2. 3. 4. 5.]

In this example the line mpiexec launches two processes with a shared communicator.


I will point out that not all platforms choose to support dynamic linking and loading. For example the developers of Plan 9 (a fringe operating system developed at Bell Labs) considered it to be harmful. They may have been right, just this week I noticed a vulnerability was discovered affecting the dynamic loader in the GNU C Library, allowing users to obtain root privileges.

It does, absolutely. If you look at the LAMMPS plugin load mechanism, it is described in detail, what the plugin API looks like.

Taking the first example on that page, plugin writers are expected to write a function named exactly lammpsplugin_init with C linkage. The function initializes a plugin struct, that carries a pointer to a factory function (the morse2creator) which creates a polymorphic Pair instance.

using namespace LAMMPS_NS;

static Pair *morse2creator(LAMMPS *lmp)
{
  return new PairMorse2(lmp);
}

extern "C" void lammpsplugin_init(void *lmp, void *handle, void *regfunc)
{
  lammpsplugin_regfunc register_plugin = (lammpsplugin_regfunc) regfunc;
  lammpsplugin_t plugin;

  plugin.version = LAMMPS_VERSION;
  plugin.style   = "pair";
  plugin.name    = "morse2";
  plugin.info    = "Morse2 variant pair style v1.0";
  plugin.author  = "Axel Kohlmeyer (akohlmey@gmail.com)";
  plugin.creator.v1 = (lammpsplugin_factory1 *) &morse2creator;
  plugin.handle  = handle;
  (*register_plugin)(&plugin,lmp);
}

This plugin interface doesn’t really support Fortran, because the lammpsplugin_t is a C++ object, hence not directly interoperable.

What you could still do is call Fortran subprograms from the member functions of the PairMorse2 C++ class (a child class of the Pair class representing a formula for computing pairwise interactions):

class PairMorse2 : public Pair {
 public:
  PairMorse2(class LAMMPS *);
  ~PairMorse2() override;
  void compute(int, int) override;

  void settings(int, char **) override;
  void coeff(int, char **) override;
  double init_one(int, int) override;
  void write_restart(FILE *) override;
  void read_restart(FILE *) override;
  void write_restart_settings(FILE *) override;
  void read_restart_settings(FILE *) override;
  void write_data(FILE *) override;
  void write_data_all(FILE *) override;
  double single(int, int, int, int, double, double, double, double &) override;
  void *extract(const char *, int &) override;

 protected:
  double cut_global;
  double **cut;
  double **d0, **alpha, **r0;
  double **morse1;
  double **offset;

  virtual void allocate();
};

Note the functions carrying the override specifier. These are the routines a custom Pair needs to (or may) provide.

A side comment:

The worklist under consideration for Fortran 2023 - see here - has this item:

- US06. Provide a mechanism to specify global binding name for non
        C-interoperable.
        Ref. 23-201 "F202Y Global binding name for non C-interoperable"

I am quite pleased with the J3 and WG5 committee that this item got considered quickly and got introduced into the list for Fortran 2023. To the best of my understanding, it was merely an inquiry email from a J3 rep from NASA earlier this year and in a few quick responses, this item got added with no paper, no proposal, nothing - amazing!

Anyways, if done well, I expect this work item will enhance the library development and deployment experience for Fortranners in the form of shared libraries (aka DLLs on Windows OS).

I can then envision one day a Fortranner authoring the BestAPIEver as follows :stuck_out_tongue_winking_eye::

subroutine BestAPIEver( EternalWisdom ) bind(name="BestAPIEver")
   string_t, intent(inout) :: EternalWisdom 
   EternalWisdom = "Fortran is the lingua franca of computing!"
end subroutine

and others scrambling to develop wrappers in their interpreted babbles to this API!

1 Like

I’m trying to understand the behaviour of bind(C) which I don’t really get when applied to functions or variables.
When I compile a static library, functions and variables with bind(C) will be global (symbol types T and D) even if public is not used.
However, when I create a shared library from the static one:

  • functions with bind(C) will be global (symbol type T) even I did not use public.
  • variables (might be protected) will be local (symbol type d) except if I use public.

Did you have a similar experience? I’m really interested to understand why is it working this way?

N.B: Very often, I create shared libraries from the static libraries generated by fpm. I would like to avoid public for C API functions and variables to avoid polluting the Fortran side but still make C API functions and variables global.