i am a postdoc in the general field of computational solid state physics, currently based in Zagreb. I mostly write my algorithms in Fortran, and i’m recently discovering the possibilities of creating bind(c) routines, compiling them into a .so library, and then loading them into other languages.
I’ve never had much proper programming education, so reading high-level code where these things are common routine is not simple for me. More often than not they are not commented, and the descriptions are usually not very understandable. StackOverflow provides some relief, but it’s also hard, and time-consuming to understand some specific problem of somebody, only to conclude their problem is different than yours.
So anyway, i have created a git repo of what i have for now: GitHub - mgoonde/f_so_toy
The idea being to compile the Fortran api into one single .so, that gets interfaced from different codes, and see how to pass different types of data back and forth. My starting point was basically to learn from the LAMMPS code, see how they do it there, and try to emulate myself. For the moment i have interfaces to Fortran, Python, lua, and C (with C having the most issues, i am crap in C ).
But i have lots of questions regarding the procedure. The main idea is to pack any data that gets passed around into type( c_ptr ) variables, and then carry around their corresponding type and size. That seems to make some procedures quite nice, but not sure if this is safe/efficient/leaks … Also, where is the memory allocated versus where it should be allocated, should c_loc be used, or is it preferred to malloc in the application and then c_f_pointer in the API? Can such interfaces perform safely with openMP, and how does this behave in MPI, etc. What could be some better approaches, or what should be kept in mind when designing such things.
If anybody has some comments/ideas/etc, it would be nice to hear from people who know these things better.
Welcome to the forum @mgunde
You are in the right place for such questions. There are many experts in this forum who can help, if you could further refine and itemize your questions, perhaps with some example code.
I am curious about the various answers and opinions about this topic too.
My general feeling is that the C interoperability features in fortran are intended to be sufficiently general so that either approach may be used in an efficient and portable way. At least that is the goal, admitting that practice sometimes falls short. That means that the approach chosen can be based on other factors, convenience, robustness, and so on.
An example might be that if the programmer wants to use the fortran feature that local allocatable entities are automatically deallocated by the compiler when the procedure returns to the caller, then it would make sense to do the allocations in fortran and to pass the pointers to the C code with c_loc(). If done the other way, then the programmer would need to worry about invoking the deallocations on the C side, which can lead to inadvertent memory leaks. Another example is giving the C code access to openMP or coarray fortran entities. The C side of the code cannot be expected know anything about fortran’s shared memory or distributed memory programming conventions, so the programmer must handle those on the fortran side. This all allows the programmer to use these nice features of fortran allocatable arrays (think of them as well-behaved pointers) and parallel entities, while still giving full access to the interoperable C code.
Great, i would prefer to stay at the level of a general discussion for now.
The context i have in mind is that my Fortran src is some small algorithm that i want to call from some larger code, without heavy modifications of the latter, so that my Fortran algo is effectively a plugin. In this situation i typically need to pass some data from the main code into the plugin, which modifies it, and returns the same data transformed, or other generated data.
I guess my main question would be: what considerations should be taken for the plugin to not make a mess with the memory, given that the main code could be run in parallel/multithread, and preferably to not make copies of data if not necessary.
Irrespective of whether you use C, C++, Rust, Zig, or Fortran to write the shared library, your biggest decision is how to manage memory.
The easiest way to manage memory is to just not do it. Your API requires everything it needs to get passed into the function call. This is how old school Fortran libraries are written: The “We don’t allocate memory, so you better give us all the scratch space we will need” approach.
This “solves” the multi-threading and memory issues by pushing the problem onto the caller.
An API with this approach will look something like this: (warning,:untested psuedocode-ish)
This is how old school Fortran libraries are written: The “We don’t allocate memory, so you better give us all the scratch space we will need” approach.
Right, this is understandable.
But what to do when the output_data has an a-priori unknown size? I could pre-allocate it in the caller to some large-enough size, and then have its true size as another integer output of the subroutine which computes it. But what would be a more general, or modern approach?
Does this suffice for your use case?
For many cases it does, but it would be cool to learn about other possibilities/features
Many C libraries (like MPI, for example) go with the “opaque type” approach. All the function calls operate on pointer to objects, not the objects themselves. The callers have no view into what the object actually contains. In this way, you can use all the nice Fortran features. It would look something like this: (again, untested seat-of-my-pants pseudo-code-ish)
/* c header file */
typedef struct MyOpaqueType;
MyOpaqueType* MOT_Create();
void MOT_Free(MyOpaqueType* mot);
void MOT_DoComputation(MyOpaqueType* mot, /* other params */);
/* This could also return an int instead of using an out parameter, but I'm following the MPI style here */
void MOT_ResultSize(MyOpaqueType* mot, int* n);
void MOT_GetResults(MyOpaqueType* mot, int n, float * fill_array);
module MyOpaqueType_mod
use iso_c_binding
implicit none
type :: MyOpaqueType
! Stuff, doesn't even need to be bind(C)
end type
contains
type(c_ptr) function MOT_Create() bind(C)
type(MyOpaqueType), pointer :: fptr
allocate(fptr)
MOT_Create = c_loc(fptr)
end function
! Remember to specify value, as C passes by-value, Fortran by-reference!
! I think I forgot this in the last post!
subroutine MOT_Free(cmot) bind(C)
type(c_ptr), value, intent(inout) :: cmot
type(MyOpaqueType), pointer :: mot
call c_f_pointer(cmot, mot)
deallocate(mot)
end subroutine
subroutine MOT_DoComputation(cmot,...) bind(C)
type(c_ptr), value, intent(inout) :: cmot
type(MyOpaqueType), pointer :: mot
call c_f_pointer(cmot, mot)
! Now use mot to do your computation
end subroutine
subroutine MOT_ResultSize(cmot, csize) bind(C)
type(c_ptr), value, intent(inout) :: cmot
type(c_ptr), value, intent(inout) :: csize
type(MyOpaqueType), pointer :: mot
integer(c_int), pointer :: size
call c_f_pointer(cmot,mot)
call c_f_pointer(csize,size)
! Now extract the result size out of mot, store it in size
end subroutine
subroutine MOT_GetResults(cmot, n, cdata) bind(C)
type(c_ptr), value, intent(inout) :: cmot, cdata
integer(c_int), value :: n
type(MyOpaqueType), pointer :: mot
real(c_float), pointer :: data(:)
call c_f_pointer(cmot,mot)
call c_f_pointer(cdata,data,[n])
! Now fill in the data
end subroutine
end module
Within the Fortran code you can now do anything you wish, because the C code only ever holds a pointer to the object, not the object itself. It’s a bit cumbersome, but it IS structured and organized and understandable. This is like the other end of the C FFI spectrum: “We do all the memory management using the object you give us, and you better have called all the functions in the right order”
This is thread-safe as long as everything you need is stored in the object itself. That means no module-level variables, no static variables, etc.
The API you define could be a mix of these 2 approaches, as you suggested.
If you want to go further, you can look into the new stuff with ISO_Fortran_binding.h. I believe this is supposed to define types that let C code operate on Fortran allocatable arrays, etc. I’ve never looked into it myself.
If your routines are loaded from a shared library, a.k.a plugin, you probably have a defined interface you need to provide? I would expect the driving code (LAMMPS, OpenMM, etc.) use a template method design, which you then customize using your own “hook operations.”
If the driving code has a memory management strategy, e.g. it provides its own functions for this purpose, then it’s best to use that. Perhaps the applications manages memory centrally, and has a central mechanism to deallocate memory in case of irrecoverable errors.
If there is no such thing then you are free to do whatever you want. The solution @dwwork suggested using an opaque pointer is a good one if you are interoperating with a C code. The data-hiding property of the opaque pointer enables you to keep your data safe also in a multi-threaded environment vs something like keeping arrays in a module scope which could be problematic.
In computer programming, a handle is an abstract reference to a resource that is used when application software references blocks of memory or objects that are managed by another system like a database or an operating system.
You mentioned above, that your Fortran subprogram may return data of unknown size. This means it’s almost certainly dynamically allocated on the heap (unless it’s a code like NWChem which does things differently). These allocations must be performed consistently:
C: malloc / free
Fortran: allocate / deallocate
C++: new / delete
This does not exclude the possibility of calling malloc within Fortran, when the size of the data of becomes known. Here is a variation of @dwwork’s routine where the assumption is the caller will perform the deallocation:
! void MOT_GetResults(MyOpaqueType *mot, int n[3], float* output);
! output is guaranteed to be unallocated on entry
! caller promises to free it correctly using the `free` function
!
subroutine MOT_GetResults(cmot, n, output) bind(C,name="MOT_GetResults")
type(c_ptr), value, intent(inout) :: cmot
type(c_ptr), value, intent(out) :: output
integer(c_int), intent(out) :: n(3)
interface
function c_malloc(size) bind(c,name="malloc")
import c_ptr, c_size_t
integer(c_size_t), intent(in), value :: size
type(c_ptr) :: malloc
end function
end interface
type(MyOpaqueType), pointer :: mot
real(c_float), pointer :: f_data(:,:,:)
integer :: nel
call c_f_pointer(cmot,mot) ! Retrieve Fortran objext
! Size of output only known at runtime from mot or otherwise
n(1) = ...
n(2) = ...
n(3) = ...
nel = n(1) * n(2) * n(3)
output = c_malloc( nel * c_sizeof(1.0_c_float)) ! watch out for integer overflow!
if (.not. c_associated(output)) then
! malloc failed
return
end if
call c_f_pointer(output,f_data,n)
! Now fill in the data
f_data = ...
end function
I imagine that in the scenario you are talking about, the driving application looks something like this (but much bigger):
// driver.c
// The user extends the program by providing a plugin as a shared library
// $ USER_PLUGIN=libmyforcefield.so ./driver
// The plugin interface (to be implemented in C, Fortran, or something else)
typedef void * (*PluginConstructor)(/* params */);
typedef int (*PluginOperation)(void * /* handle */, /* ... */);
typedef int (*PluginDestructor)(void * /* handle */);
int main() {
const char *envvar = "USER_PLUGIN";
if(!getenv(envvar)){
fprintf(stderr, "The environment variable %s was not found.\n", envvar);
exit(1);
}
char *userlib = getenv(env);
PluginConstructor p_constructor;
PluginOperation p_operation;
PluginDestructor p_destructor;
// call dlopen, etc...
int ierr = open_plugin(userlib, p_constructor, p_operation, p_destructor);
// construct handle with user data structures
void *handle = p_constructor( /* params */ );
// call user algorithms
ierr = p_operation(handle, /* ... */);
// destroy handle when it's not needed
ierr = p_destructor(handle);
return 0;
}
At least for your Fortran and C apps you could also use a static library. But I’m pretty sure your Python and Lua apps requiredynamic loading. But your examples look quite advanced already . In principle you can do dynamic loading in Fortran too.
The R interpreter, has a mechanism for registering native routines, which helps speed up the dynamic loading process. I suppose this is similar to writing a Python extension module specific to the CPython interpreter. I’m not sure if Lua has something like this too.
This concept is sometimes called type erasure and can be used for generic programming. The canonical example is the C qsort function:
Instead of N sorts for each possible type, you provide only one.
It’s also used in other libraries, one that comes to mind is NVIDIA’s cuSPARSE Generic API. Here’s a routine as example:
cusparseStatus_t
cusparseCreateConstSpVec(cusparseConstSpVecDescr_t* spVecDescr,
int64_t size,
int64_t nnz,
const void* indices, // generic type
const void* values, // generic type
cusparseIndexType_t idxType, // enumerator for indices type
cusparseIndexBase_t idxBase,
cudaDataType valueType) // enumerator for values type
I assume that internally they cast from void * to the user-specified valueType and then dispatch to the right routine (potentially implemented using C++ templates). This helps minimize the API you expose to consumers; but at the cost of type-safety for your consumers.
Amazing! It will take me a few days to digest all this, thanks alot (i will come back with questions )!!
I always thought that “driver” routines/API should not be placed in a module, but i have basically zero arguments for that.
Yeah this is sometimes the case, for example LAMMPS provides the plugin load mechanism, or Siesta has the lua hooks through which you can access/modify the internal data, but some other codes like Quantum espresso just simply give you an empty subroutine (at least last time i checked). And the major plus point of dynamic loading is that sometimes you cannot recompile the main code (no access to the src on a cluster for example).
It would be nice to have a kind-of “centralised” API for the plugin, and then make code-specific intermediate routines, which comply with whatever is the template/needed organisation. Does that make sense?
I think i got the hang of the opaque-type idea a bit. But a question regarding the handles:
Is it possible to transfer the same handle from one program to another? In the sense, for example i would open a library in python, put some data into it, and then pass the handle to a fortran program (via printing and reading the pointer address to a file maybe?), which is launched independently, and could directly read the data from this handle? The python remaining open for this whole time of course.
The short answer I believe is no. Two separate processes, despite loading the same shared library, will get their own segment of memory. If you want to share data across processes you need inter-process communication.
… sharing class instances through a shared memory segment is usually a very bad idea . It is much safer to pass simple struct data, and reconstruct your object on the reader side (look up serialization and marshalling on the net for more information).
Correct. The data doesn’t need to be dumped to a file, it could be passed using a pipe or a message queue. In some cases you may have data which doesn’t even fit on memory, but by processing it on the fly, you are still able to perform your desired operation. There was a related issue a on inter-process communication a while ago: Fortran called by Python/Vice-Versa
In the linked Reddit post, the user “ush4” showed a nice example using MPI:
It does, absolutely. If you look at the LAMMPS plugin load mechanism, it is described in detail, what the plugin API looks like.
Taking the first example on that page, plugin writers are expected to write a function named exactly lammpsplugin_init with C linkage. The function initializes a plugin struct, that carries a pointer to a factory function (the morse2creator) which creates a polymorphic Pair instance.
This plugin interface doesn’t really support Fortran, because the lammpsplugin_t is a C++ object, hence not directly interoperable.
What you could still do is call Fortran subprograms from the member functions of the PairMorse2 C++ class (a child class of the Pair class representing a formula for computing pairwise interactions):
The worklist under consideration for Fortran 2023 - see here - has this item:
- US06. Provide a mechanism to specify global binding name for non
C-interoperable.
Ref. 23-201 "F202Y Global binding name for non C-interoperable"
I am quite pleased with the J3 and WG5 committee that this item got considered quickly and got introduced into the list for Fortran 2023. To the best of my understanding, it was merely an inquiry email from a J3 rep from NASA earlier this year and in a few quick responses, this item got added with no paper, no proposal, nothing - amazing!
Anyways, if done well, I expect this work item will enhance the library development and deployment experience for Fortranners in the form of shared libraries (aka DLLs on Windows OS).
I can then envision one day a Fortranner authoring the BestAPIEver as follows :
subroutine BestAPIEver( EternalWisdom ) bind(name="BestAPIEver")
string_t, intent(inout) :: EternalWisdom
EternalWisdom = "Fortran is the lingua franca of computing!"
end subroutine
and others scrambling to develop wrappers in their interpreted babbles to this API!
I’m trying to understand the behaviour of bind(C) which I don’t really get when applied to functions or variables.
When I compile a static library, functions and variables with bind(C) will be global (symbol types T and D) even if public is not used.
However, when I create a shared library from the static one:
functions with bind(C) will be global (symbol type T) even I did not use public.
variables (might be protected) will be local (symbol type d) except if I use public.
Did you have a similar experience? I’m really interested to understand why is it working this way?
N.B: Very often, I create shared libraries from the static libraries generated by fpm. I would like to avoid public for C API functions and variables to avoid polluting the Fortran side but still make C API functions and variables global.