Does anybody know of a Fortran interface to the Apache Arrow project (Apache Arrow v11.0.0)? I am particularly interested in any Fortran tool that can read and write Parquet files, if one exists.
In this technical report from NASA on cloud-optimised formats, Parquet is rated as a good complement to Cloud Optimized GeoTIFF for handling Earth observation data in the cloud. On page 14 they say that, of the five languages they consider (Python, R, C, C++, Fortran), Fortran is the only one that doesn’t have a Parquet library, but that the C library can be used through C interoperability.
While I’d like to look into it and see whether writing the interfaces is within my reach, I don’t have the right knowledge to get started. Would the group here have any advice on how to get started with something like this?
I gather Apache Arrow is C++ and the C bindings depend on GLib (GNOME?). One option is Fortran bindings to the DuckDB C API, which has Parquet read/write. Extending the existing Fortran bindings for the Parquet SQLite extension is another option, I guess.
Thanks for your reply @freevryheid. Indeed, Arrow is a C++ project, and you are right that they have C libraries through GLib that provide access to the project from various languages. Additionally, Arrow does a lot more than just handle Parquet, so maybe it’s not the best choice for a dependency?
With the additional digging I have done in the last few days, I found this documentation for the C (GLib) API: Apache Arrow GLib (C) | Apache Arrow, which also covers Parquet specifically. My plan, based on what I’ve gathered so far, is to explore whether that can be used directly from Fortran. Up to now I have managed to install Arrow from one of the distributed builds, take an example from their repository for a C++ Parquet roundtrip (arrow/reader_writer.cc at main · apache/arrow · GitHub), link it against the library and run it. That was relatively easy.
Next, I want to try to read/write that same Parquet file in C with the GLib (C) API and then try from Fortran. Maybe along the way I’ll find some blocker.
I should also look into the DuckDB and SQLite options you mentioned, thanks. I am not familiar with either project. I see DuckDB is also a C++ project, but I guess it exposes a C API that would allow binding only the Parquet read/write functionality.
Extending the existing Fortran bindings for the Parquet SQLite extension is another option, I guess.
I didn’t easily find a reference to these. Could you point me in the right direction?
I’ve included links in my original post.
Since we already have SQLite bindings for Fortran, you could read Parquet files through the extension. To write, I’m guessing you’ll need to export/convert from CSV.
I’ve started working on DuckDB Fortran bindings: just the basics needed to run SQL queries, which can be used to read/write Parquet files. I’ll flesh this out as I find time, but we could work together on this if you’d like.
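To make the "SQL queries to read/write Parquet" idea concrete, here is a minimal sketch in C of the statements such bindings would end up sending to DuckDB. The helper names `parquet_export_sql` and `parquet_import_sql` are hypothetical (they are not part of DuckDB or of the bindings mentioned above), and the table and file names are placeholders. The sketch only builds the SQL text, using DuckDB's `COPY ... TO ... (FORMAT PARQUET)` statement and its `read_parquet` table function; it does not link against DuckDB.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write into buf the DuckDB SQL that exports an
 * existing table to a Parquet file. Returns the statement length, or -1
 * if buf is too small. Names are inserted as-is, with no quoting or
 * escaping, so this is only suitable for trusted input. */
int parquet_export_sql(char *buf, size_t size,
                       const char *table, const char *path)
{
    assert(buf != NULL && table != NULL && path != NULL);
    int n = snprintf(buf, size, "COPY %s TO '%s' (FORMAT PARQUET);",
                     table, path);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: write into buf the DuckDB SQL that loads a
 * Parquet file into a new table. Same return convention as above. */
int parquet_import_sql(char *buf, size_t size,
                       const char *table, const char *path)
{
    assert(buf != NULL && table != NULL && path != NULL);
    int n = snprintf(buf, size,
                     "CREATE TABLE %s AS SELECT * FROM read_parquet('%s');",
                     table, path);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* With DuckDB's C API available, each statement would then be passed to
 * duckdb_query(connection, sql, &result) on an open connection. A Fortran
 * binding would declare that function in an interface block with bind(C). */
```

The point of this design is that the Fortran side only needs to bind a handful of C functions (open, connect, query, close); Parquet support then comes from the SQL itself rather than from a dedicated Parquet API.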
Awesome. Yes, I’d definitely like to contribute. I’ll take a look at the repository in a bit.