Idea: Splitting stdlib into smaller separate libraries

I think we could easily split stdlib into a set of smaller individual libraries, such as linalg, io, stats, ascii, etc., then make stdlib depend on those using fpm, so for end users nothing would change. But in addition, they could also depend on just an individual library, thus speeding up their build.

For distribution without fpm, we have to improve fpm anyway so that it can create a tarball that contains all the dependencies and also builds with CMake and Meson, so this shouldn’t add any hassle for end users who just want to use stdlib and not worry about these “subprojects”.

7 Likes

I think it’s worth considering, but there is some downside. It makes the logistics of maintaining the library and its documentation a bit more complex.

I also think that in the (possibly distant) future, the library’s size won’t really be an issue. The “code bloat” from the use of fypp for rank- and type-generic code will be alleviated by the addition of a generics facility to the language. And if we add a feature to fpm that prunes unused modules (i.e. if your project doesn’t ultimately use a module, it doesn’t get compiled or linked in), that will remove any remaining hesitancy for projects that don’t need the entire library.

Both of those are still pretty open-ended questions and may not come to fruition for a decade or more, so maybe it would make sense to split stdlib up until then. I’m curious to hear what others think too.

3 Likes

I like the spirit of this idea, but I think it should be solved by the build system, not the library. For example, tree-shaking.

5 Likes

Yes, after I posted this idea, I got a couple of other ideas:

  • fpm should not compile a module if it is not used by the final binary (if a given package creates a library, then it’s a different matter). This is obvious and should be implemented anyway. It might completely fix the issue of “long compile times” and “big stdlib library”.
  • possibly we can turn on and off parts of a package in fpm.toml.
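The first idea amounts to a reachability walk over `use` statements. Below is a minimal sketch of that walk, not fpm’s implementation: the function names (`used_modules`, `modules_to_build`) and the toy sources are made up for illustration, and submodules, `include` lines and preprocessing are ignored.

```python
import re

# Matches `use foo`, `use :: foo` and `use, intrinsic :: foo` at the start of a line.
USE_RE = re.compile(
    r"^\s*use\b(?:\s*,\s*\w+\s*)?(?:\s*::\s*|\s+)(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def used_modules(source):
    """Return the lower-cased names of all modules a source text `use`s."""
    return {name.lower() for name in USE_RE.findall(source)}


def modules_to_build(main_source, module_sources):
    """Walk `use` statements from the main program; return the modules it needs.

    `module_sources` maps module name -> source text (hypothetical input shape).
    Names without a known source (e.g. intrinsic modules) are skipped.
    """
    needed = set()
    pending = list(used_modules(main_source))
    while pending:
        name = pending.pop()
        if name in needed or name not in module_sources:
            continue
        needed.add(name)
        pending.extend(used_modules(module_sources[name]))
    return needed


# Toy sources: only the `use` lines matter for the walk.
module_sources = {
    "stdlib_kinds": "module stdlib_kinds\n  use, intrinsic :: iso_fortran_env\nend module",
    "stdlib_quadrature": "module stdlib_quadrature\n  use stdlib_kinds, only: dp\nend module",
    "stdlib_sorting": "module stdlib_sorting\n  use stdlib_kinds\nend module",
    "stdlib_logger": "module stdlib_logger\n  use stdlib_sorting\nend module",
}
main_source = "program demo\n  use stdlib_quadrature, only: trapz\nend program"

print(sorted(modules_to_build(main_source, module_sources)))
# → ['stdlib_kinds', 'stdlib_quadrature']
```

In this toy case the sorting and logger modules are never compiled, which is the effect the first bullet is after.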

Let’s just focus on fpm; I think it can fix this.

5 Likes

Very broadly speaking I think one can divide stdlib into two parts:

  • The “math” part, like statistics, linear algebra, etc.
  • The “systems programming” part, like sorting, logging, string handling etc.

In my honest opinion those two categories don’t belong in the same library, so splitting them apart would make a lot of sense to me.

9 Likes

With that change the final user will be faced with needing multiple libraries. It will make installs more complicated, especially as library sets get out of sync as one library is updated while another is not. It will discourage system administrators from simply installing the full library for all users. Instead they will install it piece by piece, so users will not be able to access the entire library by default.

1 Like

Depending on multiple libraries isn’t a problem with a competent package manager.

2 Likes

I don’t think this is a good idea anyway. Like @oscardssmith says, this should be left to the package manager. FPM is of course one option. For projects using CMake, CMake Package Manager is probably the easiest, though others like Conan exist as well. I have no experience with projects built with Meson, so others would have to comment on that particular use case.

EDIT: To elaborate a bit more, if one wants to install a pre-built Fortran library on a system, one would have to install one version for each compiler that users may want to use on that system. The reason is that name mangling and .mod file formats are not consistent across compiler vendors. The .mod file format in particular can also change between versions of the same compiler, which makes it even harder to maintain a shared installation.

The alternative is to “install” the source code itself, but there is hardly any benefit in doing this at all. A better solution is to let each project download its particular version of the source code and build it along with its source code. This makes it easier to manage different library versions in different projects as well.
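To make the per-compiler bookkeeping of a shared installation concrete, here is a small sketch. The directory scheme, the helper names (`install_prefix`, `pick_install`) and the paths are all hypothetical; it assumes the full compiler version is part of the key, since .mod formats can change between versions.

```python
import posixpath


def install_prefix(base, compiler, version, library="stdlib"):
    """Build a hypothetical install path keyed by compiler name and version."""
    return posixpath.join(base, library, f"{compiler}-{version}")


def pick_install(base, compiler, version, available):
    """Return the matching pre-built install, or None if a source build is needed."""
    prefix = install_prefix(base, compiler, version)
    return prefix if prefix in available else None


# Hypothetical set of installs an administrator has provided.
available = {
    install_prefix("/opt/fortran", "gfortran", "12.2.0"),
    install_prefix("/opt/fortran", "gfortran", "13.1.0"),
}

print(pick_install("/opt/fortran", "gfortran", "12.2.0", available))
print(pick_install("/opt/fortran", "ifort", "2021.9", available))
```

Every compiler and version that is missing from the set falls back to building from source, which is why downloading and building per project tends to be simpler.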

3 Likes

I hope one day it’ll become something full-fledged like Boost. Even such a large project is delivered as a whole, but users can selectively compile parts of it.

4 Likes

I think we can robustly deliver stdlib in parts using CMake. I was thinking about implementing this a while ago, but decided against it to avoid disturbing the repo with the required reorganization of the directory structure. If there is interest, I can try to set up a prototype.

However, transferring this structure to fpm is currently not possible due to “Problems declearing local (path) dependencies · Issue #605 · fortran-lang/fpm · GitHub”. This is a bug in fpm that results from saving the dependency tree in a flattened representation, which loses the connections between dependencies that would identify the parent root directory. I did promise to look into the dependency tree representation at some point (since it is my messy implementation anyway), but haven’t really found time for this yet.
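The effect of losing the parent link can be shown with a toy model. This is only an illustration of the problem as described above, not fpm’s actual data structures: the package names, the paths and both resolver functions are invented for the example.

```python
import posixpath

# Hypothetical manifests: each package lists local (path) dependencies
# relative to its OWN root directory.
manifests = {
    "app": {"stdlib": "vendor/stdlib"},
    "stdlib": {"linalg": "subprojects/linalg"},
    "linalg": {},
}


def resolve_nested(name, root, manifests, resolved=None):
    """Resolve paths while walking the tree, so each parent's root is known."""
    resolved = {} if resolved is None else resolved
    resolved[name] = root
    for dep, rel in manifests[name].items():
        dep_root = posixpath.normpath(posixpath.join(root, rel))
        resolve_nested(dep, dep_root, manifests, resolved)
    return resolved


def resolve_flat(top, root, manifests):
    """Resolve from a flattened list: the parent is unknown, so every
    relative path can only be joined to the top-level project root."""
    flat = [(dep, rel) for deps in manifests.values() for dep, rel in deps.items()]
    resolved = {top: root}
    for dep, rel in flat:
        resolved[dep] = posixpath.normpath(posixpath.join(root, rel))
    return resolved


print(resolve_nested("app", "/work/app", manifests)["linalg"])
# → /work/app/vendor/stdlib/subprojects/linalg
print(resolve_flat("app", "/work/app", manifests)["linalg"])
# → /work/app/subprojects/linalg
```

The nested walk finds linalg inside the vendored stdlib, while the flattened version looks for it directly under the application root, where it does not exist.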

4 Likes

Whatever the consensus turns out to be, people should ideally still be able to compile stdlib as a whole library if they wish. But most people frequently need only an isolated library module (or even less than a module, just one specific procedure, but that would be too fine-grained and impractical to maintain). The Boost library mentioned by yizhang looks nice and close to the ideal I had in mind, although I have never tried installing Boost or parts of it.

3 Likes

Distribution of Boost works slightly differently. Most of the Boost C++ libraries are header-only. It suffices to place them in a folder on the include path of the compiler and that’s it. The downside is potentially longer compile times due to whole text preprocessing for header inclusion and template specialization.

4 Likes

As @zoziha mentions in “Who is using stdlib? - #8 by zoziha”, you can
use stdlib as an external library, and there is already a version of
stdlib that breaks it into subsections, by @degawa at
“GitHub - degawa/stdlib_modules: Provides each Fortran stdlib module as
an individual fpm project”, that explores how this might work. I have
maintained a large library at “GitHub - urbanjost/general-purpose-fortran:
General Purpose Fortran Cooperative”, whose major components I generally
also make into individual fpm repositories, so I can say there are pros
and cons.

  • being able to build and use individual modules allows for making
    individual releases of such with their own documentation and testing
    which is particularly appealing when the overall package is frequently
    changing as something like stdlib is/will be doing.

  • it can create a large amount of duplication if you want the modules
    to truly be independent packages, or you have to require a build to
    build other packages, risking descending into “dependency hell”.

  • build times are minimized

and so on. Some of the other possible solutions for selectively building
only the routines used work better with non-modularized procedures, but do
not buy much with modules unless something quite elaborate is in place to
build only sections of modules.

There are a lot of arguments along the way, but my general conclusion is
that binary releases for several major compilers as agnostic packages (no
CMake, fpm, make, …) cover a lot of users and usage, as most users are not
using more than a few compilers (often one).

Then, an expanded source version, possibly built for each compiler, possibly
still containing “almost de facto” fpp preprocessing directives. A makefile
for the release would be preferred.

For those who want to build themselves or capture and incorporate the
code I think focusing on fpm is the better alternative, but it needs
to be able to use a pre-built fpm package more easily without having to
install the components as a conventional external library, and needs the
profile for a “shared” build to be able to be specified. I picture this
as an area you specify for fpm packages that you can point to and your
builds will use the “standard” build profile for a compiler to build
it once and other packages can then link to that build in the project’s
build directory in a read-only mode, with an option in the fpm.toml file
to use a specific profile build.

Ultimately, something stdlib-compatible will be supplied with compiler
distributions, but these are some ideas on how it could evolve in the
meantime.

In general I think a lot of users just want to be able to use stdlib as a “product” and install compiler-specific versions in subdirectories they can point to; others want the source free of having to pull down a lot of infrastructure to incorporate in their code or to examine and extract routines from without learning about github and git and fypp and …; and the last major group wants an efficient way to maintain large projects in fpm.

2 Likes

That’s exactly @oscardssmith’s point: delivery can be orthogonal to the builder/package manager (include or install, makefile or fpm, it shouldn’t matter).

The idea would be to have a stdlib superproject which makes the individual modules available, while you can build each of the subprojects as an individual library to keep your dependency footprint small. I think rustc uses such a structure via Cargo, for example.

The trick is to create a setup which doesn’t change the default usage of the complete stdlib and still allows individual modules to be used without much effort. This is especially important if you build stdlib on demand in your build system (fpm, CMake, …) but less of an issue if you pull in an installation from a package manager.

Making the manual makefile this modular is beyond hope IMO (I still would prefer to just drop support for it).

3 Likes

I am fine with dropping the manual Makefile. Why don’t we wait until we add a Make backend to fpm, then we can drop it.

2 Likes

At this moment, is it possible to build only parts of stdlib, by feeding in CMake options? I just need the stdlib_quadrature module, but I am currently building everything else which takes a while.

1 Like

No, I haven’t invested time into this yet. Also, the manual Makefile is still around in stdlib and on the fpm side there is still a blocking issue.

Waiting as suggested in this thread will only resolve this if somebody is working on the relevant issues and projects. I’m busy with my studies and wrapping up my thesis at the moment (at the moment as in for the next months), so I won’t be of much help moving this forward; don’t wait for me. If you are interested in working on one of the relevant issues (fixing the local dependencies in fpm, pruning the Makefile from stdlib, splitting the CMake build into subprojects, …), I’d be happy to provide pointers.

1 Like