Module naming restrictions for fpm

Isn’t this restriction too demanding on already existing, long-lived packages?

  • Changing module names breaks the API. For a new or rising project this is of course not a problem, but otherwise, especially if the library is used in a fair amount of code, it is a non-starter.

  • Even if a library were willing to break the API in order to enter the registry, the new API would need to be better, or at least equivalent. Prefixing modules with the library name could easily become too verbose and clumsy. Conversely, requiring library names to be just a few characters long would be a serious restriction in many cases (and, again, long-lived projects would not want to change their name).

I see where the issue comes from, and I agree that module aliasing would be nasty to deal with. It is probably another argument for having nested modules in the language itself (so a library would export just one top-level module publicly, and the others would remain accessible through nested qualification). But that is far in the future, so we cannot rely on it for now.

I suppose you might want to mangle module names internally within fpm. As far as the current state of the toml file goes, the user would be required to explicitly resolve the clash there, when declaring the external modules. Something like

external-modules = ["stdlib/sorting", "alternative_lib/sorting"] 

or

external-modules = ["sorting@stdlib", "sorting@alternative_lib"] 

Whichever is easier to parse. The difficult part would then be that in the source you just have use sorting, and fpm would need to disambiguate by intersecting the information provided in the toml with… probably filesystem information. The build tree is already well organized in that sense: each dependency has its own subtree, and I believe fpm should be able to provide the compiler with different include paths for different source files (I know it cannot right now). One residual limitation would be that you could not use the two clashing modules in the same source file, but that is a language-level problem that fpm cannot solve by itself.
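As a rough illustration, here is a minimal sketch of how the two spellings proposed above could be normalized into (package, module) pairs so that a clash becomes visible. The `external-modules` key and both entry syntaxes are only the proposal from this post, not something fpm supports; the function names are hypothetical.

```python
def parse_external_module(entry: str) -> tuple[str, str]:
    """Split a hypothetical external-modules entry into (package, module).

    Accepts both spellings proposed above: "package/module" and "module@package".
    """
    if "/" in entry and "@" in entry:
        raise ValueError(f"ambiguous entry: {entry!r}")
    if "/" in entry:
        package, _, module = entry.partition("/")
    elif "@" in entry:
        module, _, package = entry.partition("@")
    else:
        raise ValueError(f"entry does not name its package: {entry!r}")
    if not package or not module:
        raise ValueError(f"empty package or module name in {entry!r}")
    return package, module


def providers_by_module(entries: list[str]) -> dict[str, list[str]]:
    """Map each (lower-cased) module name to the packages that provide it."""
    providers: dict[str, list[str]] = {}
    for entry in entries:
        package, module = parse_external_module(entry)
        # Fortran names are case-insensitive, so compare them in lower case.
        providers.setdefault(module.lower(), []).append(package)
    return providers


entries = ["stdlib/sorting", "sorting@alternative_lib"]
print(providers_by_module(entries))  # {'sorting': ['stdlib', 'alternative_lib']}
```

Detecting the clash is the easy part; as noted above, mapping a bare use sorting in a particular source file to the right provider would still need per-file include paths.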

This is not closely related, but since we are discussing it: requiring the author of an fpm package to create an account on the centralized registry website (reasonable imho, even convenient for advertising one's own libraries…) would also make it possible to handle clashing package names in a similar way.

Like

[dependencies] 
fortran-lang/stdlib={...} 
someone-else/stdlib={...} 

This would solve two major problems (racing to reserve names could easily become one) with a somewhat unified logic, at least from the user's perspective.
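A minimal sketch of the owner-qualified dependency keys suggested above, assuming a hypothetical "owner/name" key format (fpm dependencies are not keyed this way today). It shows that two packages with the same bare name stay distinguishable once the owner is part of the key.

```python
def split_dependency_key(key: str) -> tuple[str, str]:
    """Split a hypothetical "owner/name" dependency key into (owner, name)."""
    owner, sep, name = key.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name', got {key!r}")
    return owner, name


def shared_names(keys: list[str]) -> dict[str, list[str]]:
    """Bare package names that appear under more than one owner.

    These are exactly the cases where the owner qualification is what keeps
    the dependencies apart.
    """
    owners: dict[str, list[str]] = {}
    for key in keys:
        owner, name = split_dependency_key(key)
        owners.setdefault(name, []).append(owner)
    return {name: who for name, who in owners.items() if len(who) > 1}


deps = ["fortran-lang/stdlib", "someone-else/stdlib", "someone-else/other_lib"]
print(shared_names(deps))  # {'stdlib': ['fortran-lang', 'someone-else']}
```

This only addresses package-name clashes; the two stdlib packages could still ship modules with identical names, which is the harder problem discussed in the rest of the thread.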


This is true, and I’m somewhat sympathetic to it, but the question is how many libraries:

  1. use fpm
  2. expect consumers to use more than one module
  3. would find it difficult and confusing to change the API (including for their users)
  4. and would be unwilling to change the API to be added to the registry

I could be wrong, but I suspect all those qualifiers make the list relatively small.

I actually think it would be entirely impossible to disambiguate conflicting module names in the general case. Things might work out okay as-is if the conflict is sufficiently indirect. For example, say I use packages A and B, A uses C, B uses D, and C and D happen to have modules with the same name: so long as those modules don’t have any identically named public entities, it might work. But as soon as A and B have modules with the same name, there’s no way to disambiguate them for the current project.
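Even where a clash cannot be resolved, a build tool could at least detect it up front. A minimal sketch, assuming a hypothetical dependency graph and module lists that mirror the A/B/C/D example: walk everything reachable from the project and flag any module name provided by more than one package. It flags conservatively, including the indirect C/D case that might happen to work.

```python
# Hypothetical dependency graph mirroring the A/B/C/D example above.
DEPS = {"app": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": []}
MODULES = {"app": ["app_main"], "A": ["a_core"], "B": ["b_core"],
           "C": ["utils"], "D": ["Utils"]}


def reachable(root: str, deps: dict[str, list[str]]) -> set[str]:
    """All packages in root's build, found with an iterative depth-first search."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(deps[pkg])
    return seen


def module_clashes(root, deps, modules):
    """Module names (case-insensitive) provided by more than one package in the build."""
    providers: dict[str, list[str]] = {}
    for pkg in sorted(reachable(root, deps)):
        for mod in modules[pkg]:
            providers.setdefault(mod.lower(), []).append(pkg)
    return {mod: pkgs for mod, pkgs in providers.items() if len(pkgs) > 1}


print(module_clashes("app", DEPS, MODULES))  # {'utils': ['C', 'D']}
```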

We’re still open to discussion about this point, but it was something we initially considered (and briefly did) enforcing through fpm itself (i.e. it wouldn’t build the project if you didn’t follow this convention). We removed it because migrating existing projects to fpm would have been way too difficult.

We might be overly cautious here, but I think the chances of multiple packages ending up with a utils module are pretty high.

Requiring all modules to be prefixed with the package name effectively prevents module clashes, which would otherwise be quite hard to handle. Imagine a deep dependency adds a module called “types” and one of your other dependencies already has such a module. Then you can no longer use those two dependencies together, and there is no easy or quick fix. This requirement avoids that problem.
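The prefix rule itself is simple to check. A minimal sketch, assuming the package name is already a valid Fortran name and that a separating underscore is required; accepting a module named exactly like the package is also an assumption of this sketch, not a stated fpm rule.

```python
def unprefixed_modules(package: str, modules: list[str]) -> list[str]:
    """Modules not named "<package>" or "<package>_..." (case-insensitive).

    Assumes the package name is itself a valid Fortran name.
    """
    prefix = package.lower()
    return [m for m in modules
            if m.lower() != prefix and not m.lower().startswith(prefix + "_")]


print(unprefixed_modules("stdlib", ["stdlib_sorting", "STDLIB_KINDS", "types"]))
# ['types']
```

Requiring the underscore means a module such as stdlibx is rejected for package stdlib, so one package's prefix cannot silently swallow another name that merely starts with the same letters.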

(We even had a stricter requirement for the module name to have all the directory structure as prefixes, but others felt this was too strict.)

Oh I see, I didn’t think about higher-order situations like this. Damn. Surely what I sketched above would (could, possibly) work only if the people writing the toml for a given project are aware of which other modules would coexist. While this can (and should) be true for the project you are writing the toml for, you cannot know which other dependencies a project that depends on yours would add. Damn. I see, this is extremely delicate. Possibly doomed.

This I don’t know. But if you change the first requirement from “uses fpm” to “would want to use fpm in the future” (which I think is the actually relevant one, right? It’s not like everyone is already building with fpm and we just want to move those fpm projects into a registry; we want the registry to be another reason to move to fpm), then the list might be less small than you think.

I will give you an example shortly, but first I need a clarification: what is the true meaning of this (with respect to the requirement we are debating)?

I think in many cases you would prefer your users to consume just a “top level” module, but then, how do you do that in Fortran? You can document only that module and try to hide from end users the fact that the library has many other public modules, but to my understanding there is no real way to expose just one module, unless you structure the whole project as a single module. Submodules alone are not a good way to structure a huge library: having all the interfaces in one single module is nasty, a nightmare for something big. As said before, there is no notion of nested modules, so the separate modules you use for development and for organizing the library cannot be made mere “children” of a single one.

What would the registry actually enforce? Even if I intend only “lib_name_m” to be used, a huge library would still have half a dozen modules (at the very least) in its source tree, and they would not be named according to the convention.

Oh, maybe now I get it. Your claim is: if I do not intend those modules to be part of the API, then I can rename them. But they are de facto public in the built static library (and usually pretty discoverable to anyone working with Fortran). They all show up in the FORD output, for example.

Can I really assume the users would not be “using” them?

Anyway, the example: SciFortran is quite big; according to FORD it has 35 modules. The git repository has been around for 10 years now (I suspect the code predates it, but I’m not sure), and for all that time it has served as the base library for various many-body physics codes in my research group. Basically every code in that GitHub organization (and some others in personal accounts) has SciFortran as a dependency.

There is indeed a top-level module named SciFor (unfortunate, I know; had it been named SciFortran it would already be compliant…) and that is what is normally used in end-user code (model-specific driver programs). But in the number-crunching libraries we almost never import SciFor as a whole. We always use SF_LINALG, only: ... or SF_OPTIMIZE, only: ..., etc. As you can see, we are already prefixing (almost) every module, since it obviously makes sense given that Fortran has no namespaces. But we use the shorter SF, which is much more affordable verbosity-wise.

Similarly the library taking care of exact diagonalization has modules prefixed by ED, etc.

I do intend to slowly move all of our codebase to fpm for many reasons, the most important one (paramount, in my mind) being to enable true version-safe dependency management. We chain a lot of libraries, and I want each driver to have a toml file recording the exact versions of everything used to produce the data that ended up in the associated paper(s). This is crucial for the reproducibility of scientific results.

I (we, the group) might not want or need to register everything as an official fpm package, of course, but SciFortran is surely a candidate for that, for its scope (much broader than many body physics on lattice models) and probably also looking at github stars (as a measure of community interest).

Moving all the SF_* modules to SciFortran_* would imply huge, heavy and tedious work across all our codebases, so no, I don’t see that happening. I would surely not do it alone, and I have zero hope of getting the whole team to focus on it anywhere in the foreseeable future. No matter how much I praised the wonders of fpm, once I mentioned that, my PI would probably veto the whole idea.

We’d probably still use fpm as a tool for ourselves, but without really participating in the broader ecosystem, I guess.


How does Rust (or Cargo) handle this issue? Is it already ruled out at the language level? (That is what I understand to be the case for pip, given that every Python library lives in a unique namespace, with nested namespaces of course, if it is big enough to require a modular design.)

Mmm, let me iterate on my proposal.

What about allowing library authors to provide fpm with a (single) alias for the library name? To pass the check, every module would then need to be prefixed by either the full library name or the (shorter) alias. That way SciFor (I guess we would need to drop the “tran”) and SF_* would both be fine; the other modules we have that do not respect this weaker requirement (de facto those we never include anywhere outside SciFortran, the ones we treat as children) we would need to rename. It would still technically break the API, but it would be much less painful, I can assure you. I would settle for this.
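A minimal sketch of the weaker check proposed above, assuming each package may register one hypothetical alias and that, as in the post, the package is renamed to SciFor with the alias SF. The module linalg_helpers stands in for a hypothetical "child" module that would need renaming.

```python
from typing import Optional


def passes_alias_rule(module: str, package: str, alias: Optional[str] = None) -> bool:
    """Hypothetical weaker rule: prefix = full package name OR one registered alias."""
    name = module.lower()
    for prefix in (package, alias):
        if prefix is None:
            continue
        p = prefix.lower()
        if name == p or name.startswith(p + "_"):
            return True
    return False


# Assumed setup: package renamed to "scifor", with the registered alias "sf".
for mod in ["SciFor", "SF_LINALG", "SF_OPTIMIZE", "linalg_helpers"]:
    print(mod, passes_alias_rule(mod, "scifor", "sf"))
# SciFor True
# SF_LINALG True
# SF_OPTIMIZE True
# linalg_helpers False
```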

Of course this could very well be overfitted to my personal concerns and needs, and being vastly insufficient or even annoying for others. Let’s discuss it.

One possibly important drawback I can already see is that every package added to the registry would reserve not just one name but two: the full name and the short alias. And the options for the latter are of course much more limited. SF would be appealing for a “Structured Finance” library, or who knows what else. Even worse, ED would be the preferred alias for any exact diagonalization library in the world (we already have more than one). This is the core of the issue with tightly binding module names to official fpm project names: you want the former to be reasonably short in your code, but you need the latter to be original, distinctive and inevitably longer, to avoid “overbooking” in the registry. I don’t know how to solve this.

EDIT: let me add that this is brilliantly solved in Python land with the “import as” semantics: numpy as np, pandas as pd, matplotlib.pyplot as plt, etc. So there must be something to learn from that, right?

You have brought up many salient points. And building a language ecosystem is as much a social challenge as it is a technical challenge. We clearly need to have more discussions on this topic and think harder about it.

Just a thought: could fpm allow a package to designate public vs. private modules? Then only public modules would need to be prefixed with the project name, and a project could only use modules that are public in its dependencies. I think there are still consequences for name conflicts in certain cases, but it might be a viable compromise.
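A minimal sketch of the public/private idea, assuming a hypothetical manifest that lists each module as public or private (fpm has no such setting). Only public modules are prefix-checked, and a consumer is flagged if it uses a dependency's private module. This restricts what may be used; by itself it does not stop private modules from clashing in linker symbols or .mod files.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical manifest view where each module is declared public or private."""
    name: str
    public: set[str] = field(default_factory=set)
    private: set[str] = field(default_factory=set)

    def misnamed_public(self) -> set[str]:
        """Public modules lacking the "<name>_" prefix; private modules are not checked."""
        prefix = self.name.lower() + "_"
        return {m for m in self.public if not m.lower().startswith(prefix)}


def forbidden_uses(used: dict[str, set[str]],
                   deps: dict[str, Package]) -> set[tuple[str, str]]:
    """(dependency, module) pairs a consumer uses although they are not public."""
    public = {dep: {m.lower() for m in pkg.public} for dep, pkg in deps.items()}
    return {(dep, mod)
            for dep, mods in used.items()
            for mod in mods
            if mod.lower() not in public[dep]}


mylib = Package("mylib", public={"mylib_linalg", "mylib_optimize"},
                private={"arpack_wrappers"})
print(mylib.misnamed_public())  # set()
print(forbidden_uses({"mylib": {"mylib_linalg", "arpack_wrappers"}}, {"mylib": mylib}))
# {('mylib', 'arpack_wrappers')}
```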


I also thought about this possibility while writing my replies. I ended up deleting the paragraph because I felt I needed to think about it in more depth. It may be a brilliant solution (and somehow fill a gap in the language itself regarding encapsulation in module-style programming). But going beyond the language semantics may hide problems too, like confusing people who look at the code (or the FORD output, or similar material) and then see different things happening in their own code, which depends on the package (let’s assume they know Fortran and its rules very well, but are much less versed in the fpm manifest and conventions). Especially at this early stage, when fpm is a new tool, unknown to many, we should allow users to have less than a solid understanding of how it works: confusing them could harm the perception of fpm.

Let me add a much-needed clarification: of course a library developer registering a package is totally expected to know the associated build system well. Know your tools. Rather, I’m a bit concerned about users who add dependencies to a toml just for a quick project, not to be published as a package, library or anything else, i.e. exactly the kind of exploratory work that fpm aims to enable in Fortran, to “predate” on the R/Matlab/Python/Julia market share. It’s not the only goal (again, I see a lot of value in version management for library and ecosystem developers), but it’s indeed an important one.

I think all modules in Fortran are global, so they can’t share a name, even if they otherwise would not collide.

Choosing a prefix could possibly work too, but it is more fragile. I would recommend sticking to the package name, which fpm ensures is unique.

We are literally implementing nested modules right now in LFortran, as we need them for LPython, so the internals will have full support for them. We can iterate on a nice surface-level syntax for Fortran. So for larger codes that you do not want to adapt, one approach is to do the renaming in a compiler such as LFortran; alternatively, you can use LFortran as a “preprocessor” that automatically renames modules and generates new temporary files, and then use other Fortran compilers, via fpm, for the actual compilation. So there are possibilities we can pursue later.

For now I recommend sticking to just the name prefix; packages that cannot be updated could perhaps be supported with some extra compiler options saying you are on your own with regard to module names. But I don’t think it’s a good idea to put them into a central registry, as I can’t see any path by which this remains a manageable problem once we have thousands of packages.

What would happen to packages already registered (though the registry is not centralized yet, at least not in the storage sense) once enforcement is activated?

Examples:

  • neural-fortran is registered exactly under this name (it is the name appearing in its toml, and the name you need to put in your toml to get it as a dependency), but all of its modules are prefixed (unsurprisingly) with nf. Is it even legal to have dashes in module names? (Sorry, I’m used to Python and Matlab, where it is not, so I never even tried it in Fortran, and Google is not helping right now.)

  • mctc-gcp and mctc-lib are registered with exactly these names as far as the toml goes, but their modules are prefixed with gcp and mctc, respectively.
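On the dash question: Fortran names must start with a letter and may contain only letters, digits and underscores (at most 63 characters since Fortran 2003), so a name like neural-fortran cannot be used verbatim as a module prefix. A minimal sketch, assuming a hypothetical rule that derives the prefix by turning dashes into underscores:

```python
import re

# Fortran 2003 and later: a name starts with a letter, followed by letters,
# digits or underscores, at most 63 characters in total.
FORTRAN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,62}")


def is_fortran_name(name: str) -> bool:
    return FORTRAN_NAME.fullmatch(name) is not None


def prefix_from_package(package: str) -> str:
    """Hypothetical rule: derive a module prefix by turning dashes into underscores."""
    prefix = package.replace("-", "_")
    if not is_fortran_name(prefix):
        raise ValueError(f"cannot derive a Fortran prefix from {package!r}")
    return prefix


print(is_fortran_name("neural-fortran"))      # False
print(prefix_from_package("neural-fortran"))  # neural_fortran
print(prefix_from_package("mctc-lib"))        # mctc_lib
```

A long derived prefix also eats into the 63-character budget for the full module name, which is one more argument for letting packages declare a shorter namespace.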

While I believe it would be fair to rename the toml entries for these two, I don’t think @milancurcic would want to rename the neural-fortran package to nf. Perhaps to neural_fortran, with the modules prefixed accordingly; I don’t know, of course.

Would they (and any other instances that may exist; I didn’t go through the whole registry. EDIT: let me mention at least toml-f and test-drive, dependencies of fpm itself and of stdlib respectively, whose prefixes are tomlf and testdrive) be forced to comply after the update? I imagine so, since the check would happen when uploading a new version…

If not, we probably just need to hurry up, and that’s it (joking, more or less).

@everythingfunctional is fpm not enforcing the convention anymore? Indeed, it is not being enforced here: neural-fortran/fpm.toml at 9bbd70f27b5bbc5b3122cc88a66a726bd0e32339 · modern-fortran/neural-fortran · GitHub and here: neural-fortran/nf_base_layer.f90 at 9bbd70f27b5bbc5b3122cc88a66a726bd0e32339 · modern-fortran/neural-fortran · GitHub.

@rgba indeed what you explained is precisely why one can only loosen restrictions, but not impose them, because it breaks projects. Consequently, one must start from strict restrictions and then relax them over time as one sees fit.

I do not think it is too late, but if fpm is indeed not enforcing this, that’s a major mistake that we should discuss and fix, one way or the other, very soon. If we now allow any prefix (?) or no prefix at all, we are in big trouble. Sooner or later there will be a package numpy-fortran that will also choose the nf prefix (why not?) and will have a module called nf_layer, for the numpy-fortran layer. Then I write an application that does some machine learning, depending on neural-fortran as well as numpy-fortran (for Python interoperability), and everything breaks, because neural-fortran also has an nf_layer module. I do not think this is far-fetched; it becomes almost a certainty if we are successful (and that is the goal!) and have thousands of packages. This can be prevented by a simple convention that fpm enforces.

Simply because neural-fortran was registered first; at this point I think you should really reserve the prefix too. I find your example very realistic indeed.
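A minimal sketch of the "reserve the prefix too" idea as a first-come-first-served table, using the neural-fortran/numpy-fortran example from above. The NamespaceRegistry class is hypothetical; no such registry API exists.

```python
class NamespaceRegistry:
    """Toy first-come-first-served namespace reservation (hypothetical registry)."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # lower-cased namespace -> package

    def register(self, package: str, namespace: str) -> None:
        key = namespace.lower()  # Fortran names are case-insensitive
        holder = self._owner.get(key)
        if holder is not None and holder != package:
            raise ValueError(f"namespace {namespace!r} already reserved by {holder!r}")
        self._owner[key] = package

    def owner(self, namespace: str):
        return self._owner.get(namespace.lower())


registry = NamespaceRegistry()
registry.register("neural-fortran", "nf")
try:
    registry.register("numpy-fortran", "nf")
except ValueError as err:
    print(err)  # namespace 'nf' already reserved by 'neural-fortran'
```

One limitation of this sketch: it does not reject overlapping namespaces such as nf and nf_extra, where a module named nf_extra_io would satisfy both prefixes, so a real registry would probably need to check for that as well.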

Here I’m a little perplexed: I thought this was meant to be enforced for registered packages only, not by the build system itself. I thought the implication was: “with registered packages you know you are safe; if you put a custom URL for a git repository instead, you had better know very well what you are doing and what that dependency is doing too”. Basically you’d do that only if you had full control over the whole unregistered stack you are putting in your dependencies.

If, as it now sounds, you are asking whether fpm refuses to build packages that do not conform to the prefix convention, then I think I can already answer: no, it doesn’t.

Look here: QcmPlab/HoneyTools on GitHub (“Honeycomb flakes, sheets, ribbons, whatever. Made easy.”). In the toml the name is HoneyTools, while the modules are named with total freedom (and not because I’m a rebel: I never found any mention of this rule in the docs, and indeed it works, which is why I’m suddenly alarmed here). All the planning I had carefully done for SciFortran and downstream projects was based on the understanding that module names are unconstrained.

I see. I personally think it is a mistake.

I really thought it was enforced.

Well, we can always enforce it just for the registry, but often you want to depend on some repository directly and know that there are no collisions. We can also make fpm check for collisions, and I do think they will hopefully be relatively rare; successful packages will be prefixed, so if, say, neural-fortran becomes famous with the nf prefix, everybody else will be “forced” to use a different one to avoid collisions.

I can also imagine use cases with different package names but the same prefix and similar source code, perhaps a fork or alternative implementations, where you should use one or the other.

Note that other communities decided not to have a central registry and there are indeed many issues with a central registry. See e.g.: Zig Is Self-Hosted Now, What's Next? | Loris Cro's Blog and “The package manager will not assume the presence of a central package index. We don’t plan to create an official package index.”

There are many advantages to enforcing these conventions at the tooling level, starting very strict and then slowly relaxing, rather than at the registry level.

Regarding separating the prefix and the package name, I guess that can work too. One can treat the nf prefix in neural-fortran as the de-facto name of the package, which has to be reserved.

Thanks for starting the discussion on this, I thought we had it fixed, but I can now see this is not fixed at all. We need to fix this.

One option would be to require the namespacing in fpm, but allow packages to opt-out:

[fpm]
module-namespace = false

or set their custom namespace

[fpm]
module-namespace = "nf"

When publishing to the fpm registry we can set requirements on the namespace, e.g. having a namespace or requiring a unique namespace. To avoid surprises when publishing to the registry we can provide a command like fpm lint to point out potential blockers (like twine check for PyPI but actually checking all requirements not just a subset).


Note that a namespace system in fpm does not protect us from name collisions between packages: imagine a package stdlib providing a module stdlib_sorting, and another package called stdlib_sorting. To detect this, the registry would have to track all module names provided by each package version.
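Both packages in this example pass a plain prefix check, since each module starts with its own package name, so only an index of module names across all published versions catches the clash. A minimal sketch, assuming a hypothetical index keyed by (package, version) with made-up version numbers:

```python
def registry_module_clashes(index: dict[tuple[str, str], list[str]]) -> dict[str, list[str]]:
    """Module names provided by more than one package anywhere in the index.

    Different versions of the same package do not count as a clash.
    """
    providers: dict[str, set[str]] = {}
    for (package, _version), modules in index.items():
        for mod in modules:
            providers.setdefault(mod.lower(), set()).add(package)
    return {mod: sorted(pkgs) for mod, pkgs in providers.items() if len(pkgs) > 1}


# Hypothetical index entries; the version numbers are made up.
index = {
    ("stdlib", "0.1.0"): ["stdlib_kinds", "stdlib_sorting"],
    ("stdlib", "0.2.0"): ["stdlib_kinds", "stdlib_sorting"],
    ("stdlib_sorting", "1.0.0"): ["stdlib_sorting"],
}
print(registry_module_clashes(index))
# {'stdlib_sorting': ['stdlib', 'stdlib_sorting']}
```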


How about this:

  • By default, module-namespace (if you don’t specify it explicitly) is the name of the package
  • You can override it with module-namespace = "nf".
  • fpm enforces this by default. Unless you do module-namespace = false.
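The three rules above can be sketched as a small resolver plus a check. The module-namespace key is only proposed in this thread; treating dashes in the default as underscores is an extra assumption of this sketch, and the module lists are illustrative.

```python
from typing import Optional, Union


def effective_namespace(package: str,
                        setting: Union[str, bool, None] = None) -> Optional[str]:
    """Resolve a hypothetical module-namespace value, following the rules above.

    Key absent (None) -> the package name; a string -> that string;
    False -> None, i.e. no namespace and nothing enforced.
    """
    if setting is False:
        return None
    if setting is None:
        # Assumption: dashes are not valid in Fortran names, so map them to "_".
        return package.replace("-", "_")
    if isinstance(setting, str) and setting:
        return setting
    raise ValueError(f"invalid module-namespace value: {setting!r}")


def namespace_violations(package: str, modules: list[str],
                         setting: Union[str, bool, None] = None) -> list[str]:
    """Modules that do not carry the effective namespace as a prefix."""
    ns = effective_namespace(package, setting)
    if ns is None:
        return []
    p = ns.lower()
    return [m for m in modules
            if m.lower() != p and not m.lower().startswith(p + "_")]


nf_modules = ["nf_layer", "nf_network"]
print(namespace_violations("neural-fortran", nf_modules))        # ['nf_layer', 'nf_network']
print(namespace_violations("neural-fortran", nf_modules, "nf"))  # []
print(namespace_violations("legacy-lib", ["types"], False))       # []
```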

Unfortunately, for name clashing it does not really matter whether consumers import a given module or not. What matters is which names the compiler’s name-mangling algorithm generates when building a given library with all its routines, public or non-public. If two libraries have colliding module + procedure names, they will conflict at link time, whether the collision occurs in a public or non-public part of the libraries.
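To make the link-time side concrete, here is a minimal sketch assuming gfortran’s convention of naming a module procedure’s symbol __<module>_MOD_<procedure> in lower case (other compilers use different schemes). It predicts duplicate symbols across two hypothetical libraries.

```python
def gfortran_symbol(module: str, procedure: str) -> str:
    """Linker symbol gfortran uses for a module procedure (names lower-cased)."""
    return f"__{module.lower()}_MOD_{procedure.lower()}"


def duplicate_symbols(libraries: dict[str, list[tuple[str, str]]]) -> dict[str, list[str]]:
    """libraries: {lib: [(module, procedure), ...]} -> {symbol: [libs defining it]}."""
    owners: dict[str, list[str]] = {}
    for lib, procs in libraries.items():
        for module, proc in procs:
            owners.setdefault(gfortran_symbol(module, proc), []).append(lib)
    return {sym: libs for sym, libs in owners.items() if len(libs) > 1}


# Hypothetical libraries: both contain a module "types" with a procedure "init".
libs = {
    "libA": [("types", "init"), ("a_core", "run")],
    "libB": [("types", "init"), ("b_core", "run")],
}
print(duplicate_symbols(libs))  # {'__types_MOD_init': ['libA', 'libB']}
```

The two run procedures do not collide because their module names differ, which is why prefixing module names is enough to keep these symbols apart.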

An additional complication is posed by the mod files, as some compilers produce mod files that are not self-contained, meaning that you need the mod files for all modules of a library around, even if you only import its outermost public module. And since mod files are usually named after the module they represent, you may also end up with conflicting mod file names.

As for the transformation: Our DFTB+ project has currently 265 modules, and a student without much programming experience was enough to regularize their names.

A very important point you raise is the naming prefix. I think, it would be probably useful to have the possibility to register/allow prefixes in fpm, which are abbreviations of the library names. We also use an abbreviated prefixes for our module names (dftbp instead of dftbplus).


I see; I hoped the tree-shaking feature in fpm would help. But the truth is that I have no idea how it works and what it can achieve.

Indeed changing module names inside SciFor and making it build is not a problem. The problem is that with SciFor that’s not how it ends. It is a dependency for a lot of other repositories, so changing its API would trigger a huge “refactoring cascade”.

Does DFTB+ enter as a build dependency in other projects? Or is it self-contained, so that you just install it and then run a (sub)set of the produced executables?
I imagine the latter, assuming it’s similar to QuantumESPRESSO (which is also developed at my department, or at least this is where it all started). If QE ever transitioned to fpm it would not actually need to enter the registry, since very few people (if any) need it as a dependency for building their own stuff. Regular users would just continue to download the pre-built binaries, and people interested in contributing would still clone the repository, but would then use a nice fpm workflow instead of the current CMake one when rebuilding and testing what they write.

I imagine that QE would either use qe as a prefix for all its modules, or (more probably imho) just go for

and maintain their very complex tree as it is.

A completely different story is how QE would embed its external components after this hypothetical fpm transition. Currently they use git submodules (which CMake manages for you, if I recall my chat with one of the maintainers correctly). It’s not so bad in the end: they still get commit-resolved version management, even if in a more complex way than what fpm would provide. True, an end user cannot define their own preferred combination of versions (as I would like to do with the model drivers in my research group) but has to conform to what the QE maintainers define; I think that’s fine for their target audience. One big incentive for QE to move to fpm would be a simplification of this submodule management (e.g. Wannier90 would stop being a git submodule and would instead enter as a dependency in the toml file). But this depends on all of those (independently developed) components moving to fpm and being happy with the enforced refactorings. If that turns out to be (too much of) a hassle for all of them, I see the likelihood of QE transitioning going lower and lower (and it’s already low: they resisted even the Makefile to CMake transition for a long, long time, despite the clear payoff in reduced hassle).

Surely, one nice aspect of a collective transition in the quantum chemistry / material science community, would be that every DFT (or lattice model!) package that wants to integrate with W90 at the build level would not need to craft (and maintain!) its own strategy, but just rely on a unique solution, as provided by fpm (which the W90 team would have to setup and maintain, sure, but at least is just them, not everyone using it).

I’m happy to hear that. If (as it appears) there are a lot of packages already using shortened prefixes, it would be easier to get that possibility supported in fpm.

Here are a few prior discussions on this topic from GitHub that I could find:

I’m sure I missed some because #130 is not the first time this design was discussed. Actually it was discussed very early on. Please add if you find any others.

A few comments:

  • I agree with @certik and others on the need to enforce the convention to prevent module name clashes (nothing new here).
  • It is known that this is not implemented in fpm, but it’s a design choice that overall we all want; we may just not all agree on all details on how exactly to do it. But since it’s a big design decision, it needs time to be discussed at length.
  • Package name, as specified in the manifest, should be decoupled from the repository name. This is because branding (repository name) and user API (package/module name) designs have different incentives. You want your library to be noticed and recognized from the repo name, but you don’t want to force that possibly long name on the user in the API. For example, I want my repository name to be “datetime-fortran” for branding reasons, but I want the fpm package name to be just “datetime”.
  • I like the idea of a namespace entry in the manifest, to further decouple the package name from the top-level module name. For example, the datetime-fortran package provides the datetime type among a few others. I can’t name my module datetime and the source file datetime.f90, so I named it datetime_module in datetime_module.f90. Without the namespace in the manifest feature, datetime-fortran can’t be compliant with the fpm registry in the way we currently want it.

In summary, my recommendation is to only enforce a convention for module naming at the level where it’s needed to resolve conflicts related to the limitations of the Fortran language. That is, enforce that a module name is prefixed with the package/namespace name. This should be enforced within every fpm package, and not just those in the fpm registry, because you want to be able to use unregistered fpm packages without worry of a module name clash.


I like this solution, with the constraint that if false is specified, fpm should refuse to combine the project with any other project. So what do you think about the following:

  • If nothing is specified, fpm assumes that the module-namespace is the project name and enforces that each module name is prefixed by module-namespace string.

  • If module-namespace is a string, fpm uses that as module namespace string and enforces that each module name is prefixed by that string.

  • If module-namespace is false, fpm does not check anything.

When combining projects, fpm should check for collisions and stop if two projects have the same module namespace (declared explicitly or derived implicitly from the project name). fpm should refuse to combine a project without a module namespace string (i.e. one that explicitly set module-namespace to false) with anything.

This way, you could still build a non-conforming project with fpm as standalone, but you would not be able to combine it with other projects.
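A minimal sketch of the combination rule just described, assuming each project’s effective namespace has already been resolved (None meaning module-namespace = false). The function is hypothetical and not existing fpm behaviour.

```python
from typing import Optional


def check_combination(namespaces: dict[str, Optional[str]]) -> None:
    """Apply the proposed combination rules; raise ValueError on a violation.

    namespaces maps each project in the build to its effective module namespace,
    or to None when the project set module-namespace = false.
    """
    opted_out = sorted(p for p, ns in namespaces.items() if ns is None)
    if opted_out and len(namespaces) > 1:
        raise ValueError(f"opted-out project(s) {opted_out} cannot be combined with others")
    seen: dict[str, str] = {}
    for project, ns in namespaces.items():
        if ns is None:
            continue  # a lone opted-out project still builds on its own
        key = ns.lower()  # Fortran names are case-insensitive
        if key in seen:
            raise ValueError(f"namespace {ns!r} used by both {seen[key]!r} and {project!r}")
        seen[key] = project


check_combination({"my_app": "my_app", "neural-fortran": "nf"})  # OK: no error
check_combination({"legacy": None})                                # OK: standalone build
for bad in ({"neural-fortran": "nf", "numpy-fortran": "NF"},
            {"my_app": "my_app", "legacy": None}):
    try:
        check_combination(bad)
    except ValueError as err:
        print(err)
```

Deriving the namespace for each project is a separate step; this check only compares the resolved values.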
