Registry backend design

We are now working on the initial design for the registry backend and are looking for input in an open discussion round. In particular, we are looking for people with experience in backend development to discuss some of the more technical aspects.

Some aspects to consider in the backend (not complete, feel free to edit):

  • storage of packages and metadata
    • tarball storage
    • database (mongodb?, sqlite?)
  • upload of package
    • packages should be namespaced (user/package)
    • metadata extraction from manifest
    • authentication of user for namespace selection
  • query for packages
  • download of package
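
As an illustration of the namespacing item above, here is a minimal Python sketch of how a `user/package` identifier could be validated and mapped to a storage location for a tarball. The naming rule, the function names and the key layout are all assumptions made for this example; none of them has been decided in this thread.

```python
import re

# Hypothetical naming rule: lowercase letters, digits, '-' and '_',
# starting with a letter. The real registry may choose something else.
_NAME = re.compile(r"^[a-z][a-z0-9_-]*$")


def parse_package_id(package_id: str) -> tuple[str, str]:
    """Split 'namespace/package' into its two parts, rejecting malformed ids."""
    parts = package_id.split("/")
    if len(parts) != 2:
        raise ValueError(f"expected 'namespace/package', got {package_id!r}")
    namespace, name = parts
    for part in (namespace, name):
        if not _NAME.match(part):
            raise ValueError(f"invalid name component: {part!r}")
    return namespace, name


def tarball_key(package_id: str, version: str) -> str:
    """Build a storage key for one release tarball (the layout is an assumption)."""
    namespace, name = parse_package_id(package_id)
    return f"{namespace}/{name}/{version}/{name}-{version}.tar.gz"
```

For example, `tarball_key("alice/stdlib", "0.1.0")` gives `alice/stdlib/0.1.0/stdlib-0.1.0.tar.gz`, so two users can publish a package with the same name without colliding.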

There is also a new repository where we will discuss more concrete tasks in the issue tracker:


I would leave the decision on the exact date next week up to Minh @minhdao and Henil @henilp105.


Thanks for this. I’ll only comment on the first item for now.

Compressed tarballs sound good. I’d do exactly that. I assume that the package data will be stored in an S3-like object store. If so, this will allow easy replication/backup and CDN access on the provider end without any extra care needed on the code side.

For the DB choice, the question to ask is whether the data structure is more like a set of unstructured documents (document-based or NoSQL) or a set of structured and interrelated tables (relational or SQL). There are mature open-source solutions for either. While the data structure here does seem to me quite relational, either approach will work fine. IMO the most important factor then is the implementor’s preference and familiarity with the tool.
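
To make the relational option a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are hypothetical and are not the schema discussed in the registry repository; the point is only to show how namespaces, packages and releases relate to each other.

```python
import sqlite3

# Hypothetical schema: one namespace has many packages, one package has many releases.
SCHEMA = """
CREATE TABLE IF NOT EXISTS namespaces (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS packages (
    id           INTEGER PRIMARY KEY,
    namespace_id INTEGER NOT NULL REFERENCES namespaces(id),
    name         TEXT NOT NULL,
    UNIQUE (namespace_id, name)
);
CREATE TABLE IF NOT EXISTS releases (
    id         INTEGER PRIMARY KEY,
    package_id INTEGER NOT NULL REFERENCES packages(id),
    version    TEXT NOT NULL,
    tarball    TEXT NOT NULL,  -- object-store key of the release tarball
    UNIQUE (package_id, version)
);
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_release(conn, namespace, package, version, tarball):
    """Register one release, creating the namespace and package rows if needed."""
    conn.execute("INSERT OR IGNORE INTO namespaces (name) VALUES (?)", (namespace,))
    ns_id = conn.execute(
        "SELECT id FROM namespaces WHERE name = ?", (namespace,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO packages (namespace_id, name) VALUES (?, ?)",
        (ns_id, package),
    )
    pkg_id = conn.execute(
        "SELECT id FROM packages WHERE namespace_id = ? AND name = ?",
        (ns_id, package),
    ).fetchone()[0]
    # A duplicate (package, version) pair raises sqlite3.IntegrityError.
    conn.execute(
        "INSERT INTO releases (package_id, version, tarball) VALUES (?, ?, ?)",
        (pkg_id, version, tarball),
    )
    conn.commit()


def list_releases(conn, namespace, package):
    """Return the versions of a package in the order they were added."""
    rows = conn.execute(
        """SELECT r.version FROM releases r
           JOIN packages p ON p.id = r.package_id
           JOIN namespaces n ON n.id = p.namespace_id
           WHERE n.name = ? AND p.name = ?
           ORDER BY r.id""",
        (namespace, package),
    )
    return [row[0] for row in rows]
```

In a document store, the same data would more likely be a single package document with an embedded list of releases, which is the trade-off described above.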


I am curious about the choice of a tarball versus a zip file, since in the past tar(1) was not standard on MS Windows. Will tar(1) be required on the client side or just on the repository side?

That leads to the more general question of just what infrastructure will be required for a local repository. I am picturing a local repository as expanded packages in some directory called $SOMEWHERE/fpmrepo/ or something like that for simplicity, but perhaps not(?).

Somewhere in there should be a checksum and a UUID of some form or another. This is perhaps implied, but I think it is worth calling out, as there are several possibilities there.
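
As one of those possibilities, here is a small Python sketch that computes a SHA-256 checksum of a tarball and assigns a random UUID to the upload. The choice of SHA-256 and UUID version 4, and the function and field names, are assumptions for illustration only.

```python
import hashlib
import io
import uuid
from typing import BinaryIO


def sha256_of(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a file-like object in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def new_upload_record(stream: BinaryIO) -> dict:
    """Describe an uploaded tarball by a random UUID and its checksum."""
    return {"upload_id": str(uuid.uuid4()), "sha256": sha256_of(stream)}


def verify_download(stream: BinaryIO, expected_sha256: str) -> bool:
    """Check on the client side that the downloaded bytes match the recorded checksum."""
    return sha256_of(stream) == expected_sha256
```

The checksum identifies the content (the same bytes always give the same value), while the UUID identifies the upload event, so the two serve different purposes.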

I am wondering what the policy will be on submitting packages. Must they be under a particular license? Will a submitter be able/allowed to retract them? and so on … Any rules decided on yet?

Perhaps some of this is already noted on the site referenced above, so I guess I should look there before continuing a list here … just some first impressions on the list.

Along the lines of extracting the manifest data, it would be nice if the final product allowed me to query a package and have it show the manifest data and, optionally, recursively the data for any external dependencies. I was just playing around with some ideas related to that but did not get much response, so I am eager to see what is emerging here. Great project!

Just for info, I currently make a pseudo-local registry by creating a packages/ directory and pulling packages into it. Then, for a project where I want to use only local packages, I make a soft link to that directory and, using a kludge script, mv the fpm.toml file to fpm.toml.0 and make a new one that changes the dependencies to local ones. It works well enough for projects that are mostly stand-alone or one level deep, but obviously I would want some option to look for dependencies locally first without having to change the fpm.toml file. Also, with just the links the versions are ignored, because I do not think the current local pathname supports anything but a directory name. I keep meaning to see whether the other git options for using ssh and local git repositories work (I think they should), but that simpler method has worked for me so far, and I am assuming some of those issues will be solved by local repositories. I am curious whether a description of how that will work has been created anywhere.

@milancurcic sir, thanks for the AWS S3 objects idea. I have added a new field so that we can easily back up/cache the tarballs. I have updated the collections and also opened a new issue discussing the API routes and functionality.

I would like to point out an alternative approach for the registry: the way CHICKEN Scheme distributes eggs.

Initially, they had a centralised svn repository for eggs (they were keeping the source code centralised). They moved to distributed remote repositories.

Basically, the central registry only associates the egg name with a URL (which is under the control of the user). That URL points to a release-info file, which has a specific format. It contains:

  • the release list
  • a pattern for downloading a tarball of a specific release
  • an optional repository url

release-info could also be more complex.

release-info file example
(repo git "git://example.com/{egg-name}.git")   ; optional

(uri targz "http://example.com/{egg-name}/releases/{chicken-release}/{egg-name}-{egg-release}.tar.gz")
(release "0.1")
(release "0.2")
(release "1.0")

The release-info file contains information on how to download a specific release tarball.
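
To show how such a pattern turns into a concrete download URL, here is a small Python sketch that substitutes the `{...}` placeholders from the example above. It only illustrates the idea; it is not how the CHICKEN tools implement it, and the function name is made up.

```python
import re

# Placeholders look like {egg-name}; hyphens rule out plain str.format().
_PLACEHOLDER = re.compile(r"\{([a-z-]+)\}")


def expand_uri(template: str, values: dict[str, str]) -> str:
    """Replace every {placeholder} in the template with its value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {{{key}}}")
        return values[key]

    return _PLACEHOLDER.sub(substitute, template)


template = (
    "http://example.com/{egg-name}/releases/{chicken-release}/"
    "{egg-name}-{egg-release}.tar.gz"
)
url = expand_uri(
    template,
    {"egg-name": "hello", "chicken-release": "5", "egg-release": "1.0"},
)
# url is "http://example.com/hello/releases/5/hello-1.0.tar.gz"
```

With this scheme, publishing a new release only requires adding a `(release "...")` line, since the URL of the tarball follows from the pattern.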

This tarball (which is also under the control of the user) is expected to contain an egg description file with the information needed to build the extension (license information, author, dependency list, compilation options, etc.). This file has the same purpose as fpm.toml.

egg description file example
;; hello.egg
((author "Me")
 (synopsis "A cool hello-world library")
 (license "GPLv3")
 (components (extension hello)))

This means that the CHICKEN Scheme registry only tracks the locations of release-info files (see the egg-locations file: SVN repository, username anonymous, no password).

chicken scheme has infrastructure to fetch and check eggs, but it isn’t strictly necessary. It is used, for example, to populate the documentation server.

New eggs are added via the CHICKEN mailing list. A new release of an existing egg doesn’t require any interaction with the central registry (it is a new line in the release-info file, which is under user control).

Some documentation reference:

Could a registry be a git repo? That would have advantages:

  • fpm already supports some git
  • if well structured, using multiple registries would boil down to fetching more than one repo
  • git repos can be both local or online
  • no need to interface to databases (but fortran has good support for SQLite)

This is not a new idea: it would be copying what MSYS does, see e.g. here:


These are only the package recipes; no artifacts are contained. The MSYS packages are hosted on a separate server.

Conda-forge for a while had a backup storage of their packages in a GitHub repo where they uploaded the artifacts on release tags. The repo was taken down at ~100k tags after they broke some of the GitHub infrastructure.


For the registry backend design meeting, we have decided to open a Doodle poll to find the time that suits most people.

Please enter the times that suit you before the end of this week.

Thanks and Regards,
Henil Panchal
CC @awvwgk @minhdao @certik @milancurcic @everythingfunctional @gnikit @ivanpribec @zoziha


Our next backend registry design meeting will be on Thursday, December 8 at 18:30 UTC.

10:30 - 11:30 PST (California)
18:30 - 19:30 GMT (London)
19:30 - 20:30 CET (Amsterdam)

Don’t hesitate to propose your items below.


Will you post a meeting link?


Here is the meeting info:

Topic: fpm Registry Backend Design meeting
Time: Dec 9, 2022 12:00 AM India (Thursday, Dec 8, 18:30 UTC)

Join Zoom Meeting : Launch Meeting

Meeting ID: 845 0700 3742
Passcode: fortran

10:30 - 11:30 PST (California)
18:30 - 19:30 GMT (London)
19:30 - 20:30 CET (Amsterdam)


Is there a better way of receiving the agreed-upon time (e.g. via email or a Discourse notification)? The way it’s set up currently, we receive no notifications, so it would be trivially easy to miss the meeting.


@gnikit sure sir, I will send a mail to the entire fpm leadership team.

Thanks and Regards,
Henil


Thanks @henilp105 I appreciate it.


The current agenda is:

  1. Review of the APIs and their format (API Routes for website and local registry/fpm · Issue #5 · fortran-lang/registry · GitHub) and of the database schema (Database format for registry ( Mongodb ) · Issue #1 · fortran-lang/registry · GitHub).
  2. Review of the PR to initialise the registry and to add Docker containers for Flask, MongoDB and nginx (fpm registry initialise by henilp105 · Pull Request #3 · fortran-lang/registry · GitHub).
  3. Integration of the APIs with fpm and the frontend.
  4. Whether to keep MongoDB as a stateful container on AWS or to use MongoDB Atlas (the former is slightly costly and difficult to manage; the latter is cheap and easy but would add some latency).
  5. Resolving the latest tag of packages, and some doubts regarding namespaces and releases.

I have replaced the link in the above description. As the original link was provided by @awvwgk sir, I cannot start the meeting with it, so I have added a new link. @everythingfunctional @minhdao I have restarted the meeting, please rejoin.

Apologies, I am having network problems so I can’t rejoin. Could you guys, @henilp105 and @minhdao, share some of the design decisions you are struggling with, prioritised from most to least important, so that they can be discussed in this thread in preparation for the next catch-up meeting?

These are the ones I got from the meeting:

  1. Whether the backend should require uploaded packages to constrain the versions of their dependencies, so that new dependency releases do not break the package (e.g. requirements.txt in Python).
  2. Whether version resolution should be done on the backend.

  1. We are still wondering what releases really are and whether each release ends up being a different tarball. If so, we’d need to take that into consideration in our database schema. This was brought up by @milancurcic in Database format for registry ( Mongodb ) · Issue #1 · fortran-lang/registry · GitHub

  2. How and where to define a custom registry. We decided we’ll do it globally with a config.toml file in ~/.local/fpm.

  3. The fact that we will define user roles.


Yes, each release should have its own tarball.


I found a good resource on the version constraints that Rust has implemented; they manage dependencies very well. Please refer to: crates.io/ARCHITECTURE.md at master · rust-lang/crates.io · GitHub

I am currently referring to it. They have also considered dependency resolution on the backend side.
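
To make the version resolution question more tangible, here is a minimal Python sketch of what resolving a constraint against the published releases could look like, wherever it ends up running. It is a generic illustration, not the crates.io algorithm and not an fpm design; it assumes purely numeric dotted versions and a simple "at least X, below Y" constraint.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.10.2' into (0, 10, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def resolve(
    available: list[str], minimum: str, below: Optional[str] = None
) -> Optional[str]:
    """Pick the highest available version with minimum <= version < below."""
    low = parse_version(minimum)
    high = parse_version(below) if below is not None else None
    candidates = [
        version
        for version in available
        if parse_version(version) >= low
        and (high is None or parse_version(version) < high)
    ]
    return max(candidates, key=parse_version) if candidates else None


releases = ["0.1.0", "0.2.0", "0.10.0", "1.0.0"]
best = resolve(releases, minimum="0.2.0", below="1.0.0")
# best is "0.10.0": numerically higher than "0.2.0" and still below "1.0.0"
```

A real resolver also has to reconcile the constraints of all transitive dependencies, which is where the backend-versus-client question becomes relevant.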

Here is the meeting info:

Topic: fpm Registry Backend Design meeting
Time: Dec 16, 2022 12:00 AM India (Thursday, Dec 15, 18:30 UTC)

Join Zoom Meeting : Launch Meeting

Meeting ID: 845 0700 3742
Passcode: fortran

10:30 - 11:30 PST (California)
18:30 - 19:30 GMT (London)
19:30 - 20:30 CET (Amsterdam)

As there is a 40-minute time limit in Zoom, I will be restarting the meeting; please rejoin.