Making Fortran projects easier to discover

Having modern Fortran code available and easily discoverable for common tasks and problem domains is a perennial goal. An underused feature of GitHub is the topic list in the About section. Many repos have no topics listed. It should be possible to search

language:fortran topic:numerical-integration

and find relevant repos. GitHub infers the programming language from the source file suffixes in a repo, but the topic must be supplied by the repo author. There could be a community effort to define standard topics, although topics that already have many entries should be preferred. Topics are lower case and have hyphens instead of spaces. If you have a repo, please add relevant topics, and if you use a repo, consider nudging the author with an issue if no topics are listed.
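To make the naming convention concrete, here is a minimal Python sketch that turns a free-form label into a lower-case, hyphenated topic and builds the search string shown above. The helper names `to_topic`, `search_query`, and `search_url` are made up for illustration, and the URL layout (`https://github.com/search?q=...&type=repositories`) is an assumption based on what the GitHub web search currently shows in the address bar.

```python
import re
from urllib.parse import quote_plus


def to_topic(label: str) -> str:
    """Lower-case a label and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", label.lower())
    return "-".join(words)


def search_query(language: str, topic: str) -> str:
    """Build a search string such as 'language:fortran topic:numerical-integration'."""
    return f"language:{language} topic:{topic}"


def search_url(language: str, topic: str) -> str:
    """Hypothetical helper: wrap the query in a GitHub repository-search URL (assumed layout)."""
    return "https://github.com/search?q=" + quote_plus(search_query(language, topic)) + "&type=repositories"


topic = to_topic("Numerical Integration")
print(search_query("fortran", topic))
print(search_url("fortran", topic))
```

This only illustrates the convention; the topics themselves still have to be added by each repo author in the About section.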

The Fortran Package Index has Featured Topics, but in many cases the GitHub repo does not list that topic. Looking at my categorized repo list, here are possible topics for some categories.

Astronomy and Astrophysics projects should have astronomy or astrophysics and perhaps a specialized topic.

Benchmarks: benchmark and benchmarks

Biology and Medicine: biology, medicine, and specialized topics

Climate and Weather: climate, weather, nwp, numerical-weather-prediction, meteorology, wind

Code Tools: automatic-differentiation, preprocessor

Computational Chemistry: chemistry, computational-chemistry

Computational Fluid Dynamics: cfd, computational-fluid-dynamics, fluid-dynamics, fluid-mechanics, turbulence, navier-stokes, particle-in-cell, pic, lattice-boltzmann, aerodynamics

Earth Science: earth-science, geospatial, geophysics, geoscience, ocean, atmosphere, erosion, ionosphere, earthquake, seismology, shallow-water-equations

Economics: economics, econometrics, finance, dsge

Fast Fourier Transform: fast-fourier-transform, fft

File I/O: hdf5, json, netcdf, toml, yaml

Finite Elements: fem, finite-elements

I’ll stop here.


That’s a great suggestion. Would it be possible to derive topics from the GAMS taxonomy?

I would even suggest adding this to the criteria on “how to get your package listed” in the Fortran-lang package index.

Maybe one of the moderators can break this into a new Discourse thread.

For mathematical domains such as linear algebra or optimization, yes, although the GAMS classifications need updating. For example the statistics category would now include machine learning topics such as the lasso.

For scientific domains such as earth science, classification is outside the scope of GAMS.

The main GAMS classification scheme:

Class Description
A Arithmetic, error analysis
B Number theory
C Elementary and special functions
D Linear Algebra
E Interpolation
F Solution of nonlinear equations
G Optimization
H Differentiation, integration
I Differential and integral equations
J Integral transforms
K Approximation
L Statistics, probability
M Simulation, stochastic modeling
N Data handling
O Symbolic computation
P Computational geometry
Q Graphics
R Service routines
S Software development tools
Z Other

Each class is associated with keywords and divided into subclasses, e.g. for Class F: Solution of nonlinear equations:

Keywords: Continuation, Dynamical systems, Fixed points, Homotopy, Nonlinear equations, Roots, Zeros

Subclasses:

Label Description
F1 Single equation
F2 System of equations
F3 Service routines (e.g., check user-supplied derivatives)

The classification tree goes two levels deeper, but for GitHub topics, I think already adding the first two levels would be an improvement.
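As a rough sketch of how the first two GAMS levels could be turned into candidate topics, the Python snippet below uses only the Class F entries quoted above. The `GAMS_F` dictionary layout and the function names are assumptions made for illustration, not part of GAMS or of any existing tool, and the parenthetical in the F3 description is left out for brevity.

```python
import re

# Class F data copied from the listing above; the dictionary layout is hypothetical.
GAMS_F = {
    "description": "Solution of nonlinear equations",
    "keywords": ["Continuation", "Dynamical systems", "Fixed points", "Homotopy",
                 "Nonlinear equations", "Roots", "Zeros"],
    "subclasses": {
        "F1": "Single equation",
        "F2": "System of equations",
        "F3": "Service routines",
    },
}


def to_topic(text: str) -> str:
    """Lower-case the text and join its alphanumeric words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_topics(gams_class: dict) -> dict:
    """Derive candidate topics from the class, its subclasses, and its keywords."""
    return {
        "class": to_topic(gams_class["description"]),
        "subclasses": [to_topic(d) for d in gams_class["subclasses"].values()],
        "keywords": [to_topic(k) for k in gams_class["keywords"]],
    }


for level, topics in candidate_topics(GAMS_F).items():
    print(level, topics)
```

The generated strings are only candidates. Some of them, such as the one derived from "Service routines", are probably too generic to be useful, so a human would still need to pick the ones worth standardizing.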


I am surprised how literal the matches are. I see no options for globbing or regular expressions, and searches for random, pseudorandom, and pseudo-random produce different results, apparently showing only exact matches (even pluralization appears to be significant). Unless I missed something, this is a bit disappointing. It does strengthen the argument for suggesting specific keywords and rules like “never|always use plurals”. I think it would be useful to have a suggested standard topic for indicating support of the Fortran fpm packaging utility.
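Until topics are standardized, one workaround for the exact matching is to run one search per spelling. A minimal Python sketch, assuming a hypothetical helper `topic_queries` and using the three variants mentioned above:

```python
def topic_queries(variants, language="fortran"):
    """Return one exact-match search string per topic spelling."""
    return [f"language:{language} topic:{v}" for v in variants]


# Each spelling has to be searched separately because only exact topic matches are shown.
queries = topic_queries(["random", "pseudorandom", "pseudo-random"])
for q in queries:
    print(q)
```

This is only a stopgap; agreeing on one spelling per topic would make the extra searches unnecessary.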

You have already started to tag the entries with a controlled vocabulary. A suggestion, then, would be to use the Library of Congress Classification down to the second level (vide infra) to complement this approach, as seen e.g. in the English list of EbookFoundation/free-science-books by Eric Hellman et al.

Second level here means going beyond the first character (e.g., Q for science) to QA (mathematics), QB (astronomy), QC (physics), QD (chemistry), QE (geology), as used in libraries. Publications (e.g., books) about the topic that a program or project addresses may then serve as a reference. One may argue that going further into the next levels of the system for a primary tag (e.g., the range QD146 to QD197 covers inorganic chemistry, QD241 to QD441 organic chemistry, etc.) demands too much curation effort; after all, in this example, the entries’ pedigree QD == chemistry remains visible.

Going further along «for each thing one place in the list», assigning a second or third keyword from this controlled set may be deemed suitable, as some publishers in chemistry do (for example Wiley-VCH). This may, however, take too much effort for too little in return, though such a catalogue-like approach is seen elsewhere (e.g., JOSS on GitHub, or CTAN’s topic cloud).


Some GitHub projects that use “our” fpm use the fpm topic, but that topic is also used for many unrelated projects. So I suggest that the topic fortran-package-manager also be used so that Fortran codes that use fpm can easily be found with a topic search.

One can also use this search to find projects that use fpm, as @awvwgk showed me, but someone browsing GitHub may not know this.
