Poll: Naming a function in stdlib

Consider that you’re importing a probability density function (pdf) for a uniform (or normal, exponential, beta, gamma etc.) distribution. This function could be named pdf_uniform (kind of function first, kind of distribution second) or uniform_pdf (kind of distribution first, kind of function second). So:

use stdlib_stats_distribution_uniform, only: pdf_uniform

or

use stdlib_stats_distribution_uniform, only: uniform_pdf

It may seem like an unimportant detail, but I think details like this contribute to the overall user experience. So, which one do you like better?

  • pdf_uniform
  • uniform_pdf
  • I don’t care / Either is fine
  • I don’t like either, here’s a better name (write in comments)

Context: https://github.com/fortran-lang/stdlib/pull/272#discussion_r711776082

I mildly prefer using the kind of function as the prefix.

In R, the prefixes [d, p, q, r] correspond to [density, distribution function, quantile function, random variate] for a distribution. So for the normal distribution there are functions [dnorm, pnorm, qnorm, rnorm]. Stdlib could just copy the names in R, although they are cryptic. Otherwise, the prefixes could be [density, dist, quantile, random]. Shorter prefixes could be [pdf, cdf, quant, ran]. SciPy uses [pdf, cdf, ppf, rvs] as methods of the rv_continuous class.

No preference personally aside from noting that I’d simply prefer it be consistent with how other functions in the stdlib are named. That said, I’m not familiar enough with stdlib to know which one is more consistent.

I will note that pdf has a second, ubiquitous meaning in software: portable document format. I don’t know that you will see any instances where it will be ambiguous, but it’s worth thinking about.

I guess using uniform_pdf to be consistent with SciPy’s uniform.pdf:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html

I don’t know. I would have to see all the functions. Generally I prefer if the first part is what there is less of. So if there are a lot of options for “uniform”, but just a few for “pdf”, then perhaps pdf_uniform is better. If it is the other way round, then uniform_pdf would be better?

Update: Based on @Beliavsky’s comment (Poll: Naming a function in stdlib - #6 by Beliavsky), I changed my vote to pdf_uniform.

There are more plausible distributions than attributes (such as pdf or cdf), so this suggests pdf_uniform. A list of distributions in the R stats package is here.

With a function subprogram, an approach I suggest is to preface with a verb, abbreviated as needed e.g., Calc, Get, etc. - this helps distinguish the functions from multidimensional arrays in Fortran. I also suggest doing away with the underscore separator.

Thus my recommendation for the function name here will be CalcUniformPdf.

I think subroutine names should be verbs describing what they do and that function names should be nouns or adjectives describing what they return. So I would write call sort(x) but x = sorted(x).
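A minimal sketch of that convention. The names sort and sorted come from the post; the module name naming_demo and the insertion sort are assumptions made for illustration only:

```fortran
module naming_demo
   implicit none
   private
   public :: sort, sorted
contains

   ! Subroutine named with a verb: sorts x in place (insertion sort).
   pure subroutine sort(x)
      real, intent(inout) :: x(:)
      real :: key
      integer :: i, j
      do i = 2, size(x)
         key = x(i)
         j = i - 1
         do while (j >= 1)
            if (x(j) <= key) exit
            x(j + 1) = x(j)
            j = j - 1
         end do
         x(j + 1) = key
      end do
   end subroutine sort

   ! Function named with an adjective: returns a sorted copy, x is untouched.
   pure function sorted(x) result(y)
      real, intent(in) :: x(:)
      real :: y(size(x))
      y = x
      call sort(y)
   end function sorted

end module naming_demo

program demo
   use naming_demo, only: sort, sorted
   implicit none
   real :: a(4) = [3.0, 1.0, 4.0, 2.0]
   print *, sorted(a)   ! sorted copy; a itself is unchanged
   call sort(a)         ! a is now sorted in place
   print *, a
end program demo
```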

A problem with camel case names in a case-insensitive language like Fortran is that some people will write y = calcuniformpdf(x), which is less readable than y = calc_uniform_pdf(x).

I would add that verbs are more common for methods in OO interfaces.

In my opinion, these kinds of prefixes are superfluous. They make your names longer without adding any clarity.

Omit needless words.

  • “The Elements of Style” by Strunk and White

Hardly. The world is full of varied and diverse views, and one can easily draw up a lot of practical and valuable exceptions to “rules” that are more aspects of style!

What I wrote above was based on actual feedback from many engineers and scientists and the resultant practice in the teams I had worked with, albeit during the 1990s when a considerable amount of new code using Fortran 90/95 was still being developed (alas so many codebases have since migrated away, a different story).

The rationale documented with the feedback was that the verbs helped both the consumers and coders:

  • For the consumers, it helped distinguish the functions from multidimensional arrays in Fortran,
  • For the authors, it made the use of RESULT clause clearer in code e.g.,
   function CalcFoo( .. ) result( Foo )
       ..

What hasn’t been mentioned yet is the impact of Fortran’s facility for renaming at USE. That suggests that longer names and shorter names can coexist, each in a different module.
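A small sketch of renaming at USE. The module demo_uniform and the local name updf are hypothetical and are not the stdlib API; only the idea of `local_name => module_name` in the use statement is the point:

```fortran
module demo_uniform
   implicit none
   private
   public :: uniform_pdf
contains

   ! Density of the uniform distribution on [a, b]; assumes a < b.
   elemental function uniform_pdf(x, a, b) result(p)
      real, intent(in) :: x, a, b
      real :: p
      if (x >= a .and. x <= b) then
         p = 1.0 / (b - a)
      else
         p = 0.0
      end if
   end function uniform_pdf

end module demo_uniform

program demo
   ! Rename at USE: the module keeps the long name, the caller picks a short one.
   use demo_uniform, only: updf => uniform_pdf
   implicit none
   print *, updf(0.5, 0.0, 2.0)   ! density is 1/(b - a) = 0.5 inside [0, 2]
end program demo
```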

Good point, but this has been mentioned in the GitHub discussion linked by @milancurcic. I make a point there that I believe the default, “lazy” approach of using use without renaming (and without the only clause) should still be safe when multiple distributions are used at the same time.
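To illustrate the “lazy” use case, here is a sketch with two hypothetical modules (demo_dist_uniform and demo_dist_exponential, not the actual stdlib modules). Because each exported name carries its distribution, both modules can be used whole without a clash; had both exported a bare pdf, any reference to pdf would be ambiguous unless one of them were renamed:

```fortran
module demo_dist_uniform
   implicit none
   private
   public :: pdf_uniform
contains
   ! Uniform density on [a, b]; assumes a < b.
   elemental function pdf_uniform(x, a, b) result(p)
      real, intent(in) :: x, a, b
      real :: p
      if (x >= a .and. x <= b) then
         p = 1.0 / (b - a)
      else
         p = 0.0
      end if
   end function pdf_uniform
end module demo_dist_uniform

module demo_dist_exponential
   implicit none
   private
   public :: pdf_exponential
contains
   ! Exponential density with rate lambda; assumes lambda > 0.
   elemental function pdf_exponential(x, lambda) result(p)
      real, intent(in) :: x, lambda
      real :: p
      if (x >= 0.0) then
         p = lambda * exp(-lambda * x)
      else
         p = 0.0
      end if
   end function pdf_exponential
end module demo_dist_exponential

program demo
   use demo_dist_uniform        ! no only clause, no renaming
   use demo_dist_exponential
   implicit none
   print *, pdf_uniform(0.5, 0.0, 1.0), pdf_exponential(0.0, 2.0)
end program demo
```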

I have been thinking about the best naming convention every now and then recently and tried many different naming conventions. The best I have settled with for now, is even more aggressive than your suggestion (and goes farther to distinguish functions from subroutines), that is, to prefix all procedures with verbs and preferably prefix subroutines with get (or other appropriate verbs) and functions with gen because functions always generate and return an object which is different from what a subroutine does (takes an existing object and transforms it). This would also resolve the naming conflicts when there is both a subroutine and a function implementation of the same procedure: getRandMVN() vs. genRandMVN(). By just looking at the names, one can instantly tell what each procedure does and whether it is a function or subroutine.

That said, the Fortran standard intrinsics follow the snake_case naming convention. So it does not look too illogical for the stdlib to follow the style of the standard.

That’s very interesting, it took me a few times reading this to see how they are different–1 letter in the middle of the name. But I believe that they’re quite easy to distinguish once you’re used to and expect this pattern. In this specific case, I’d call them RandMVN() and genRandMVN(). I guess this is yet another one of those “as many answers as there are people.”

To me, a noun for a function and a verb for a subroutine became especially meaningful when I decided to use functions only for side-effect free calculations, and subroutines only for operations with side-effects. Verbs suggest an action, whereas nouns don’t.

Simple nouns as intrinsic or stdlib functions are great. Even to me (as someone who follows CamelCase in all languages), writing getSin() or genSin() is nowhere near sin() aesthetically. But from a developer perspective, requiring a verb at the beginning of all procedures imposes a clear structure and naming convention throughout an entire library and eliminates the burden of coming up with creative names for function and subroutine returns. It also avoids potential code duplication. For example, one can write a single implementation for two procedures (a function and a subroutine) that work on the same return object with the same name:

function genRandMVN() result(RandMVN)
real :: RandMVN
#include "RandMVN.inc.f90"
end function

subroutine getRandMVN(RandMVN)
real, intent(out) :: RandMVN
#include "RandMVN.inc.f90"
end subroutine

With the above naming, nothing has to change in the included file as the output objects have the same name in both procedures. All other procedures in the library would follow a similar structure with this naming convention.

Super …

It does take all KINDS, as they say. Perhaps because I did a lot of hand calculations where you tended to use a single-character symbol to represent a lot of operations and variables, I often find multi-line formulas expressed in long descriptive names more obscure than clear, and am often happier with a comment containing keys defining the meaning of the letters. That being said, it can be useful to have long descriptive names at times where the code almost reads as prose, and that is particularly nice for long stretches of code with short expressions. So how I like to name things depends on the context. But in general, one of the reasons I like modules is that I can create procedures with long descriptive names, but rename them easily with a USE statement when I want something shorter. So I like the proposed gen_long_name syntax for procedures in modules, and think that the USE statement is the best place to create the short name.

But just because it was a unique request, I thought I would mention what someone asked me to do: give everything a long and a short name. They did not want to use “USE name, ONLY:”, which is what I prefer. They wanted to just load the entire module with a simple “USE name”. So I do not recommend this, but tastes differ: you can give a procedure multiple names in the same module with an interface. The real case was more complicated, as the vast majority of the procedures were generic and so on, but the basic idea is simple to show with a trivial example:

 module M_math
implicit none
private
public :: generate_sin_of_radians, sinr
public :: generate_sin_of_degrees, sind

real,parameter :: D2R=1.0/57.2957795131 ! degrees-to-radians factor (pi/180)

interface sind
   module procedure generate_sin_of_degrees
end interface sind

interface sinr
   module procedure generate_sin_of_radians
end interface sinr

contains

elemental function generate_sin_of_radians(radians)
real,intent(in) :: radians
real            :: generate_sin_of_radians
   generate_sin_of_radians=sin(radians)
end function generate_sin_of_radians

elemental function generate_sin_of_degrees(degrees)
real,intent(in) :: degrees
real            :: generate_sin_of_degrees
   generate_sin_of_degrees=sin(degrees*D2R)
end function generate_sin_of_degrees

end module M_math

program test
use M_math
implicit none
real,parameter :: PI=atan(1.0)*4.0

   write(*, *)generate_sin_of_radians( [ 0.0, PI/4.0, PI/2.0, PI ] )
   write(*, *)sinr( [ 0.0, PI/4.0, PI/2.0, PI ] )

   write(*, *)generate_sin_of_degrees( [ 0.0, 45.0, 90.0, 180.0 ] )
   write(*, *)sind( [ 0.0, 45.0, 90.0, 180.0] )

end program test

So, although I don’t really like it myself, you can use an interface with just one module procedure in it to give the same routine an alias when you define it in a module. In this case they wanted exactly that: a long and short name for all the procedures. I hadn’t really thought of doing that until asked.

Since then, I have found that useful when I want to casually rename a procedure but provide backward compatibility with the old name in my own non-production code. I just go ahead and rename it if I change my mind, and then make a one-line interface so the old name still works and mark the old name as deprecated.
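A sketch of that backward-compatibility trick, using hypothetical names (module demo_alias, new name mean_value, old name avg) that are not taken from any real library:

```fortran
module demo_alias
   implicit none
   private
   public :: mean_value   ! new, preferred name
   public :: avg          ! old name, deprecated: kept so existing callers still compile

   ! One-line alias: the old generic name resolves to the renamed procedure.
   interface avg
      module procedure mean_value
   end interface avg

contains

   pure function mean_value(x) result(m)
      real, intent(in) :: x(:)
      real :: m
      m = sum(x) / size(x)   ! assumes size(x) > 0
   end function mean_value

end module demo_alias

program demo
   use demo_alias
   implicit none
   print *, mean_value([1.0, 2.0, 3.0])
   print *, avg([1.0, 2.0, 3.0])   ! same procedure, reached through the old name
end program demo
```

No wrapper procedure is written; the alias costs three lines in the specification part and can be deleted once callers have migrated.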

There are other ways to do similar things; but I found it interesting the first time it came up, and it does provide an alternative – instead of deciding between long and short names, provide both in one module without writing a wrapper procedure! Sort of an obvious subcase of generic procedures once you think of it, but like I said – it had not occurred to me until someone asked for that feature.

I’ve obviously missed something here but I don’t see what the second ‘uniform’ is bringing to the party. It also seems strange to me that there is a nice long descriptive name for the module but a relatively opaque TLA for the function. Is there a nonuniform_pdf in the uniform distribution library?