Following Monday’s discussion about the stdlib release and API stability, I wanted to bring up a point about API design and consistency for the stats modules (with specifics from the distributions modules). I could also create an issue on GitHub, but decided to post here, since it’s potentially a broader discussion not strictly tied to one issue.
There’s some inconsistency in the arguments passed to the normal and exponential distribution procedures. Consider the pdfs:
result = pdf_normal(x, loc, scale)
result = pdf_exp(x, lambda)
While the normal distribution procedure uses loc and scale (reminiscent of scipy) as opposed to mu and sigma, the exponential distribution procedure uses lambda rather than scale (i.e. beta = 1/lambda) and is missing loc (though I appreciate that’s not needed as often as for normal distributions).
For comparison, scipy pdf arguments are as follows:
scipy.stats.norm.pdf(x, loc, scale)
scipy.stats.expon.pdf(x, loc, scale)
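To make the relationship between the two conventions concrete, here is a minimal sketch in plain Python (standard library only, so it calls neither stdlib nor scipy). The names exp_pdf_rate and exp_pdf_loc_scale are made up for illustration and are not anyone's actual API. It only shows that the rate form with lambda matches the loc/scale form when loc = 0 and scale = 1/lambda.

```python
import math

def exp_pdf_rate(x, lam):
    """Exponential pdf in the rate form: lam * exp(-lam * x) for x >= 0 (hypothetical name)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_pdf_loc_scale(x, loc=0.0, scale=1.0):
    """Exponential pdf in the loc/scale form: exp(-(x - loc) / scale) / scale (hypothetical name)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    return math.exp(-z) / scale if z >= 0 else 0.0

# The two forms agree when loc = 0 and scale = beta = 1 / lambda.
lam = 2.5
for x in (0.0, 0.3, 1.7):
    assert math.isclose(exp_pdf_rate(x, lam),
                        exp_pdf_loc_scale(x, loc=0.0, scale=1.0 / lam))
```

So switching from lambda to scale is only a reparametrization; the extra thing the loc/scale form buys is the shift by loc.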
This brings up a broader question of API design. If the goal is to make it feel familiar to scipy users (as touched on in this discussion), sticking to location and scale makes sense. In this case, I’d suggest changing lambda to scale in the passed arguments. On the other hand, I found that while the scale and location convention creates consistency in arguments across distributions, it also creates some confusion: for example, some users don’t connect the normal distribution’s scale to its standard deviation, or they assume that scale and location mean the same thing statistically for every distribution. It’s the reason I decided to stick with the more mathematical arguments for some of my older code I’m currently reworking into a stats lib, while adopting procedure naming conventions more reminiscent of scipy.
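As a rough illustration of that confusion, the sketch below (plain Python, hypothetical helper names, a simple midpoint rule rather than anything from a library) passes the same loc and scale to both distributions and estimates the mean numerically. For the normal distribution the mean comes out at loc, while for the exponential it comes out at loc + scale, because there loc is the lower bound of the support rather than the mean.

```python
import math

def normal_pdf(x, loc, scale):
    """Normal pdf: loc is the mean (mu), scale is the standard deviation (sigma)."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))

def exp_pdf(x, loc, scale):
    """Exponential pdf: loc is the lower bound of the support, scale is beta = 1/lambda."""
    z = (x - loc) / scale
    return math.exp(-z) / scale if z >= 0 else 0.0

def numeric_mean(pdf, lo, hi, n=200_000):
    """Midpoint-rule estimate of the integral of x * pdf(x) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += x * pdf(x)
    return total * h

loc, scale = 1.0, 2.0
normal_mean = numeric_mean(lambda x: normal_pdf(x, loc, scale), loc - 20.0, loc + 20.0)
exp_mean = numeric_mean(lambda x: exp_pdf(x, loc, scale), loc, loc + 80.0)
print(normal_mean)  # close to loc
print(exp_mean)     # close to loc + scale
```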
Perhaps it’s worth implementing several options, allowing users to (for example) pass either scale/beta or lambda? If you’re a scipy user with little background in statistics, scale will be intuitive and lambda potentially confusing; if you come at it from the other side, it may be the other way around. However, there’s always the risk of the interface getting too bloated. Anyway, this may be worth discussing.
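For the sake of discussion, here is one way the "either scale or lambda" option could look, again sketched in plain Python rather than Fortran. Everything here is an assumption for illustration: the name exp_pdf, the keyword names scale and lam, and the choice to default to the standard exponential when neither is given. It is not a proposal for the actual stdlib signature.

```python
import math

def exp_pdf(x, loc=0.0, *, scale=None, lam=None):
    """Exponential pdf accepting either scale (beta) or lam (rate), but not both."""
    if scale is not None and lam is not None:
        raise ValueError("pass either scale or lam, not both")
    if lam is not None:
        if lam <= 0:
            raise ValueError("lam must be positive")
        scale = 1.0 / lam  # convert the rate to a scale
    elif scale is None:
        scale = 1.0  # assumed default: the standard exponential
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    return math.exp(-z) / scale if z >= 0 else 0.0

# Both spellings give the same density.
assert math.isclose(exp_pdf(0.7, scale=0.5), exp_pdf(0.7, lam=2.0))
```

The validation branches already hint at the bloat concern: each extra alias adds checks and documentation. In Fortran the same idea would presumably rely on optional arguments or a generic interface, which I haven't sketched here.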