I was reminded of this topic when reviewing the j3-fortran proposal on generic programming in Fortran (use cases): correctly handling units in computations. It is a topic with a long history and, as far as I know, no definitive answer. In fact I tend to think there is no foolproof method to deal with the issue while still using all the usual programming techniques. And that is what I would like to work out in a paper/blog/whatever: a hopefully coherent piece of text. And yes, it is a hobby horse, but a serious one.
My questions right now are:
What solutions have you seen or used?
What requirements are to be imposed?
What (fundamental) problems have to be tackled?
For starters: I know that the ideal solution would be that the compiler checks your expressions and throws an error if you attempt to add 0.2 m to 5 K. I know of a library by Grant Petty (PHYSUNITS) that deals with this sort of thing at run-time. I have been experimenting with a different approach myself, geared to the typical problems I see with less formal units: grams chlorophyll per gram carbon, for instance.
So, I would like to know your ideas on this topic.
I read that paper about CamFort (I have not tried it):
Contrastin, Mistral, Andrew Rice, Matthew Danish, and Dominic Orchard. "Units-of-Measure Correctness in Fortran Programs." Computing in Science & Engineering 18, no. 1 (2016): 102–107. https://www.cl.cam.ac.uk/~acr31/pubs/contrastin-units.pdf
Here, we demonstrate how our freely available, open source tool, CamFort, provides a low-effort and automated way of detecting mismatched units-of-measure in code.
See also:
Orchard, Dominic, Andrew Rice, and Oleg Oshmyan. "Evolving Fortran Types with Inferred Units-of-Measure." Journal of Computational Science 9 (July 2015): 156–62.
Quantities for Fortran. Make math with units more convenient.
This library provides all the functionality necessary to almost treat quantities with units associated with them as though they were just intrinsic real values. However, since a quantity has its own unique type, you get some compile-time safety that you don't mix them up in your argument lists, and you don't have to worry about doing unit conversions or remembering what units you've stored things in when you start doing math with them.
Turning a number into a quantity is as easy as defining what units that number is in, like 1.0d0.unit.METERS. And, if you need the number back out, just say what units you want the value in, like time.in.SECONDS.
I usually have to deal with units in the context of electronic structure theory. By performing all calculations in atomic units, most prefactors and constants drop out of the equations.
For the actual calculations the dimensions and rank of the quantity are usually quite descriptive: e.g. an energy (scalar, Joule → Hartree) for a Cartesian geometry ([3, nat], Meters → Bohr) and its energy derivative w.r.t. displacements ([3, nat], Joule/Meter → Hartree/Bohr) go well together without extra annotations, and can be easily distinguished from a charge distribution ([nsrc], Coulomb → unit charge) and its potential ([nsrc], Joule/Coulomb → Hartree/unit charge). The overall handling of those quantities feels quite natural in the actual computation.
For input quantities this can be more difficult, as they are usually given in a more human-friendly unit system (Ångström for lengths, eV or kcal/mol for energies, g/L for mass densities, …). Rigorously converting all input into the internal unit system and only converting it back for human-facing output has worked quite well so far.
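The boundary-conversion pattern described above can be sketched as follows (here in Rust for brevity; the function names are invented for this illustration, and the constants are rounded CODATA values, not taken from any particular code base):

```rust
// Convert human-friendly input units to atomic units at the boundary;
// all internal math then runs in a single, unit-free internal system.
const BOHR_PER_ANGSTROM: f64 = 1.0 / 0.529177210903; // Å -> Bohr (CODATA Bohr radius)
const EV_PER_HARTREE: f64 = 27.211386245988;         // Hartree -> eV (CODATA)

/// Convert a Cartesian geometry given in Ångström to Bohr immediately on input.
fn geometry_to_atomic(xyz_angstrom: &[f64]) -> Vec<f64> {
    xyz_angstrom.iter().map(|x| x * BOHR_PER_ANGSTROM).collect()
}

/// Convert an energy back to eV only for human-facing output.
fn energy_to_ev(e_hartree: f64) -> f64 {
    e_hartree * EV_PER_HARTREE
}

fn main() {
    let geom = geometry_to_atomic(&[0.0, 0.0, 1.1]);
    println!("geometry in Bohr: {:?}", geom);
    println!("energy in eV: {}", energy_to_ev(0.5)); // 0.5 Hartree, printed in eV
}
```

The key property is that conversions appear in exactly two places, input and output, so no unit bookkeeping is needed in the computational core.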
One thing I have to deal with frequently is different sign conventions, like the sign switch from energy gradients to forces (both Joule/Meter), or normalizations, like the difference between a virial and a stress tensor, where the latter is the former normalized by the system's volume. These conventions are seldom documented in existing code bases or libraries and take some effort to find out by trial and error.
Not sure if this is strictly related to units, but quantities with Cartesian components and in spherical harmonics tend to be interesting.
The classic is the component ordering of the first moment, which can be
x (1), y (−1), z (0)
y (−1), z (0), x (1)
z (0), x (1), y (−1)
…
I have seen all of the above and more in actual implementations, all have their merits and drawbacks. Not sure how this could be handled gracefully by a unit tracking tool.
This seems like something where generic programming is great; I really like the approach of defining e.g. a type Meters which is just a thin wrapper around a real, but where +, -, *, / and ** act appropriately with the units to produce other types which are also just thin wrappers around reals. If done right, this doesn't change the compiled code at all, doesn't change the written code apart from at variable declarations, and throws a compile-time error if you get your units wrong.
A quick web search points to e.g. this Rust library, which looks nice. I'm hoping that if some form of generics is included in Fortran 202Y then this will become possible in Fortran too.
In my own Fortran code I have lots of types like CartesianDisplacement, NormalModeForce and PhononWavevector which are all just wrappers for real(dp), allocatable :: vector(:) arrays. But this ends up requiring an absolute ton of duplicated code, so it's not an ideal solution.
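For illustration, the thin-wrapper idea can be sketched in Rust, which supports it today (all names here are hypothetical, not taken from any particular library):

```rust
// Each unit is a zero-cost newtype over f64; the operator impls encode
// the dimensional rules, so a unit mistake becomes a type error at
// compile time, and the wrappers vanish in the compiled code.
use std::ops::{Add, Div, Mul};

#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Seconds(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct MetersPerSecond(f64);

impl Add for Meters {
    type Output = Meters;
    fn add(self, rhs: Meters) -> Meters { Meters(self.0 + rhs.0) }
}

impl Div<Seconds> for Meters {
    type Output = MetersPerSecond;
    fn div(self, rhs: Seconds) -> MetersPerSecond { MetersPerSecond(self.0 / rhs.0) }
}

impl Mul<Seconds> for MetersPerSecond {
    type Output = Meters;
    fn mul(self, rhs: Seconds) -> Meters { Meters(self.0 * rhs.0) }
}

fn main() {
    let d = Meters(100.0);
    let t = Seconds(9.58);
    let v = d / t;             // has type MetersPerSecond
    let d2 = v * Seconds(2.0); // back to Meters
    println!("{:?} {:?}", v, d2);
    // let oops = d + t;       // rejected at compile time: mismatched types
}
```

The drawback the post above points out applies here too: without generics or code generation, every pairing of types needs its own operator definition, which is exactly the duplication problem.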
@Arjen, I suggest reviewing the prior discussion on this topic under the notion of âreliabilityâ at comp.lang.fortran that you may recall: please see this thread.
As you may know, this is something that gets discussed every now and then in the context of statically typed and compiler-based programming languages toward scientific and technical computing, particularly Fortran and C++.
So far, no solution has emerged that is fundamentally based on the physical nature of quantities and their dimensional analysis, that is truly compile-time, that is acceptable to implementors (think commercial vendors here first) as well as practitioners, and that could then be integrated into the core language.
The generic programming feature, if done well in Fortran, holds the prospect of some progress in the very distant future: first on the more important aspect, the dimensionality of physical quantities in floating-point operations, and secondarily on unit-of-measure conversions in library and user code. That is, post-2040, by which time some compilers may have implemented the feature set reliably.
TL;DR: I've spent a fair amount of time exploring this space, and it's really hard, and there isn't just one right way to do it.
The fundamental trade-offs I've found are between flexibility and convenience, run-time performance, maintainability, and fidelity (precision), with some other subtle nuances thrown in. In my library I focused on flexibility, convenience, and low run-time cost, at the expense of maintainability (although I have ways to mitigate some aspects of that) and fidelity.
I have a type for each kind of quantity (i.e. length, time, etc.), and operators to convert to and from real numbers given one of a set of available units. Values are stored internally in SI units, and all of the mathematical operators you'd expect are available (i.e. adding two lengths together works, dividing a length by a time gives a speed, etc.). This design maximizes compile-time safety (i.e. you can't inadvertently assign a speed to an acceleration) and flexibility (i.e. you can add 1 m to 1 ft and get the answer in yards). And since I provide ways of going to and from strings, you can expose this flexibility to your users as well. Since unit conversions and tracking happen only when converting to or from a number, the run-time cost of doing math is minimized.
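A minimal sketch of this design (shown in Rust with invented names, not the actual library's API) might look like:

```rust
// One type per kind of quantity; values always stored internally in SI.
// Unit conversion happens only when going to or from a plain number.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Length { metres: f64 }
#[derive(Debug, Clone, Copy, PartialEq)]
struct Time { seconds: f64 }
#[derive(Debug, Clone, Copy, PartialEq)]
struct Speed { metres_per_second: f64 }

// Units are just scale factors relative to the SI base unit.
const FEET: f64 = 0.3048;  // metres per foot
const YARDS: f64 = 0.9144; // metres per yard

impl Length {
    fn new(value: f64, metres_per_unit: f64) -> Length {
        Length { metres: value * metres_per_unit }
    }
    fn value_in(self, metres_per_unit: f64) -> f64 {
        self.metres / metres_per_unit
    }
}

impl std::ops::Add for Length {
    type Output = Length;
    fn add(self, rhs: Length) -> Length { Length { metres: self.metres + rhs.metres } }
}

impl std::ops::Div<Time> for Length {
    type Output = Speed;
    fn div(self, rhs: Time) -> Speed {
        Speed { metres_per_second: self.metres / rhs.seconds }
    }
}

fn main() {
    // "add 1 m to 1 ft and get the answer in yards":
    let total = Length::new(1.0, 1.0) + Length::new(1.0, FEET);
    println!("{} yd", total.value_in(YARDS)); // ~1.4269 yards
    // Assigning the result of length / time to anything but Speed
    // is a compile-time error.
    let _v: Speed = Length::new(100.0, 1.0) / Time { seconds: 9.58 };
}
```

Because arithmetic operates directly on the stored SI value, the conversion cost is paid only at the edges, which matches the design goal stated above.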
The maintenance cost is that there is a large number of combinations of operations between quantities that need to be supported, and a huge number of units that should be available. The other cost is a possible loss of precision: if you're doing calculations at either end of the extreme (i.e. in light years or femtoseconds), then storing the values in meters or seconds might incur some loss of precision.
I've seen designs that take different approaches to balancing the costs and benefits. For example, you can reduce the loss in precision by making values in every different unit a different type, but that comes at the cost of flexibility (i.e. I can't add feet and meters any more) or maintenance (i.e. now I have to manually support all of the possible operations between all different quantities and units).
One really cool example is the Haskell library Numeric.Units.Dimensional, which uses some really advanced features of the type system to minimize maintenance costs and run-time overhead, but only supports SI units. The Rust library Dimensioned is pretty cool too.
@pmk why do you think generic programming can't handle dimensional analysis?
As far as I can tell, the Rust library I linked to does exactly that. It defines a bunch of basic unit types like Meters and Seconds, and then it has macros to define things like Meters · Seconds and Meters / Seconds. And then if you multiply something of type Meters / Seconds by something of type Seconds you get something of type Meters.
If you replace Rust macros with the Fortran preprocessor, I don't see why Fortran couldn't do something similar.
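To make the macro idea concrete, here is a sketch (hypothetical names, not the linked library's actual API) of a macro that stamps out the "X / Y = Z" operator boilerplate, much as a preprocessor could stamp out the equivalent Fortran interface blocks:

```rust
use std::ops::{Div, Mul};

#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct Seconds(f64);
#[derive(Debug, Clone, Copy, PartialEq)]
struct MetersPerSecond(f64);

// One macro invocation generates both directions of the relation:
// num / den = quot, and quot * den = num.
macro_rules! ratio_unit {
    ($num:ident / $den:ident = $quot:ident) => {
        impl Div<$den> for $num {
            type Output = $quot;
            fn div(self, rhs: $den) -> $quot { $quot(self.0 / rhs.0) }
        }
        impl Mul<$den> for $quot {
            type Output = $num;
            fn mul(self, rhs: $den) -> $num { $num(self.0 * rhs.0) }
        }
    };
}

ratio_unit!(Meters / Seconds = MetersPerSecond);

fn main() {
    let v = Meters(10.0) / Seconds(2.0); // MetersPerSecond
    let d = v * Seconds(3.0);            // Meters
    println!("{:?} {:?}", v, d);
}
```

Each new quantity relation then costs one macro line instead of two hand-written operator definitions, which is precisely the duplication problem mentioned earlier in the thread.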
Well, I gladly accept that offer! My idea is to examine the problem from first principles. Then see how the requirements that result are addressed by existing solutions. My gut feeling at the moment is that there is no satisfactory solution for all requirements. But I should put that on paper before speculating too much. And of course read the material referenced upthread.
This is yet another unit library (here in Nim), which says there is no run-time overhead and supports multiplication of units etc. (so it might be similar to the Rust one above).
Ah, thanks. From what I have read so far, the compile-time approaches mostly guarantee dimensional correctness. Some of the run-time approaches also allow unit conversions, such as inches to centimeters and the like. I have not seen a list of requirements or wishes yet, and that is what I want to focus on. BTW, Simcon's approach is an interesting one, as it tries to determine the dimensions directly from the source code.
Stefano Zaghi has written a package of routines called FURY ("Fortran Units (environment) for Reliable phYsical math"), freely available on GitHub. I have not (yet) used it.
FYI, Brad Richardson and I are working on a paper about this topic. The purpose is to make an inventory of what use cases (or perhaps better usage patterns) there are, what they mean in terms of requirements to any programming solution and how well the existing and perhaps envisaged solutions support these requirements.