Unit checking has been a longstanding interest of many developers. There are hundreds of implementations of the idea, and much previous discussion in this forum. There’s a great FortranCon video by Arjen Markus on the tradeoffs of compile-time, run-time, and static analysis approaches which might be good to watch before reading this post.
Most implementations seem to be applied to toy problems and not actual production code, which makes me think that real problems that would appear in practice may not have been identified. So, I took it upon myself to write my own compile-time unit checking system for Fortran (similar to quaff) that I call genunits and use it for a new computational fluid dynamics code I am developing. Here I’d like to report some non-obvious things I’ve found in the process.
In summary: I probably will be removing unit checking from my code due to poor performance. Even with compile-time checking, there’s an 11 to 35 times slowdown with current compilers for a SOR Poisson solver. (Edit: It seems that the vast majority of the slowdown can be avoided with inlining, as suggested by wspector.) The main advantage seems to be finding certain bugs sooner, not finding bugs that would be missed with rigorous testing. Given that testing has no run-time penalty, it seems to be a better choice at the moment if performance is a concern. For compile-time unit checking to be done properly, I believe it needs to be a compiler feature, preferably a standard feature.
A bit about my system: genunits will read an input file defining the desired unit system and generate custom source code for a Fortran module. Units checking is done with derived types at compile-time through defined operations. This Fortran module can be `use`d, variables can be assigned units when they are defined with the appropriate type (for example, `type(unitless) :: x`), and most mathematical operations work the same as before. You can see some examples of the system in action in this test file.
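As a rough illustration of the derived-type approach (not the code genunits actually generates), here is a minimal sketch with two units. The names `units_sketch`, `length`, `area`, and the procedure names are all hypothetical, chosen only for this example.

```fortran
! Hypothetical sketch: names are illustrative, not what genunits generates.
module units_sketch
    implicit none

    type :: length
        real :: v   ! value in m
    end type length

    type :: area
        real :: v   ! value in m^2
    end type area

    ! Only unit-consistent combinations get an operator definition.
    interface operator(+)
        module procedure add_length_length
    end interface

    interface operator(*)
        module procedure mul_length_length
    end interface

contains

    elemental function add_length_length(a, b) result(c)
        type(length), intent(in) :: a, b
        type(length) :: c
        c%v = a%v + b%v
    end function add_length_length

    elemental function mul_length_length(a, b) result(c)
        type(length), intent(in) :: a, b
        type(area) :: c
        c%v = a%v * b%v
    end function mul_length_length

end module units_sketch

program demo_units
    use units_sketch
    implicit none
    type(length) :: x, y
    type(area)   :: s

    x = length(2.0)
    y = length(3.0)
    s = x * y          ! m * m -> m^2, accepted
    if (abs(s%v - 6.0) > 1.0e-6) error stop "unexpected area"
    print *, s%v

    ! s = x + y        ! rejected at compile time: m + m gives m, not m^2
    ! y = x + s        ! rejected at compile time: no operator(+) for m and m^2
end program demo_units
```

Uncommenting either of the last two lines should produce a compile error, which is the whole mechanism: a unit mismatch becomes a type mismatch.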
Why have compile-time unit checking in general?
Before I implemented genunits, I might have said that the main purpose of genunits would be to catch bugs. But now that I’ve thought about it, I don’t think that unit checking will inherently find bugs that other forms of testing can’t. The advantages of compile-time unit checking instead are that bugs are found earlier in development and often the precise location of a bug is pinpointed with the compiler error. I can say that the bugs in my own code that genunits has found so far have been easy to fix. If I relied only on conventional testing, identifying where the bugs were would have taken longer. But does this benefit outweigh the problems?
Why have unit checking as part of the compiler?
Specifically, what can’t be done with a compile-time derived type implementation that a compiler implementation would allow?
By the way: All of the problems listed below aside from the last can be mitigated, at least to an extent, by compiler developers without adding a units feature.
Run-time performance takes a huge hit even avoiding run-time checks
Unfortunately, run-time performance is by far the biggest disadvantage. Even with optimizations, run-time increases by at least an order of magnitude for the SOR Poisson solver test I’m using. This is unacceptable for many applications, and surprising to me given that the unit checking is at compile-time. Something about writing units as derived types inherently causes a slowdown. I’m not a compiler engineer, so I don’t know what’s going on under the hood. Some specific numbers from my tests:
- `gfortran -O2`: 11.2x slowdown
- `ifort -O2`: 13.7x slowdown
- `ifx -O2`: 19.1x slowdown
- `nvfortran -fast`: 35.4x slowdown
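For anyone who wants to try a comparison like this on their own compiler, here is a minimal timing sketch. It is not the SOR Poisson solver used for the numbers above, and the module name `wrapped_real_sketch`, the loop sizes, and the procedure names are assumptions made for illustration. It runs the same arithmetic once on plain reals and once on a derived type wrapping a real, so the ratio it prints will not necessarily match the figures reported here.

```fortran
! Hypothetical micro-benchmark: not the SOR solver from the post.
module wrapped_real_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)

    type :: unitless
        real(dp) :: v
    end type unitless

    interface operator(+)
        module procedure add_u
    end interface

    interface operator(*)
        module procedure mul_u
    end interface

contains

    elemental function add_u(a, b) result(c)
        type(unitless), intent(in) :: a, b
        type(unitless) :: c
        c%v = a%v + b%v
    end function add_u

    elemental function mul_u(a, b) result(c)
        type(unitless), intent(in) :: a, b
        type(unitless) :: c
        c%v = a%v * b%v
    end function mul_u

end module wrapped_real_sketch

program time_wrapped
    use, intrinsic :: iso_fortran_env, only: int64
    use wrapped_real_sketch
    implicit none
    integer, parameter :: n = 1000000, reps = 200
    real(dp), allocatable :: a(:), b(:)
    type(unitless), allocatable :: ua(:), ub(:)
    integer(int64) :: t0, t1, t2, rate
    integer :: i, k

    allocate(a(n), b(n), ua(n), ub(n))
    a = 1.0_dp
    b = 0.0_dp
    ua = unitless(1.0_dp)
    ub = unitless(0.0_dp)

    call system_clock(t0, rate)
    do k = 1, reps               ! plain reals
        do i = 1, n
            b(i) = b(i) + a(i) * a(i)
        end do
    end do
    call system_clock(t1)
    do k = 1, reps               ! same arithmetic through defined operators
        do i = 1, n
            ub(i) = ub(i) + ua(i) * ua(i)
        end do
    end do
    call system_clock(t2)

    ! Using the results keeps the loops from being optimized away.
    if (abs(b(n) - ub(n)%v) > 1.0e-9_dp) error stop "results differ"
    print *, "plain real   (s):", real(t1 - t0, dp) / real(rate, dp)
    print *, "derived type (s):", real(t2 - t1, dp) / real(rate, dp)
end program time_wrapped
```

Both versions should be compiled with the same flags (for example `gfortran -O2`) so that the comparison is like for like.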
Slow compilation and effort needed to mitigate slow compilation
Van Snyder has discussed the huge number of units needed to cover part of what I call a unit system. Having a huge number of units will lead to slow compilation for some compilers.
genunits has been designed to minimize the number of units to make compilation faster. Essentially, genunits requires some seed units that form a basis for the unit system, and a user-specified number of units are generated from that basis in rough order of likelihood of appearance. However, this is not a panacea. As development proceeds, the size of the unit system required tends to expand due to what I call intermediate units, so the genunits configuration has to be adjusted to generate more units, which slows compilation. Intermediate units are not used for the defined variables, but appear in expressions as mathematical operations are performed. For example, consider `m` with units of kg, `rho` with units of kg/m^3, and `x`, `y`, and `z` with units of m. The equation `m = rho * x * y * z` will include the intermediate unit formed by `rho * x`, which has units of kg/m^2. This unit needs to be part of the unit system for the unit checking to function. If kg/m^2 is not part of the unit system, the user will get a compiler error identical to that when there is a unit mismatch despite there being no actual error.
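The `m = rho * x * y * z` example can be sketched as follows. This is a hand-written illustration under assumed names (`mass`, `density`, `mass_per_area`, and so on), not generated genunits output. Because Fortran groups the expression left to right as `((rho * x) * y) * z`, it needs two intermediate types, kg/m^2 and kg/m, even though no variable is ever declared with them.

```fortran
! Hypothetical sketch: names are illustrative, not what genunits generates.
module intermediate_units_sketch
    implicit none

    type :: mass              ! kg
        real :: v
    end type mass
    type :: length            ! m
        real :: v
    end type length
    type :: density           ! kg/m^3
        real :: v
    end type density
    type :: mass_per_area     ! kg/m^2, intermediate unit
        real :: v
    end type mass_per_area
    type :: mass_per_length   ! kg/m, intermediate unit
        real :: v
    end type mass_per_length

    interface operator(*)
        module procedure mul_density_length
        module procedure mul_mass_per_area_length
        module procedure mul_mass_per_length_length
    end interface

contains

    elemental function mul_density_length(a, b) result(c)
        type(density), intent(in) :: a
        type(length), intent(in) :: b
        type(mass_per_area) :: c          ! kg/m^3 * m -> kg/m^2
        c%v = a%v * b%v
    end function mul_density_length

    elemental function mul_mass_per_area_length(a, b) result(c)
        type(mass_per_area), intent(in) :: a
        type(length), intent(in) :: b
        type(mass_per_length) :: c        ! kg/m^2 * m -> kg/m
        c%v = a%v * b%v
    end function mul_mass_per_area_length

    elemental function mul_mass_per_length_length(a, b) result(c)
        type(mass_per_length), intent(in) :: a
        type(length), intent(in) :: b
        type(mass) :: c                   ! kg/m * m -> kg
        c%v = a%v * b%v
    end function mul_mass_per_length_length

end module intermediate_units_sketch

program demo_intermediate
    use intermediate_units_sketch
    implicit none
    type(mass)    :: m
    type(density) :: rho
    type(length)  :: x, y, z

    rho = density(1000.0)
    x = length(0.1)
    y = length(0.2)
    z = length(0.3)

    ! Evaluated as ((rho * x) * y) * z, so both intermediate types are used.
    m = rho * x * y * z
    if (abs(m%v - 6.0) > 1.0e-3) error stop "unexpected mass"
    print *, m%v
end program demo_intermediate
```

If `mass_per_area` or its operator were left out, `rho * x` would fail to compile even though the units are consistent. Writing the expression as `rho * (x * y * z)` would instead require m^2 and m^3 as intermediates, so the set of units needed depends on how expressions are grouped.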
Compiler error messages when units mismatch are often unclear
A compiler implementation could have far more helpful and descriptive error messages. And as I said, many compiler error messages for genunits are false positives in a sense, in that there is no actual unit mismatch, but the unit system needs to be expanded to include required intermediate units.
Exponentiation operators are limited in a derived type implementation
For example, a derived type implementation can’t determine the units of `x**2`, unless `x` is unitless. The programmer will have to instead write `x*x` or use a convenience function like `square(x)`. However, a compiler implementation would not have this limitation.
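One way to read this limitation: a function implementing `operator(**)` has a single declared result type, but the unit of `x**n` depends on the value of `n`, which is not part of the type. A function like `square` avoids the problem because the exponent is fixed by the function itself. A minimal sketch, with hypothetical names (`square_sketch`, `length`, `area`, `square_length`):

```fortran
! Hypothetical sketch: names are illustrative, not what genunits generates.
module square_sketch
    implicit none

    type :: length
        real :: v   ! value in m
    end type length

    type :: area
        real :: v   ! value in m^2
    end type area

    ! One specific procedure per input unit; the result unit is fixed
    ! because the exponent (2) is built into the function.
    interface square
        module procedure square_length
    end interface

contains

    elemental function square_length(x) result(y)
        type(length), intent(in) :: x
        type(area) :: y
        y%v = x%v * x%v
    end function square_length

end module square_sketch

program demo_square
    use square_sketch
    implicit none
    type(length) :: x
    type(area)   :: a

    x = length(3.0)
    a = square(x)      ! m -> m^2, known at compile time
    if (abs(a%v - 9.0) > 1.0e-6) error stop "unexpected area"
    print *, a%v
end program demo_square
```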
Closing
My main goal here is to inform potential future Fortran language and compiler developments. I’m also hoping people will have some ideas about how to mitigate the performance issues, but I suspect little can be done on my end.
There’s a lot more I could write about this, but I want to prevent this post from being even longer. I’m happy to answer any questions about this.