Defining multiple real precisions in a program

I am not involved directly, just following what’s going on in the field.

There are several reasons why they are looking at 16-bit reals:

  • the application is extremely memory-intensive; for each cell of a 3D spatial grid, you need to store 19 or even 27 DDFs (discrete distribution functions), plus any additional scalar fields (temperature, pressure, etc.)
  • given how memory-intensive it is, the available GPU memory quickly becomes a restriction on the domain size (or grid size); this also puts a cap on the Reynolds number you can simulate (without resorting to turbulence models)
  • the algorithm is bandwidth-limited; a large share of the time is spent just shifting the DDFs around in memory (think particles moving along a grid)
  • it turns out that, with some tricks, and assuming a stable simulation, the DDFs will fall between -2 and 2, mostly clustering around 0; parts of the mantissa and exponent are therefore completely unused

As far as I can understand, the application was written in OpenCL, for hardware that had support for 16-bit floats (cl_khr_fp16), and not in Fortran.