Supporting BFLOAT16 in Fortran: “NOT recommended”?

Low-precision floating-point arithmetic is one of the bits of magic that enable the powerful AI we have today (see also the discussions on Hacker News). It is the workhorse on modern hardware like GPUs and TPUs, although not as much on CPUs (yet).

Being a language aimed at supporting HPC on modern hardware, Fortran cannot afford being left behind in terms of low-precision floating-point arithmetic. Fortunately, REAL16 (half-precision real) was introduced in F2023, which is fabulous! As of March 2024, the only compilers I know to support it are nagfor and nvfortran, the latter without providing intrinsics like abs, exp, …
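For anyone who wants to try it, here is a minimal sketch of what using REAL16 looks like, assuming a compiler that actually provides the REAL16 constant and kind (e.g. nagfor; others may reject the kind or leave the constant negative):

program try_real16
  use iso_fortran_env, only: real16
  implicit none
  real(real16) :: h
  ! 1/3 in half precision; epsilon shows the coarse spacing (~1e-3)
  h = 1.0_real16 / 3.0_real16
  print *, 'half-precision 1/3 =', h, '  epsilon =', epsilon(h)
end program try_real16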

However, the low-precision floating-point arithmetic used in AI training is mostly BFLOAT16. This non-IEEE format has lower precision than, but the same range as, single precision. As far as I know, no Fortran compiler on the market supports it today.
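For readers unfamiliar with the format: BFLOAT16 is simply the top 16 bits of an IEEE single (1 sign bit, 8 exponent bits, 7 mantissa bits), which is why it keeps the single-precision range while losing precision. As a hedged illustration (not a compiler feature), here is a small standard-Fortran sketch that emulates BFLOAT16 rounding by keeping only the upper half of a 32-bit real; the name to_bf16 is mine and Inf/NaN handling is omitted:

program bf16_demo
  use iso_fortran_env, only: real32, int32
  implicit none
  real(real32) :: x
  x = 3.14159265_real32
  print *, 'single precision :', x
  print *, 'bfloat16 rounded :', to_bf16(x)
contains
  ! Round a 32-bit IEEE single to the nearest bfloat16 value (ties to even)
  ! by keeping only the top 16 bits; Inf/NaN handling is omitted.
  elemental function to_bf16(v) result(r)
    real(real32), intent(in) :: v
    real(real32) :: r
    integer(int32) :: bits, rounding
    bits = transfer(v, bits)
    rounding = 32767_int32 + iand(ishft(bits, -16), 1_int32)
    bits = iand(bits + rounding, not(65535_int32))   ! clear the low 16 bits
    r = transfer(bits, r)
  end function to_bf16
end program bf16_demo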

Indeed, I read today that there was a discussion about introducing BFLOAT16 to iso_fortran_env, but the conclusion was that

JOR recommends that J3 does not proceed with adding any explicit support for BFLOAT16.

See https://j3-fortran.org/doc/year/20/20-118.txt.

Will LFortran support BFLOAT16, @certik? I am told that LLVM Flang supports it with real(kind = 3), but LLVM Flang is not ready for use as far as I understand. I look forward to trying it in PRIMA.



References


It’s probably just random chance, but I feel like I encounter FP16 much more often than BF16. For ML, I’d like to see a breakdown of which formats the major models actually use.

I’d rather see FP8 or other quantization support, if I had to prioritize.


For your reference, see

To Bfloat or not to Bfloat? That is the Question!

How to select half precision (BFLOAT16 vs FLOAT16) for your trained model?

Mixed Precision Training: Difference between BF16 and FP16

BFloat16: The secret to high performance on Cloud TPUs

The general view seems to be “use BFLOAT16 if it is supported on your platform”, to benefit from its high speed (hardware support needed) and low storage while maintaining the same range as single precision. But this really depends on your hardware and language. As mentioned, the availability of BFLOAT16 is still limited on CPUs, and it does not exist in Fortran (“supporting it in Fortran is not recommended” …).
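To make “same range, less precision” concrete, here is a small illustrative program (plain double precision only, no 16-bit kinds needed) that computes the largest finite value and the machine epsilon implied by the FP16 layout (5 exponent / 10 mantissa bits) and the BF16 layout (8 exponent / 7 mantissa bits), and compares them with FP32:

program half_formats
  use iso_fortran_env, only: dp => real64
  implicit none
  real(dp) :: fp16_max, bf16_max, fp16_eps, bf16_eps

  fp16_max = (2.0_dp - 2.0_dp**(-10)) * 2.0_dp**15    ! 65504
  bf16_max = (2.0_dp - 2.0_dp**(-7))  * 2.0_dp**127   ! ~3.39e38, same exponent range as FP32
  fp16_eps = 2.0_dp**(-10)                            ! ~9.8e-4
  bf16_eps = 2.0_dp**(-7)                             ! ~7.8e-3

  print '(a, es10.3, a, es10.3)', 'FP16: max ', fp16_max, '  eps ', fp16_eps
  print '(a, es10.3, a, es10.3)', 'BF16: max ', bf16_max, '  eps ', bf16_eps
  print '(a, es10.3, a, es10.3)', 'FP32: max ', real(huge(1.0), dp), '  eps ', real(epsilon(1.0), dp)
end program half_formats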

You may also search for “machine learning float16 or bfloat16”. For traditional HPC, the situation may not necessarily be the same.


Note that this has strictly no consequence on the actual support of bfloat16 by the compilers. It’s just that there’s no dedicated constant in iso_fortran_env. Conversely, if such a constant were adopted, it would just be -1 in most compilers.


Being neither a standards committee member nor a compiler developer, I guess that including BFLOAT16 in iso_fortran_env would encourage vendors to support it rather than discourage them. In contrast, the recommendation not to include/support it in iso_fortran_env can hardly be an encouragement.

Excluding BFLOAT16 all but excludes Fortran from training large machine-learning and AI models, which is not necessarily the most important business but cannot be ignored either.

Kind of interesting that if you do a web search on half- or mixed-precision applications, you see several articles pop up that describe research on using half precision for linear algebra (definitely something the HPC community cares about). In particular, there appears to be a lot of interest in using iterative refinement techniques to recover standard (32-bit) precision from mixed half-precision linear solvers etc. I guess this is driven mostly by ML training requirements, but given the potential performance advantage of half precision over standard precision on even modestly priced GPUs, you would think this is something the standards committee would want to support. Since NVIDIA, AMD and, I presume, Intel hobble the double-precision performance on their consumer graphics cards, some kind of mixed-precision solution to HPC-type problems on inexpensive hardware seems like something Fortran should be supporting.
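A minimal sketch of that iterative-refinement idea, with single precision standing in for the half-precision factorization one would actually run on a GPU (the 3x3 system, the naive solver and all names here are purely illustrative):

program mixed_refine
  use iso_fortran_env, only: sp => real32, dp => real64
  implicit none
  integer, parameter :: n = 3
  real(dp) :: a(n,n), b(n), x(n), r(n)
  real(sp) :: a_lo(n,n), d_lo(n)
  integer :: it

  ! small symmetric, diagonally dominant system (no pivoting needed)
  a = reshape([4.0_dp, 1.0_dp, 0.0_dp, &
               1.0_dp, 3.0_dp, 1.0_dp, &
               0.0_dp, 1.0_dp, 2.0_dp], [n, n])
  b = [1.0_dp, 2.0_dp, 3.0_dp]
  a_lo = real(a, sp)

  ! initial solve entirely in low precision
  x = real(solve_lo(a_lo, real(b, sp)), dp)

  do it = 1, 5
    r = b - matmul(a, x)                 ! residual in high precision
    d_lo = solve_lo(a_lo, real(r, sp))   ! correction in low precision
    x = x + real(d_lo, dp)
    print '(a,i0,a,es10.3)', 'iter ', it, '  ||r|| = ', norm2(r)
  end do

contains

  ! naive Gaussian elimination without pivoting, single precision
  function solve_lo(a_in, b_in) result(x_out)
    real(sp), intent(in) :: a_in(:,:), b_in(:)
    real(sp) :: x_out(size(b_in))
    real(sp) :: m(size(b_in), size(b_in)), v(size(b_in)), f
    integer :: i, j, nn
    nn = size(b_in)
    m = a_in
    v = b_in
    do j = 1, nn - 1
      do i = j + 1, nn
        f = m(i, j) / m(j, j)
        m(i, j:) = m(i, j:) - f * m(j, j:)
        v(i) = v(i) - f * v(j)
      end do
    end do
    do i = nn, 1, -1
      x_out(i) = (v(i) - dot_product(m(i, i+1:), x_out(i+1:))) / m(i, i)
    end do
  end function solve_lo
end program mixed_refine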


Low/mixed-precision computing has been an intriguing research topic in the numerical analysis community for many (~20) years, well before low-precision training showed its power in machine learning. See the review paper from 2009 for more information; Jack Dongarra (Turing Award winner) is one of the co-authors.

Another important figure in mixed-precision computing was Nick Higham (FRS, former SIAM president), who sadly passed away recently.

As a computational/applied mathematician, I have kept a keen interest in low/mixed-precision computing. Even though I have not made (published) any essential contribution yet, I am confident in saying that it is one of the most important topics in modern scientific computing, even if it had not turned out to be so useful to machine learning and AI.

Unfortunately, as a language for scientific computing, Fortran is essentially out of this game (up to now) — it is indeed excluding itself from the game.

Again, nobody wants to (and, more than that, nobody can) prevent any compiler from supporting a bfloat16 real. I can’t imagine that a compiler vendor’s decision to add bfloat16 support would depend in any way on the presence of a constant in iso_fortran_env (before iso_fortran_env existed, many compilers already supported 128-bit reals).

nvfortran supports bfloat16 because it makes perfect sense for Nvidia: they need it to operate their GPUs.

nagfor supports bfloat16 because they are multi-platform.

Intel Fortran doesn’t support bfloat16 because Intel doesn’t have any hardware with it (in the same way, Intel Fortran did not support OpenMP offloading to GPUs other than Intel’s… at least that was the case until recently).

As for gfortran, well, I guess they lack the resources to implement everything they would like.

Fair enough. However, if we only rely on vendor extensions for important features of the language, what is the point of having a standard in the first place?

There’s a confusion here. Again, introducing a constant in iso_fortran_env wouldn’t mean that support for the corresponding type would be required. Compilers are not required to support a real128 type: it’s just that, if they do, they set the kind value in real128 (and if they don’t, the constant is -1). In that sense, support for a real128 type, if different from the default real and double-precision ones, is an extension to the standard.
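For what it’s worth, that behaviour is easy to see: the named constants always exist, so the following always compiles, and a compiler that lacks a given kind simply reports a negative value (as noted, typically -1) for it:

program kind_probe
  use iso_fortran_env, only: real32, real64, real128
  implicit none
  ! Negative values mean "no such kind on this compiler"; the program
  ! still compiles because the named constants are always defined.
  print *, 'real32  kind =', real32
  print *, 'real64  kind =', real64
  print *, 'real128 kind =', real128
end program kind_probe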

I know very well that including bfloat16 in iso_fortran_env would never mean requesting/ensuring support for it. It is clear to me that the Fortran standard imposes only the existence of the default and double-precision reals, and is unlikely to impose anything else in the foreseeable future.

However, IMHO, the inclusion or exclusion of a kind value in iso_fortran_env still means something. The interpretation of this “something” may vary from person to person, but it is definitely not nothing. Otherwise, the standards committee would not have held a vote to decide it.


The problem is that the title of this topic, and part of your text as well, can suggest something different to readers…

Knowing the arguments for/against the proposal inside the subgroup and the committee would be interesting, in particular why real16 was accepted and bfloat16 was not. Part of the answer is maybe that the kind constants in iso_fortran_env do not attempt to describe a specific implementation. For instance, real32 can be any floating-point type stored in 32 bits, not necessarily the IEEE flavor. The same goes for real128, which in practice can be either the IEEE flavor or the “double double” one.
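A small aside that illustrates the point: the iso_fortran_env constants only promise a storage size, and it is the ieee_arithmetic module that tells you whether a given kind actually behaves as IEEE 754 on a particular compiler/platform. A minimal sketch:

program ieee_check
  use, intrinsic :: iso_fortran_env, only: real32, real64
  use, intrinsic :: ieee_arithmetic, only: ieee_support_datatype
  implicit none
  ! Reports whether the 32-bit and 64-bit real kinds follow IEEE 754 here.
  print *, 'real32 is IEEE?', ieee_support_datatype(1.0_real32)
  print *, 'real64 is IEEE?', ieee_support_datatype(1.0_real64)
end program ieee_check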


Maybe this was one of the reasons.

I think we need both IEEE f16, and bf16 in Fortran.

What are the kind numbers that other compilers use for f16 and bf16?


Nvidia’s web sites say that fp16 is real(2) but is only fully supported on GPU/TPU. However, the source file for iso_fortran_env in the latest Nvidia HPC SDK compilers doesn’t list a KIND parameter with a value of 2 for a real value (i.e., no REAL16). I don’t know about BFLOAT16; I don’t see any mention of anything other than IEEE FP16.

I see: f16 (IEEE) could be kind=2, and bf16 could be kind=3?

The following code works with nvfortran 24.3.

Program test16
  USE ISO_FORTRAN_ENV

  real :: a32
  real(2) :: a16   ! fp16 via nvfortran's kind=2

  a32 = 4.0
  a16 = 2.0_2

  print *, ' 32 / 16 = ', a32/a16

  stop
end program test16

On a Linux Mint system:
nvfortran -o test16.x testfp16.f90
./test16.x
32 / 16 = 2.000000

I presume the systems that cling to using the number of bytes as the KIND value will probably default to 3, but the standard says it can be anything. It’s probably best to check with NAG to see what they do, since they are the only ones I know of that don’t use REAL_KINDS = [4, 8], etc.
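One portable way to see what a given compiler offers, without guessing its numbering scheme, is to print the REAL_KINDS array from iso_fortran_env; a short sketch:

program list_real_kinds
  use iso_fortran_env, only: real_kinds
  implicit none
  ! The values are vendor-specific kind numbers, one per supported real kind.
  print *, 'supported real kinds:', real_kinds
end program list_real_kinds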

Edit.

As to the KIND constant names for ISO_FORTRAN_ENV, I think just using REAL16 for IEEE f16 and REAL16E (extended real16) or REAL16B for BFLOAT16 might be the path of least resistance, but that makes too much sense (at least to me), so the standards folks would never adopt it.

! kinds.f90
program kinds
  use iso_fortran_env, only : REAL16, REAL32, REAL64, REAL128
  implicit none
  print *, REAL16, REAL32, REAL64, REAL128
end program kinds
$ nagfor kinds.f90 && ./a.out 
NAG Fortran Compiler Release 7.2(Shin-Urayasu) Build 7201
[NAG Fortran Compiler normal termination]
 16 1 2 3