Fortran Programmers: How do you want to offload to GPU accelerators in the next 5 years?

I’d like to add my own Fortran OpenCL abstraction library Focal to the list of available options; see here for the slides I presented at FortranCon. Unlike fortrancl and clfortran, Focal presents a ‘Fortranic’ interface that abstracts away much of the low-level C API (see the sketch below).
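To give a flavour of what I mean by ‘Fortranic’, here is a minimal sketch written from memory in the style of the Focal README examples; treat the exact procedure names and signatures (fclCreateContext, fclFindDevices, fclCompileProgram, fclInitBuffer, the launch type-bound procedure, etc.) and the choice of vendor as illustrative rather than authoritative.

```fortran
program focal_sketch
  use Focal
  implicit none

  integer, parameter :: N = 1000000
  ! OpenCL C kernel source held as a Fortran string and compiled at runtime
  character(*), parameter :: kernelSrc = &
    '__kernel void scale(__global float *x){ x[get_global_id(0)] *= 2.0f; }'

  type(fclDevice), allocatable :: devices(:)
  type(fclProgram) :: prog
  type(fclKernel) :: scaleKernel
  type(fclDeviceFloat) :: x_d
  real :: x(N)

  ! Create a default context and command queue (vendor choice is an assumption here)
  call fclSetDefaultContext(fclCreateContext(vendor='nvidia'))
  devices = fclFindDevices(sortBy='cores')
  call fclSetDefaultCommandQ(fclCreateCommandQ(devices(1)))

  ! Compile the kernel source and extract a kernel object
  prog = fclCompileProgram(kernelSrc)
  scaleKernel = fclGetProgramKernel(prog, 'scale')

  ! Device buffer allocation; host<->device transfers via overloaded assignment
  call fclInitBuffer(x_d, N)
  x = 1.0
  x_d = x

  ! Launch over N work items and copy the result back
  scaleKernel%global_work_size(1) = N
  call scaleKernel%launch(x_d)
  x = x_d

  write(*,*) 'x(1) = ', x(1)

end program focal_sketch
```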

My personal preference is strongly towards kernel-based approaches since they are more explicit and give more control over memory management and synchronisation. By contrast, with directive-based approaches you usually need to infer what is going on ‘behind the scenes’ to understand the performance implications, and then add further directives to constrain the compiler, resulting in messy code.

For example, adding directives to completely specify variable locality for a parallel loop essentially amounts to writing a kernel interface, albeit very verbosely (see the sketch below), so you may as well use a kernel-based approach and keep the extra control.
Ultimately, I think GPU directives attempt a code abstraction that isn’t actually useful, since it is better to retain control over the specifics of execution on GPUs.
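As a concrete illustration (a hypothetical saxpy routine, not taken from any particular code): once the data movement and locality clauses are spelled out in full for even a trivial OpenACC loop, the directive is carrying essentially the same information as an explicit kernel signature would.

```fortran
subroutine saxpy(n, a, x, y)
  integer, intent(in) :: n
  real, intent(in) :: a, x(n)
  real, intent(inout) :: y(n)
  integer :: i

  ! Fully specifying locality and data movement: the copyin/copy clauses
  ! largely duplicate what the dummy argument intents already say, and
  ! together they read like a verbose kernel interface.
  !$acc parallel loop copyin(x(1:n)) copy(y(1:n)) firstprivate(a) private(i)
  do i = 1, n
    y(i) = a*x(i) + y(i)
  end do
  !$acc end parallel loop

end subroutine saxpy
```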

Some disadvantages of existing Fortran options (IMO):

  • CUDA-Fortran is proprietary and non-portable, and results in hardware vendor lock-in
  • OpenACC & OpenMP: the degree of implementation varies between compilers, neither is yet mature, and extra work is required to enable compiler support
  • HIP / OpenCL: kernels (currently) need to be written in another language

The Fortran language already has a number of abstractions, particularly for arrays, that would be immensely useful when writing accelerator kernels; my preference would be for a language keyword such as `kernel` that would optionally allow a subroutine to be compiled for any GPU backend. Unlike `elemental`, this would support hierarchical and fine-grain parallelism features such as execution blocks and thread synchronisation (see the sketch below). See initial discussion here.
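Purely as a thought experiment, something along these lines is what I have in mind; none of this syntax exists today, and the names `kernel`, `this_thread` and `sync_threads` are invented here only to illustrate the idea, so this will not compile with any current compiler.

```fortran
! Hypothetical syntax only: 'kernel', this_thread() and sync_threads()
! are not part of any Fortran standard or existing compiler extension.
kernel subroutine stencil(u, unew)
  real, intent(in)  :: u(:)
  real, intent(out) :: unew(:)
  integer :: i

  i = this_thread()              ! fine-grain index within the launch
  if (i > 1 .and. i < size(u)) then
    unew(i) = 0.5*(u(i-1) + u(i+1))
  end if
  call sync_threads()            ! block-level thread synchronisation
end subroutine stencil

! A caller would then launch the kernel on whichever backend is available,
! e.g. something like: call launch(stencil, blocks=1024, threads=256)(u_d, unew_d)
```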
