Is CUDA Fortran good enough as the main Fortran compiler?

Dear all,

It seems that if one needs to run Fortran on a GPU, as @themos pointed out, a bit of googling leads to NVIDIA's CUDA Fortran.

I have not used it yet because I mainly use Windows, and NVIDIA's CUDA Fortran does not seem to be ready for Windows yet.

So I wonder, does anyone here use CUDA Fortran, perhaps on Linux?
Is it fast and stable enough to be used as the main Fortran compiler?

In principle, since a GPU has so many small cores, it should be very fast at the big matrix operations that are typical in neural networks.

If Fortran can take advantage of GPUs, that would be great. For one thing, it would give many people a solid reason to stay with or switch to Fortran.

Thank you very much in advance!

Have you tried Google?

Thank you. I apologize. I have changed the title and description to be more accurate and meaningful.
Yes. Most links, if not all, point to NVIDIA's CUDA Fortran, which makes me think that perhaps gfortran and Intel Fortran do not support GPUs in general.
But I may be wrong.
So I wanted to see if anyone here can show a simple example of how to run gfortran or Intel Fortran code on a GPU.
If one has to use NVIDIA's CUDA Fortran to use a GPU, then true, there are some CUDA Fortran examples online. I am not sure how many people use CUDA Fortran as their main Fortran compiler, or whether it is good enough for that. But it is no surprise that a GPU vendor has its own Fortran compiler.

We definitely need a GPU section here:

Would anyone be willing to write it? That section should explain the current approaches to writing GPU code in Fortran with the current compilers (gfortran, ifort, nvfortran, nag, etc.) and what the options are.


(This is my first post so first of all, hi everyone and thank you for this amazing initiative for the Fortran community)

Since this seems to be a more general question, please let me share my personal experience with Fortran and GPUs. There are many ways to port your Fortran code to GPUs. From higher to lower level (or from more to less portable), you could use:

  • a library that is already ported to GPUs, e.g. cuBLAS, cuFFT, etc.
  • directive-based approaches like OpenACC or OpenMP. These let you “describe” the code you want to port to GPUs, and the compiler then does the work for you.
  • “low-level” APIs like CUDA (mainly targeting NVIDIA GPUs) or the AMD equivalent, ROCm. Here, you have fine-grained control over what you are doing on the GPU.

With the first option, you need little to no understanding of how GPUs work, while as you go down the list, you will need a deeper understanding in order to make things work efficiently. If you are interested, I can try to list a few pros and cons for each approach.

Now, let me add a few comments more directly related to your question. The CUDA Fortran compiler, nvfortran, is based on the PGI one (actually, I think they rebranded it) so I would say that yes, it is stable/fast enough.

In principle, you can use CUDA for Fortran. You can very easily code small examples that work great! However, as is unfortunately often the case with Fortran, I found that there is not the same level of support as you would find in C/C++, for example. By default, nvfortran is not included in the CUDA toolkit and you need to download the NVIDIA HPC SDK. If you use HPC clusters, you may run into some problems. For example, the NVIDIA HPC SDK comes with its own Open MPI, which was not compatible with the configuration of the machine I was using. One easy workaround for all these drawbacks would be to use the Fortran/C interface to call CUDA C code (which I agree is a bit cumbersome).
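
To give an idea of what such a small CUDA Fortran example looks like, here is a minimal sketch of a SAXPY kernel (y = a*x + y). The module, program, and variable names are my own choices for illustration, and I am assuming it is built with nvfortran (e.g. saved as a .cuf file, or compiled with the -cuda flag) on a machine with an NVIDIA GPU:

```fortran
module saxpy_kernels
  use cudafor
  implicit none
contains
  ! Device kernel: each thread updates one element of y.
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real, value :: a
    real :: x(n), y(n)
    integer :: i
    ! Global 1-based thread index.
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy
end module saxpy_kernels

program saxpy_cuda
  use cudafor
  use saxpy_kernels
  implicit none
  integer, parameter :: n = 1024 * 1024
  integer, parameter :: threads = 256
  real, allocatable :: x(:), y(:)
  real, device, allocatable :: x_d(:), y_d(:)   ! arrays living in GPU memory

  allocate(x(n), y(n), x_d(n), y_d(n))
  x = 1.0
  y = 3.0

  ! Host-to-device copies are written as plain assignments.
  x_d = x
  y_d = y

  ! Launch enough blocks of 256 threads to cover all n elements.
  call saxpy<<<(n + threads - 1) / threads, threads>>>(n, 2.0, x_d, y_d)

  ! Device-to-host copy; every element of y should now be 5.0.
  y = y_d
  print *, 'max error:', maxval(abs(y - 5.0))
end program saxpy_cuda
```

The device attribute and the `<<<...>>>` launch syntax are CUDA Fortran extensions, so this file will not compile with gfortran or ifort. That is the portability price of this approach.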

To summarize a bit, I would first look for already existing libraries doing what you are trying to do. If they are not available and if you don’t specifically need CUDA, try using OpenACC or OpenMP. They yield very acceptable performance gains, are “easy” to implement in the sense that they are based on an incremental approach, and allow you to have the same source code for CPU and GPU (the directives are treated as comments by the compiler unless it is instructed otherwise). Finally, if you really want control over what you do on the GPU, or you are really trying to achieve the best performance, then CUDA (or ROCm (HIP actually)) is the best option, but I would strongly encourage you to first have a look at your dependencies and how well you can compile/combine them with CUDA.
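
As a rough illustration of the directive approach, here is a minimal sketch of the same kind of loop (y = a*x + y) offloaded with OpenMP target directives. The program and variable names are mine and only for illustration. Whether the loop actually runs on a GPU depends on your compiler and how it was built: I am assuming something like nvfortran with -mp=gpu, or a gfortran build with offloading support and -fopenmp.

```fortran
program saxpy_omp
  implicit none
  integer, parameter :: n = 1000000
  real, parameter :: a = 2.0
  real, allocatable :: x(:), y(:)
  integer :: i

  allocate(x(n), y(n))
  x = 1.0
  y = 3.0

  ! Ask the compiler to run the loop on the default device.
  ! x is only read on the device; y is copied in and back out.
  ! Without OpenMP enabled, these two lines are ordinary comments
  ! and the loop runs serially on the CPU.
  !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
  do i = 1, n
     y(i) = a * x(i) + y(i)
  end do
  !$omp end target teams distribute parallel do

  ! Every element of y should now be 5.0.
  print *, 'max error:', maxval(abs(y - 5.0))
end program saxpy_omp
```

Compiled without any OpenMP flag, this is plain standard Fortran and gives the same result on the CPU, which is what makes the incremental approach practical: you can add directives loop by loop and keep checking against the CPU version.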


Thank you so much, and welcome! Your comment is super! :+1:
After reading the information you gave, I may first try to run OpenMP on the GPU. It seems to be a more universal standard/solution for modern Fortran.