Is CUDA Fortran good enough as the main Fortran compiler?

Dear all,

It seems that if one needs to run Fortran on a GPU, as @themos pointed out, a bit of googling leads to NVIDIA's CUDA Fortran.

I have not used it yet because I mainly use Windows, and NVIDIA's CUDA Fortran does not seem to be ready for Windows yet.

So I wonder, has anyone here used CUDA Fortran on Linux?
Is it fast and stable enough to be used as the main Fortran compiler?

In principle, since a GPU has so many small cores, it should be very fast at the big matrix operations that are typical in neural networks.

If Fortran can take advantage of GPUs, that would be great. For one thing, it would give many people a solid reason to stay with or switch to Fortran.

Thank you very much in advance!

2 Likes

Have you tried Google?

Thank you. I apologize; I have changed the title and description to be more accurate and meaningful.
Yes. Most links, if not all, point to NVIDIA's CUDA Fortran, which makes me think that gfortran or Intel Fortran in general do not support GPUs.
But I could be wrong.
So I wanted to see if anyone here can show a simple example of how to run code on a GPU with, say, gfortran or Intel Fortran.
If one has to use NVIDIA's CUDA Fortran to use a GPU, then true, there are some CUDA Fortran examples online. I am not sure how many people use CUDA Fortran as their main Fortran compiler, nor whether CUDA Fortran is good enough. But it is no surprise that a GPU vendor has its own Fortran compiler.

We definitely need a GPU section here:

Would anyone be willing to write it? That section should explain the current approaches to writing GPU code in Fortran with the current compilers (gfortran, ifort, nvfortran, NAG, etc.) and what the options are.

4 Likes

(This is my first post so first of all, hi everyone and thank you for this amazing initiative for the Fortran community)

Since this seems to be a more general question, please let me share my personal experience with Fortran and GPUs. There are many ways you can port your Fortran code to GPUs. From higher to lower level (or from more to less portable), you could use:

  • a library that is already ported to GPUs, e.g. cuBLAS, cuFFT, etc.
  • directive-based approaches like OpenACC or OpenMP. These let you "describe" the code you want to port to GPUs, and the compiler then does the work for you.
  • "low-level" APIs like CUDA (mainly targeting NVIDIA GPUs) or the AMD equivalent, ROCm. Here you have fine-grained control over what you are doing on the GPU.

With the first option you need minimal to no understanding of how GPUs work, while as you go down the list you need a deeper understanding to make things run efficiently. If you are interested, I can try to list a few pros and cons of each approach.
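To give a taste of the directive-based approach, here is a minimal OpenACC sketch (a hypothetical SAXPY loop, not code from this thread; compile with something like `nvfortran -acc`, flags vary by compiler):

```fortran
program saxpy_acc
  implicit none
  integer, parameter :: n = 1000000
  real :: x(n), y(n), a
  integer :: i

  a = 2.0
  x = 1.0
  y = 3.0

  ! To a compiler without OpenACC enabled, the line below is a
  ! plain comment, so the same source still builds and runs on CPU.
  !$acc parallel loop
  do i = 1, n
     y(i) = a * x(i) + y(i)
  end do
  !$acc end parallel loop

  print *, y(1)   ! expect 5.0
end program saxpy_acc
```

This illustrates the incremental approach mentioned above: the loop is unchanged, only the directive is added.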

Now, let me add a few comments more directly related to your question. The CUDA Fortran compiler, nvfortran, is based on the PGI one (actually, I think it was rebranded), so I would say that yes, it is stable and fast enough.

In principle, you can use CUDA from Fortran. You can very easily code small examples that work great! However, as is unfortunately often the case with Fortran, I found that there is not the same level of support as you can find for C/C++. By default, nvfortran is not included in the CUDA Toolkit; you need to download the NVIDIA HPC SDK. If you use HPC clusters, you may run into problems. For example, the NVIDIA HPC SDK comes with its own OpenMPI, which was not compatible with the configuration of the machine I was using. One easy workaround for all these drawbacks is to use the Fortran/C interface to call CUDA C code (which, I agree, is a bit cumbersome).

To summarize: I would first look for existing libraries that do what you are trying to do. If none are available and you don't specifically need CUDA, try OpenACC or OpenMP. They yield very acceptable performance gains, are "easy" to implement in the sense that they follow an incremental approach, and allow you to keep the same source code for CPU and GPU (the directives are treated as comments by the compiler unless it is instructed otherwise). Finally, if you really want control over what happens on the GPU, or you are trying to achieve the best performance, then CUDA (or ROCm, HIP actually) is the best option, but I would strongly encourage you to first look at your dependencies and how well you can compile and combine them with CUDA.

12 Likes

Thank you so much, and welcome! Your comment is super! :+1:
After reading the information you gave, I may first try running OpenMP on the GPU; it seems like a more universal standard/solution for modern Fortran.
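For anyone else going the OpenMP route, a minimal offloading sketch might look like the following (a hypothetical vector add, not from this thread; needs a compiler built with offloading support, e.g. `gfortran -fopenmp -foffload=nvptx-none` or `nvfortran -mp=gpu`, and falls back to the host otherwise):

```fortran
program vecadd_omp
  implicit none
  integer, parameter :: n = 100000
  real :: a(n), b(n), c(n)
  integer :: i

  a = 1.0
  b = 2.0

  ! map clauses control host-to-device and device-to-host data movement
  !$omp target teams distribute parallel do map(to: a, b) map(from: c)
  do i = 1, n
     c(i) = a(i) + b(i)
  end do
  !$omp end target teams distribute parallel do

  print *, c(n)   ! expect 3.0
end program vecadd_omp
```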

Are there any tutorials on setting up CUDA Fortran on Fedora Linux?

Yes, the nvfortran compiler and CUDA Fortran are quite good. In practice you are limited to features of Fortran 2003 and earlier, but I like CUDA Fortran much better than the CUDA C equivalents.
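For comparison with the directive-based approaches, here is a minimal CUDA Fortran sketch (a hypothetical SAXPY kernel, not from this thread; the grid/block sizes are arbitrary, and it only builds with nvfortran):

```fortran
module kernels
contains
  ! attributes(global) marks this as a GPU kernel
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real, value :: a
    real :: x(n), y(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy
end module kernels

program main
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real, device :: x_d(n), y_d(n)   ! arrays in GPU memory
  real :: y(n)

  x_d = 1.0                        ! implicit host-to-device copies
  y_d = 3.0
  call saxpy<<<(n + 255)/256, 256>>>(n, 2.0, x_d, y_d)
  y = y_d                          ! implicit device-to-host copy
  print *, y(1)                    ! expect 5.0
end program main
```

Compared with CUDA C, the `device` attribute and assignment-based copies remove most of the explicit `cudaMalloc`/`cudaMemcpy` boilerplate.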

Also, it is worth looking at whether your project can use the slower-but-easier OpenACC for GPU acceleration.

1 Like

I have often said (but perhaps not in a place you have read) that you should use as many compilers as you can. Reasons:

  1. One compiler may find bugs in your program that another misses.
  2. Even if two compilers find the same bug, one may give an error message that is easier to understand or more closely relates to your error.
  3. If two compilers both run your program but give different outputs it is always worth investigating. You may have hit one of the many places where the standard allows that, or there may be a bug in your program or the compiler. I have spent many unhappy hours checking whether bugs are my fault (usually) or the compiler’s (seldom). If you report a compiler bug, vendors prefer short programs that exhibit bugs. If you get an internal compiler error you should always report it: the compiler has a serious bug whether your program is correct or not.
  4. If your program runs, one compiler may make it run faster than another. It is not always the same compiler - it may depend on what the program was doing.
5 Likes

Have you had any experience calling CUDA routines from Fortran using only iso_c_binding? As in:

function cudaSetDevice(ndevice) bind(C, "cudaMalloc") ?

So far I haven't been able to get it to work and it is driving me insane. I don't want to have to use a specific compiler to get this functionality. Am I doing something potentially stupid?

2 Likes

That's rather a language feature, not a compiler one. Though I don't think you can get code running on NVIDIA GPUs without using their tools.
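That said, binding to the CUDA runtime from gfortran or ifort does work if you link against `libcudart` (which still means having the CUDA Toolkit installed). A hedged sketch of how such interfaces are usually written with `iso_c_binding`; note that the binding name must match the actual C routine (`name="cudaSetDevice"`, not `"cudaMalloc"`):

```fortran
module cuda_runtime
  use, intrinsic :: iso_c_binding
  implicit none
  interface
     ! C prototype: int cudaSetDevice(int device)
     integer(c_int) function cudaSetDevice(device) &
          bind(C, name="cudaSetDevice")
       import :: c_int
       integer(c_int), value :: device
     end function cudaSetDevice

     ! C prototype: int cudaMalloc(void **devPtr, size_t size)
     integer(c_int) function cudaMalloc(devPtr, nbytes) &
          bind(C, name="cudaMalloc")
       import :: c_int, c_ptr, c_size_t
       type(c_ptr) :: devPtr              ! passed by reference -> void**
       integer(c_size_t), value :: nbytes
     end function cudaMalloc
  end interface
end module cuda_runtime

! Usage (link with e.g.: gfortran main.f90 -lcudart):
!   istat = cudaSetDevice(0_c_int)
!   istat = cudaMalloc(p, 1024_c_size_t)   ! p is a type(c_ptr)
```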

1 Like

Gfortran supports partial offloading of code to both NVIDIA and AMD GPUs, but I do not know the details.

1 Like

Here is a link with additional information from the GCC webpage.
https://gcc.gnu.org/wiki/Offloading

1 Like