GPU offloading in Fortran

Nice, looks like you beat me to it! What happens if you remove --link-flag from the command line and keep only --flag="-fopenacc"?

I’ve reduced it to just FPM_FFLAGS="-fopenacc" FPM_LDFLAGS="-lcublas" fpm run, and it works :slight_smile: @hkvzjal, you can check out the repo I linked for a mini example of what you shared. You’ll need an NVIDIA GPU and the time to build GCC from scratch…

I might then create an env setup to make sure this does not happen again! Thanks so much :slight_smile:
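
A minimal sketch of what such an env setup could look like, assuming a bash-like shell. The file name env.sh is hypothetical; the two variables and their values are the ones from the fpm run command quoted earlier in the thread.

```shell
# env.sh (hypothetical name): source this once per shell session, before calling fpm.
# FPM_FFLAGS and FPM_LDFLAGS are the variables used on the command line earlier in the thread.
export FPM_FFLAGS="-fopenacc"   # compile with OpenACC enabled
export FPM_LDFLAGS="-lcublas"   # link against cuBLAS

# Print the values so it is obvious what fpm will see.
echo "FPM_FFLAGS=$FPM_FFLAGS"
echo "FPM_LDFLAGS=$FPM_LDFLAGS"
```

After sourcing the file (". ./env.sh"), a plain "fpm run" in the same shell should pick up both flags.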

OK, so since fpm does not support additional flags in the manifest, you still need to pass them when running fpm. But if the executable is already built, you shouldn’t have to add FPM_LDFLAGS. Note that there is an open proposal to support additional flags in the manifest; it could be the right place to discuss adding OpenACC support. I imagine it working like OpenMP, which is currently a metapackage, i.e. its support is enabled with a dependency entry, dependencies.openmp = "*"; we could think of having the same for OpenACC.
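
For reference, a sketch of how the existing OpenMP metapackage is declared, written as a shell snippet that drops a manifest into a scratch directory. The project name offload_demo is made up for illustration, and an openacc entry is only an idea at this point, so it is not shown.

```shell
# Write a demo manifest into a scratch directory so no real fpm.toml is touched.
manifest_dir="$(mktemp -d)"

cat > "$manifest_dir/fpm.toml" <<'EOF'
# "offload_demo" is a hypothetical project name.
name = "offload_demo"

[dependencies]
# Table form of the dependencies.openmp = "*" entry mentioned in the thread.
openmp = "*"
EOF

cat "$manifest_dir/fpm.toml"
```

With an entry like this, fpm is expected to add the compiler's OpenMP flags itself, which is why no FPM_FFLAGS would be needed for the OpenMP case.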

Yes, I started with OpenACC because the example linked by @hkvzjal was in OpenACC. I can also add an OpenMP one and start the discussion.

Nice @jorgeg, it is great that you made it into an fpm repo :smiley: I do have an NVIDIA GPU, but I have not compiled GCC from scratch. I thought there was an option to install the required additional dependencies; I’ll take a look at that.

This would be great

If you use the script from here, then after editing the paths it is literally just ./script.sh. Go get a coffee or something while it runs; it took a while :slight_smile:

It has double precision emulation - see here: https://www.sumseq.com/files/2024_OAS_Talk_RCaplan.pdf

As mentioned, the nvfortran compiler supports Fortran 2003, plus many features from more recent standards (including some from 2023). The major missing feature is coarrays.

As for GPUs, no one is “baking GPU support into Fortran”.
Instead, compilers are using the features of Fortran to offload to GPUs.
“do concurrent” has nothing to do with GPUs specifically; it simply tells the compiler that the loop iterations can be executed out of order (no dependencies between them).
Since this usually also means the loop can be run in parallel, compilers have decided to let the user use DC for multi-threaded parallelism on CPUs and for offloading to GPUs.
See here for details of this using NVIDIA, Intel and AMD GPUs: https://www.sumseq.com/files/2024_OAS_Talk_RCaplan.pdf

– Ron
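
To make the “do concurrent” point concrete, here is a small sketch as a shell snippet that writes a Fortran source file to a scratch directory and, only if nvfortran is on the PATH, compiles it with -stdpar=gpu. The program name saxpy_dc and the array size are invented for illustration; this is not taken from the linked talk.

```shell
# Write a tiny do concurrent example into a scratch directory.
dc_dir="$(mktemp -d)"

cat > "$dc_dir/saxpy_dc.f90" <<'EOF'
program saxpy_dc
  implicit none
  integer, parameter :: n = 1000000
  real :: x(n), y(n), a
  integer :: i

  a = 2.0
  x = 1.0
  y = 3.0

  ! Each iteration touches only its own element, so the iterations
  ! are independent and may be executed in any order.
  do concurrent (i = 1:n)
    y(i) = a * x(i) + y(i)
  end do

  print *, y(1), y(n)
end program saxpy_dc
EOF

# Compile for GPU offload only when nvfortran is available; otherwise just report.
if command -v nvfortran >/dev/null 2>&1; then
  nvfortran -stdpar=gpu "$dc_dir/saxpy_dc.f90" -o "$dc_dir/saxpy_dc" \
    || echo "compile failed; check the NVIDIA HPC SDK installation"
else
  echo "nvfortran not found; source written to $dc_dir/saxpy_dc.f90"
fi
```

Swapping -stdpar=gpu for -stdpar=multicore should target CPU threads instead, with the same source file.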

Interesting. I was under the impression that nvfortran supported such an old subset of the language (like 2003) that there was no point in even trying to compile my codes with it, but my knowledge was probably outdated. I will give it a try!

I think it’s actually the case… There will be a new version based on flang-new (therefore supporting F2018), but it’s not yet ready.

Speaking of flang-new, have you tried to build it, and did you succeed?

I can’t get it to build.

Nope (didn’t try)…

It turns out there is. I tried the following on Ubuntu 22.04:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt install gcc-12 g++-12 gfortran-12 gcc-12-offload-nvptx  

And managed to offload using OpenMP.
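
A rough sketch of what such an OpenMP offload test could look like with the packages above. The file and program names are hypothetical; -fopenmp and -foffload=nvptx-none are standard GCC options, and the snippet only tries to compile if gfortran-12 is on the PATH.

```shell
# Write a tiny OpenMP target example into a scratch directory.
omp_dir="$(mktemp -d)"

cat > "$omp_dir/omp_offload.f90" <<'EOF'
program omp_offload
  implicit none
  integer, parameter :: n = 1000
  real :: x(n)
  integer :: i

  x = 1.0

  ! Copy x to the device, scale it there, and copy it back.
  !$omp target teams distribute parallel do map(tofrom: x)
  do i = 1, n
    x(i) = 2.0 * x(i)
  end do
  !$omp end target teams distribute parallel do

  print *, sum(x)
end program omp_offload
EOF

# Compile with NVPTX offloading only when gfortran-12 is installed.
if command -v gfortran-12 >/dev/null 2>&1; then
  gfortran-12 -fopenmp -foffload=nvptx-none "$omp_dir/omp_offload.f90" -o "$omp_dir/omp_offload" \
    || echo "compile failed; is gcc-12-offload-nvptx installed?"
else
  echo "gfortran-12 not found; source written to $omp_dir/omp_offload.f90"
fi
```

Running the resulting binary under a GPU profiler is one way to check that the loop really ran on the device rather than falling back to the host.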

At first I wanted to do it with gfortran 13 but there seems to be a bug in the packaging, as I got this error when compiling a code:

x86_64-linux-gnu-accel-nvptx-none-gcc-13: fatal error: cannot read spec file ‘libgomp.spec’: No such file or directory   
compilation terminated.

A Google search only tells me that this is a missing component and that the issue does not seem to have been solved so far; see Ubuntu bug #2036593, “Compilation error: absent libgomp.spec with gcc-13...”, filed against the gcc-13 package.

@jorgeg I tried compiling from source using the script you shared, but I got another error:

cc1plus: fatal error: gengtype-lex.cc: No such file or directory
compilation terminated.

I’ll keep this gcc-12 version for the moment.

Ah super sad, for me it worked with no issues. I used gcc 11.4 to bootstrap I believe.

The script that I use to build flang-new from source is fresh-llvm-build.sh in the handy-dandy repository under my GitHub user name: rouson. With that script in my PATH, I execute something like git clone git@github.com:llvm/llvm-project && cd llvm-project && fresh-llvm-build.sh. I’ve used this script on macOS and Linux. My goal with the scripts in handy-dandy is just to capture steps that work for me on systems that I use frequently so that I don’t have to remember the steps every time. For other users, the scripts are probably better for reading than for running because I haven’t made any attempt to make the scripts work on any systems other than the ones that I frequently use. With that said, pull requests that make the script more portable or more useful are welcome.

Also, the just-write-fortran talk that I previously cited here contains a URL for cloning an llvm-project fork and a git tag on that fork for a November 2024 version of flang-new that can parallelize do concurrent on CPUs and that I think (but haven’t verified) can offload some cases of do concurrent to GPUs. The reason that I haven’t verified the GPU offloading is that I know that it can’t yet offload the code of interest to me, but the capability to offload code like mine is under development.

I’ll test this tomorrow morning! Thanks