Performance of GPU accelerated q-LSKUM based meshfree solvers in Fortran, C++, Python, and Julia

Beliavsky · August 27, 2021, 1:34am

On the performance of GPU accelerated q-LSKUM based meshfree solvers in Fortran, C++, Python, and Julia

by Nischay Ram Mamidi, Kumar Prasun, Dhruv Saxena, Anil Nemili, Bharatkumar Sharma, S.M. Deshpande
arXiv, 16 Aug 2021

Abstract:
This report presents a comprehensive analysis of the performance of GPU accelerated meshfree CFD solvers for two-dimensional compressible flows in Fortran, C++, Python, and Julia. The programming model CUDA is used to develop the GPU codes. The meshfree solver is based on the least squares kinetic upwind method with entropy variables (q-LSKUM). To assess the computational efficiency of the GPU solvers and to compare their relative performance, benchmark calculations are performed on seven levels of point distribution. To analyse the difference in their run-times, the computationally intensive kernel is profiled. Various performance metrics are investigated from the profiled data to determine the cause of observed variation in run-times. To address some of the performance related issues, various optimisation strategies are employed. The optimised GPU codes are compared with the naive codes, and conclusions are drawn from their performance.

From the conclusion:

Post optimisation, the Fortran code was more efficient than Python and Julia codes. However,
the C++ code is still the most efficient as the SASS code generated by its compiler is optimal
compared to other codes. The optimised Python code was computationally more expensive as
the SASS code generated by the Numba compiler was not efficient.

Language and compiler versions and options used:

Language  Version  Compiler   Version  Flags
Fortran   90       nvfortran  21.2     -O3
C++       20       nvcc       21.2     -O3 -mcmodel=large
Python    3.9.1    Numba      0.55.0   -O3
Julia     1.5.3    CUDA.jl    2.4.1    -O3 --check-bounds=no

The C++ code of the group is here.

oscardssmith · August 27, 2021, 3:09am

Do you know where the Julia code is? I’d be interested to see if there are any easy speedups.

Beliavsky · August 27, 2021, 3:28am

No – I just emailed the submitting author on arXiv regarding the code of all the languages studied and will forward any reply.

oscardssmith · August 27, 2021, 3:39am

Thanks!

lmiq · August 27, 2021, 2:36pm

Looking at the paper, the differences are not enormous, so there is probably no low-hanging fruit. It is likely an interesting paper for the CUDA.jl developers to look at.

Beliavsky · August 30, 2021, 5:23pm

His reply:

Here are the GitHub repo links for each of the four languages.

C++: https://github.com/Pathlessbark8/Cpp_serial/tree/cuda-gtc-split

Fortran90: GitHub - Nischay-Pro/mfcfd at rewrite-second-order

Python: GitHub - Nischay-Pro/meshfree-solver at cuda-hdf5-post-gtc-optimization

Julia: GitHub - Nischay-Pro/meshfree-solver at julia_cuda

With kind regards,

Nischay.
