I’ve just come across the following article, which compares various GPUs extensively across different programs and might be of interest for simulations.
It is interesting that the speedup from the GTX 980 to the RTX 5090 is about 10× for FluidX3D (a lattice-Boltzmann CFD code), but only 4-5× for NAMD MD calculations on 0.3-million-atom (ATPase) and 1-million-atom (mosaic virus) systems. I guess the latter may be harder to scale because of the irregular nature of the particle data.
(BTW, I used a GTX 980 for MD simulations around 2018, which was about $500 at the time. Recent GPU models seem very expensive, though the performance is also great… )
Timing results for a small run of the HipFT code on various CPUs and GPUs. The code uses standard Fortran “do concurrent” to run in parallel on CPUs and to offload to GPUs. For NVIDIA GPUs, unified memory is used, while for Intel GPUs, target directives are needed for data movement only. A neat result here is the new $250 Intel Arc B580 GPU with its FP64 cores performing right about where it should based on its memory bandwidth (HipFT is memory bandwidth bound).
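For anyone who hasn’t used it, here is roughly what that looks like: a minimal sketch (not actual HipFT code; the subroutine name, array shapes, and stencil are made up) of a diffusion-style update written with standard Fortran do concurrent. With nvfortran, `-stdpar=gpu` offloads the loop to the GPU via unified memory, while `-stdpar=multicore` runs it in parallel on the CPU.

```fortran
! Minimal do concurrent sketch (illustrative only, not the HipFT kernel):
! one explicit diffusion step over the interior of a 2-D field.
subroutine diffuse(f, fnew, n, m, c)
  implicit none
  integer, intent(in)  :: n, m
  real(8), intent(in)  :: f(n,m), c
  real(8), intent(out) :: fnew(n,m)
  integer :: i, j
  ! The same loop runs in parallel on CPU or GPU depending on compiler flags.
  do concurrent (j = 2:m-1, i = 2:n-1)
    fnew(i,j) = f(i,j) + c*(f(i-1,j) + f(i+1,j) + f(i,j-1) + f(i,j+1) &
                            - 4.0d0*f(i,j))
  end do
end subroutine diffuse
```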
Something interesting to note for these benchmarks is the performance limitation of the algorithm, i.e. is the algorithm memory bound or compute bound?
If your algorithm is compute bound, you’ll see a super duper increase in performance as you move across GPUs. Basically, take the example of a DGEMM: I can bet my monthly salary that if I get a 1080, 2080, 3080, 4080, 5080, V100, A100, and H100 and run a DGEMM on each, I am going to see a beautiful trend of performance nearly doubling every architecture.
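As a rough sketch of that experiment (the matrix size and setup are my own; link against any BLAS, or a cuBLAS wrapper on the GPUs): a DGEMM does about 2n³ flops while touching only about 3n²·8 bytes of data, so its arithmetic intensity grows with n and large matrices end up squarely compute bound.

```fortran
! DGEMM bench sketch: ~2n^3 flops over ~3n^2 * 8 bytes of data,
! so for large n the run is limited by compute, not bandwidth.
program dgemm_bench
  implicit none
  integer, parameter :: n = 4096
  real(8), allocatable :: a(:,:), b(:,:), c(:,:)
  integer(8) :: t0, t1, rate
  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a); call random_number(b); c = 0.0d0
  call system_clock(t0, rate)
  call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
  call system_clock(t1)
  print '(a,f10.1,a)', 'DGEMM: ', &
        2.0d0*real(n,8)**3/(real(t1-t0,8)/real(rate,8))/1.0d9, ' GFLOP/s'
end program dgemm_bench
```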
If the code is memory bound, the improvement will depend on the innovations in memory speeds, throughput, caches, etc. FluidX3D, being a CFD code, will have a memory limitation, so that benchmark is a very effective test of memory!
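To make the memory-bound case concrete, here is a minimal STREAM-style triad sketch (array size and names are my own): each iteration moves 24 bytes for 2 flops, so the GB/s number it prints tracks sustained memory bandwidth, not FLOPs, on essentially any modern chip.

```fortran
! STREAM-style triad sketch: 2 flops per 24 bytes moved, so the
! measured rate is set by memory bandwidth.
program triad
  implicit none
  integer, parameter :: n = 100000000
  real(8), allocatable :: a(:), b(:), c(:)
  integer(8) :: t0, t1, rate
  integer :: i
  allocate(a(n), b(n), c(n))
  b = 1.0d0; c = 2.0d0
  call system_clock(t0, rate)
  do concurrent (i = 1:n)
    a(i) = b(i) + 3.0d0*c(i)
  end do
  call system_clock(t1)
  ! 3 arrays x 8 bytes per element moved each iteration
  print '(a,f8.1,a)', 'Triad: ', &
        24.0d0*real(n,8)/(real(t1-t0,8)/real(rate,8))/1.0d9, ' GB/s'
end program triad
```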
For MD that’s interesting; it is a weird algorithm overall, and depending on how you calculate the forces you could be compute bound. So it does not really surprise me that the speedup is not as big, whereas the other app naturally benefits a lot from memory improvements.
The code is highly memory bandwidth bound.
However, since the code uses “do concurrent”, it is hard to implement custom caching, as that is mostly in the compiler’s hands.
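For contrast, here is an illustrative sketch (a hypothetical kernel, not from HipFT) of the kind of explicit caching you can write when you do have control, e.g. staging a 1-D smoothing stencil through shared memory in CUDA Fortran; standard do concurrent has no portable way to express this staging, so it is left to the compiler.

```fortran
! Hypothetical CUDA Fortran kernel (not from HipFT): explicit staging of
! a 1-D smoothing stencil through shared memory.
module smooth_mod
  use cudafor
contains
  attributes(global) subroutine smooth(a, b, n)
    real(8) :: a(n), b(n)
    integer, value :: n
    real(8), shared :: tile(0:257)        ! assumes 256-thread blocks + halo
    integer :: i, t
    t = threadIdx%x                       ! 1-based thread index
    i = (blockIdx%x - 1)*blockDim%x + t
    if (i <= n) tile(t) = a(i)            ! each thread caches one element
    if (t == 1 .and. i > 1) tile(0) = a(i-1)              ! left halo
    if (t == blockDim%x .and. i < n) tile(t+1) = a(i+1)   ! right halo
    call syncthreads()
    if (i > 1 .and. i < n) &
      b(i) = 0.25d0*(tile(t-1) + 2.0d0*tile(t) + tile(t+1))
  end subroutine smooth
end module smooth_mod
```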