This thread demonstrates two performance issues that I have also struggled with:
1) Reduced performance of Gfortran Ver 12 compared with Ver 10 (and Ver 8 and 9), which may be recovered in Ver 13.
2) Stalling of scale-up beyond 8 threads (very noticeable on my 5900X), and beyond 8 and 16 threads on the 7950X.
Regarding 1), the reduced performance of Gfortran Ver 12: I could not tell which OS you are using. You may need to check the new AMD support options in Gfortran, and you may need different compile options for the AMD 7950X. In my use of Gfortran with -fopenmp since Ver 8, newer versions have shown reduced performance, but this has been mitigated by updating the hardware target and optimisation options for each version. Does the AMD 7950X support AVX-512, and does Gfortran Ver 12 handle it well?
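To check what a given Gfortran build actually compiled with, including whether -march=native switched on any AVX-512 options, the Fortran 2008 intrinsics compiler_version() and compiler_options() from iso_fortran_env can be printed from a tiny program. This is a sketch: it assumes Gfortran records the flags that -march=native expands to in compiler_options(), which is worth confirming on your version. `gfortran -march=native -Q --help=target` lists the enabled target options directly.

```fortran
! Sketch: print the compiler version and the options this executable was built with.
! Assumed build: gfortran -march=native build_info.f90
program build_info
  use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
  implicit none

  print '(2a)', 'compiler: ', compiler_version()
  print '(2a)', 'options:  ', compiler_options()

  ! Simple text search of the recorded options for any AVX-512 flag
  if (index(compiler_options(), 'avx512') > 0) then
    print '(a)', 'AVX-512 options appear in the build flags'
  else
    print '(a)', 'no AVX-512 options in the build flags'
  end if
end program build_info
```

Building the same source with Ver 10, 12 and 13 and comparing the printed options is a quick way to see whether the target flags differ between versions.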
Regarding 2), the stalling: I have tried many options, as this has been a constant problem in my direct linear solvers for structural finite element calculations.
Is it memory bandwidth limitations?
Is it hyper-threading?
Is it Windows changes to support Intel efficiency cores (E-cores)?
For my calculations (large-vector direct linear solvers), all of these appear to be possible causes.
Memory
Both the 5900X and the 7950X have only dual-channel memory, which does appear to stall scale-up after a few threads, depending on the type of calculation. Memory bandwidth has not kept pace with core count!
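As a rough worked example of why dual-channel memory limits scale-up: theoretical peak bandwidth is channels × 8 bytes per transfer × transfer rate. The memory speeds used here (DDR4-3200 on the 5900X, DDR5-5200 on the 7950X) are assumptions, not measurements of any machine in this thread.

$$
\begin{aligned}
B_{\text{peak}} &= n_{\text{ch}} \times 8\,\text{B} \times f \\
\text{5900X: } B_{\text{peak}} &= 2 \times 8\,\text{B} \times 3200 \times 10^{6}\,\text{s}^{-1} = 51.2\,\text{GB/s}, \qquad 51.2 / 12 \approx 4.3\,\text{GB/s per core} \\
\text{7950X: } B_{\text{peak}} &= 2 \times 8\,\text{B} \times 5200 \times 10^{6}\,\text{s}^{-1} = 83.2\,\text{GB/s}, \qquad 83.2 / 16 = 5.2\,\text{GB/s per core}
\end{aligned}
$$

Sustained bandwidth is lower than these peaks, so once a few threads of a streaming calculation saturate the memory bus, additional threads plausibly spend most of their time waiting on memory.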
32 threads sharing dual-channel memory does not suit my type of calculation. The problem is that more cores sell, and marketing is a big influence on processor development.
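One way to test the bandwidth explanation on a given machine is a STREAM-style triad sweep: time a = b + s*c over arrays much larger than L3 for increasing thread counts. If the GB/s figure flattens at about the thread count where the solver stops scaling, memory bandwidth is a likely cause. This is a minimal sketch, not the official STREAM benchmark; the module and program names, the array size and the repeat count are illustrative choices.

```fortran
! STREAM-style triad probe: a sketch, not the official STREAM benchmark.
! Assumed build: gfortran -O3 -march=native -fopenmp triad_probe.f90
module triad_probe
  use, intrinsic :: iso_fortran_env, only: dp => real64, int64
  implicit none
contains
  ! a = b + s*c, split statically across the OpenMP threads
  subroutine triad(a, b, c, s)
    real(dp), intent(out) :: a(:)
    real(dp), intent(in)  :: b(:), c(:), s
    integer(int64) :: i
    !$omp parallel do schedule(static)
    do i = 1, size(a, kind=int64)
      a(i) = b(i) + s * c(i)
    end do
    !$omp end parallel do
  end subroutine triad
end module triad_probe

program run_triad
  use triad_probe
  !$ use omp_lib
  implicit none
  ! 3 arrays x 80 MB each: far larger than the 64 MB L3 on either CPU
  integer(int64), parameter :: n = 10000000_int64
  integer, parameter :: reps = 10
  real(dp), allocatable :: a(:), b(:), c(:)
  real(dp) :: best, gbs
  integer(int64) :: i, t0, t1, rate
  integer :: nt, maxt, r

  allocate(a(n), b(n), c(n))
  ! First-touch initialisation with the same static schedule as the kernel
  !$omp parallel do schedule(static)
  do i = 1, n
    a(i) = 0.0_dp
    b(i) = 1.0_dp
    c(i) = 2.0_dp
  end do
  !$omp end parallel do

  maxt = 1
  !$ maxt = omp_get_max_threads()
  nt = 1
  do
    !$ call omp_set_num_threads(nt)
    best = huge(1.0_dp)
    do r = 1, reps
      call system_clock(t0, rate)
      call triad(a, b, c, 3.0_dp)
      call system_clock(t1)
      best = min(best, real(t1 - t0, dp) / real(rate, dp))
    end do
    best = max(best, epsilon(1.0_dp))   ! guard against a zero timing
    ! STREAM convention: 24 bytes per element (write-allocate traffic not counted)
    gbs = 24.0_dp * real(n, dp) / best / 1.0e9_dp
    print '(a,i4,a,f8.2,a)', 'threads =', nt, '   triad =', gbs, ' GB/s'
    if (nt == maxt) exit
    nt = min(2 * nt, maxt)              ! 1, 2, 4, 8, ... then the maximum
  end do

  ! b = 1, c = 2, s = 3, so every element of a must be exactly 7
  if (any(abs(a - 7.0_dp) > 1.0e-12_dp)) error stop "triad result mismatch"
end program run_triad
```

Building it with the same options as the solver, once with and once without -ffast-math, gives a direct comparison for a loop limited by memory traffic rather than arithmetic; little difference would be the expectation, but that is something to measure rather than assume.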
OS support for E-cores
I have been using Windows Ver 8 and Ver 10 (not yet 11), and there are well-documented problems with early releases of Ver 11. Updates to Windows Ver 10 have shown similar problems for my calculations, as changes to the OS thread-to-core (re)allocation algorithm have reduced my scale-up performance.
All this traces back to Intel's introduction of different core types (P-cores and E-cores); in my experience, the OS changes made to support them have adversely affected AMD processors.
Hyper-threading
A similar issue of threads running at different speeds can occur with Intel's hyper-threading or AMD's simultaneous multithreading (SMT). Some OpenMP calculations benefit from threads that stay synchronised, especially when they share cache. To overcome this, I have turned SMT off on my AMD processors. The graph shows that this is easily justified: I get no better performance with 24 threads than with 12 on the 5900X, and this post for the 7950X shows no better performance with 32 threads than with 16.
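An alternative to switching SMT off in the BIOS is to ask the OpenMP runtime for one thread per physical core, using the standard OpenMP environment variables OMP_PLACES=cores and OMP_PROC_BIND=close with OMP_NUM_THREADS set to the core count. It is not certain here whether libgomp honours thread binding on a Windows build of Gfortran, so this small program is only a sketch for checking what placement the runtime reports: a place count of 0, or place numbers of -1, would suggest the threads are not being bound.

```fortran
! Sketch: report which OpenMP place each thread is bound to.
! Assumed build: gfortran -fopenmp show_places.f90
! Run from a Windows cmd prompt after, for example (16 cores on a 7950X):
!   set OMP_NUM_THREADS=16
!   set OMP_PLACES=cores
!   set OMP_PROC_BIND=close
program show_places
  use omp_lib
  implicit none
  integer :: tid, place

  ! 0 usually means no place list is in effect (binding not requested or not supported)
  print '(a,i0)', 'places in the place list: ', omp_get_num_places()
  print '(a,i0)', 'max threads: ', omp_get_max_threads()

  !$omp parallel private(tid, place)
  tid = omp_get_thread_num()
  place = omp_get_place_num()   ! -1 means this thread is not bound to a place
  !$omp critical
  print '(a,i3,a,i3)', 'thread ', tid, ' -> place ', place
  !$omp end critical
  !$omp end parallel
end program show_places
```

With OMP_PLACES=cores each place should cover both SMT siblings of one core, so 16 bound threads on a 7950X would leave the second sibling of each core idle without a BIOS change, provided the runtime actually binds.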
As yet, I don’t think the OS and Gfortran are doing a good job with hyper-threading on AMD.
My Intel 8700K has the same problem, so this is not just an AMD issue.
The other black art that can affect all of these problems is efficient cache usage, especially sharing of the L3 cache between threads.
My difficulty with this and the other possible explanations is that my analysis has been empirical. I really don't know what the main cause of these problems is.
I get very similar performance problems to those presented in this thread on my AMD 5900X. I need to test a processor with more memory channels, but I doubt this would be the magic solution. I need a bigger budget!
@solej, you could read the Gfortran Ver 12 and Ver 13 release notes to see whether there are more appropriate compile options for the Ryzen 9 7950X; please post your results if you find any recommendations.
My present Gfortran compile options include:
set basic=-c -fimplicit-none -fallow-argument-mismatch -march=native
set vector=-O3 -ffast-math -funroll-loops --param max-unroll-times=2
set omp=-fopenmp -fstack-arrays
I really don’t know if -ffast-math does much when memory bandwidth limitations become significant.