No, Gfortran is very good. I think the problem is your 12th Gen Intel® Core™ i5-1235U.
Google search finds: "The i5-1235U is faster in tasks that use only 1–2 CPU cores or threads, such as photo editing. This is because the i5-1235U only has 2 high-performance CPU cores; its other 8 cores are much weaker 'efficiency' cores." (23 Dec 2023)
It is surprising there are only 2 performance cores, but 8 efficiency cores !
Then there is the problem with cooling with any recent Intel laptop.
Indeed. Not only is the speed-up reduced once efficiency cores are used, but I also think that OpenMP runtimes do not handle (and maybe cannot handle) the mix of performance and efficiency cores well.
Moreover, this CPU has 10 physical cores (not 12): the 2 performance cores are hyper-threaded, which adds 2 logical cores. When running with 12 threads the logical cores are used, but logical cores generally do not help speed up such computations.
Given all of this, the limited speed-up is not surprising.
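One way to see where the speed-up flattens on a given machine is to time the same loop with increasing team sizes. The sketch below is only an illustration: the program name, the loop body and the problem size are made up, and it requests the team size with the num_threads clause rather than a runtime call.

```fortran
program thread_scaling_sketch
   ! Hypothetical example: times one compute-bound loop for 1..nproc threads.
   ! Build (assumed): gfortran scaling.f90 -O2 -fopenmp -o scaling
   use omp_lib
   use iso_fortran_env, only: real64
   implicit none
   integer, parameter :: n = 20000000      ! arbitrary problem size
   real(real64) :: s, t0, t1, t_one
   integer :: nt, i

   t_one = 0.0_real64
   do nt = 1, omp_get_num_procs()
      s = 0.0_real64
      t0 = omp_get_wtime()
      ! Team size is requested per region; the runtime may provide fewer threads.
      !$omp parallel do num_threads(nt) reduction(+:s)
      do i = 1, n
         s = s + sin(real(i, real64) * 1.0e-6_real64)
      end do
      !$omp end parallel do
      t1 = omp_get_wtime()
      if (nt == 1) t_one = t1 - t0
      print '(a,i3,a,f8.4,a,f6.2)', 'threads=', nt, &
            '  time(s)=', t1 - t0, '  speed-up=', t_one / (t1 - t0)
   end do
   print *, 'checksum (keeps the loop from being optimised away): ', s
end program thread_scaling_sketch
```

On a hybrid CPU the speed-up column would be expected to rise quickly for the first few threads and then grow more slowly, but the actual numbers depend on the machine, its cooling and where the OS places the threads.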
Both are linked: more performance cores would result in more heating, and ultimately in automatic throttling (frequency downclocking) of the performance cores to reduce the temperature.
On CPUs with efficiency cores, it would be worth trying the schedule(nonmonotonic:dynamic) clause in !$omp do. The dynamic schedule can help when the workload is not balanced between the threads (and maybe when the threads do not run on equally performant cores). The problem is that this schedule has more overhead than the default static one, but with the recent nonmonotonic modifier the overhead becomes very limited. I'm not sure that gfortran supports it right now, though (recent versions of ifx do).
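A minimal sketch of that comparison, assuming the compiler in use accepts the nonmonotonic modifier (it is part of OpenMP 4.5; if the compiler rejects it, plain schedule(dynamic, 16) is the fallback). The program name, the work function and the chunk size of 16 are hypothetical choices for illustration. The loop is deliberately unbalanced, so the static schedule is at a disadvantage even on identical cores.

```fortran
program dynamic_schedule_sketch
   ! Hypothetical example: static vs schedule(nonmonotonic:dynamic)
   ! on a loop whose cost grows with the iteration index.
   ! Build (assumed): gfortran sched.f90 -O2 -fopenmp -o sched
   use omp_lib
   use iso_fortran_env, only: real64
   implicit none
   integer, parameter :: n = 20000         ! arbitrary problem size
   real(real64) :: total, t0, t1
   integer :: i

   total = 0.0_real64
   t0 = omp_get_wtime()
   !$omp parallel do schedule(static) reduction(+:total)
   do i = 1, n
      total = total + work(i)
   end do
   !$omp end parallel do
   t1 = omp_get_wtime()
   print *, 'static               : ', t1 - t0, ' s, sum = ', total

   total = 0.0_real64
   t0 = omp_get_wtime()
   ! Chunks of 16 iterations are handed out as threads become free.
   !$omp parallel do schedule(nonmonotonic:dynamic, 16) reduction(+:total)
   do i = 1, n
      total = total + work(i)
   end do
   !$omp end parallel do
   t1 = omp_get_wtime()
   print *, 'nonmonotonic:dynamic : ', t1 - t0, ' s, sum = ', total

contains

   ! Cost is proportional to i, so late iterations are the expensive ones.
   pure function work(i) result(s)
      integer, intent(in) :: i
      real(real64) :: s
      integer :: k
      s = 0.0_real64
      do k = 1, i
         s = s + 1.0_real64 / real(k, real64)**2
      end do
   end function work

end program dynamic_schedule_sketch
```

Whether the dynamic variant also helps with a balanced loop running on a mix of performance and efficiency cores is exactly the open question here, so it is worth timing both on the actual laptop.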
@suki
Given that you have reported your program's performance topping out at 2 threads, this appears to be consistent with the Google search result of only 2 performance cores, not “Number of performance cores: 10”.
I have not had any experience with using “efficiency” cores for OpenMP, but assuming their instruction set excludes AVX instructions, I would think that running threads with such imbalanced performance would rule out most of the computation types I have.
You could possibly try adding !$OMP& SCHEDULE (DYNAMIC)
that is assuming Gfortran -fopenmp even tries to use the other cores.
I have asked whether Gfortran or OpenMP implementations utilise “efficiency” cores but never received a clear answer. !$OMP PARALLEL DO can be more sensitive to threads of different performance, so I don’t intend to try.
I currently use a Ryzen 5900X where I have turned off “2-way simultaneous multithreading”, as I get no benefit from the extra threads.
On my Intel 8700K, I also do not use hyper-threading, although most of my calculations are with large arrays where memory bandwidth might be the problem.
For multi-threaded computation, a desktop with adequate cooling could be a much better option. The AMD (nonX) processors running at a slower clock rate look to be a much better solution than an Intel room heater ! I wonder where the glossy marketed processors are taking us.
The Intel approach makes sense in a general-purpose laptop (e.g. not targeted at HPC), where the cooling capabilities are limited by the size of the laptop. The performance cores can handle peak computation demands for a short period of time, and the efficiency cores handle all the background processes, which are generally not very demanding.
Your Ryzen 5900X has a 105W TDP, which is much too high for most laptops.
Apple has an edge here with its Apple Silicon chips, which have much better efficiency (flops/W) than the current x86 chips. A MacBook Pro can get better performance than equivalent x86 laptops, and without heating that much.
I have now identified that this problem is probably due to call omp_set_num_threads ()
This is occurring on my Windows 10 PC and appears in Gfortran Ver 11.3.0 and later versions.
It does not occur in Gfortran Ver 11.1.0 and earlier versions.
I am not sure if this is in Gfortran or in the Windows thread management library for equation.com’s implementation of Gfortran.
This is a code that I have to demonstrate the problem.
My batch test is
call set_gcc 11.1.0
set program=test4
del %program%.exe
gfortran %program%.f90 -O3 -march=native -fopenmp -o %program%.exe
%program%
A simple reproducer below which exhibits the problem if “call omp_set_num_threads (4)” is included.
program test
! small reproducer version of OpenMP program hanging on Win 10 / Gfortran 11.3.0
! gfortran test.f90 -O3 -march=native -fopenmp -o test.exe
use iso_fortran_env
implicit none
integer, parameter :: num = 1000
real :: A(num)
real :: RA
integer :: i
write (*,*) 'Vern : ',compiler_version ()
write (*,*) 'Opts : ',compiler_options ()
call omp_set_num_threads (4) ! this causes problem for Gfortran 11.3.0 +
write ( *,*) 'Test n=',num
A = 1
ra = 0
!$OMP PARALLEL DO private (i) shared (A), REDUCTION (+: RA)
do i = 1, size(A)
RA = RA + A(i)**2
end do
!$OMP END PARALLEL DO
RA = sqrt (RA)
print*,RA,' OpenMP', sqrt(real(num))
! problem is demonstrated if program hangs here and does not exit
end program test
If others can test to identify if it occurs with other OS or other Gfortran implementations, I would be interested to know.
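For anyone testing, a variant that might help narrow things down is to request the team size with the num_threads clause instead of calling omp_set_num_threads. This is only a sketch: the program name test_clause is made up, and whether this variant avoids the hang has not been established.

```fortran
program test_clause
   ! Hypothetical variant of the reproducer: same loop, but the team size
   ! is requested with a clause rather than call omp_set_num_threads (4).
   ! gfortran test_clause.f90 -O3 -march=native -fopenmp -o test_clause.exe
   use iso_fortran_env
   implicit none
   integer, parameter :: num = 1000
   real :: A(num)
   real :: RA
   integer :: i

   write (*,*) 'Vern : ', compiler_version ()
   write (*,*) 'Opts : ', compiler_options ()
   write (*,*) 'Test n=', num
   A = 1
   RA = 0
   !$OMP PARALLEL DO num_threads(4) private (i) shared (A) REDUCTION (+: RA)
   do i = 1, size(A)
      RA = RA + A(i)**2
   end do
   !$OMP END PARALLEL DO
   RA = sqrt (RA)
   print *, RA, ' OpenMP', sqrt(real(num))
   ! if this variant exits cleanly while the original hangs, that would
   ! point at the omp_set_num_threads call rather than the parallel region
end program test_clause
```

Another check along the same lines is to remove the call from the original reproducer and set the OMP_NUM_THREADS environment variable to 4 before running it.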