79 Languages speed competition: Can we make Fortran win?

The reason for using a bitfield was to reproduce the algorithm presented in the first video of the comparison series as accurately as possible.

2 Likes

The reason why:
in these solutions the results were computed at compile time, so there was not much to do at run time…

3 Likes

It turns out you can do this in Fortran too! See the recent post by @mecej4.

3 Likes

Here are my results for the 4 different Fortran codes of the comparison project.

The original, non-tweaked algorithm has been presented (for Python, C# and C++) in What's the FASTEST Computer Language? C++ vs Fortran vs Cobol: E04 - YouTube

The rules are explained in Primes/CONTRIBUTING.md at drag-race · PlummersSoftwareLLC/Primes · GitHub

My non-faithful approach (it does not use classes) was presented in the Fortran vs. Cobol comparison video, although there also exists a faithful implementation by Thomas Jollans (GitHub user tjol).
Why mine was chosen I do not know; I guess because it has a rather “classic” Fortran look.

The considered compilers are:
gfortran-7
gfortran-8
gfortran-9
gfortran-10
gfortran-11
ifort 21.04
ifx 21.04 beta
aocc 3.1.0 based on flang 12.0

The tests have been performed on an AMD Ryzen 7 3700X CPU (base frequency 2200 MHz,
max frequency 3600 MHz).

The numbers given are how many times the sieve algorithm completes
for the range up to 1000000 within 5 seconds.

Notable results are:

  • When a bitfield is used, there are strong variations between different compilers
    (and different versions of gfortran). The variations increase for the object-oriented version.

  • The fastest code is generated if one 8-bit integer is used to store the state (prime/not prime)
    of a number (gain ~50% over the best bitfield results). There the results are more consistent than for the bitfield.

  • Storing the states of the numbers in an array of logical variables is slower than storing them in 8-bit integers, and similar in performance to a good bitfield implementation.

  • Disclaimer: I do not believe that I really chose the optimal command line options
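To make the storage strategies above concrete, here is a minimal sketch of my own (not the benchmark code) contrasting two of them: a packed bitfield using the BTEST/IBCLR intrinsics on int64 words versus a plain LOGICAL array. Both count the primes below one million; the helper names (getbit, clearbit) are mine.

```fortran
program sieve_compare
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 1000000
  print "(i0)", count_logical(n)    ! 78498 primes below one million
  print "(i0)", count_bitfield(n)   ! same count, packed storage
contains
  ! plain LOGICAL array: one (typically 4-byte) logical per number
  integer function count_logical(n) result(np)
    integer, intent(in) :: n
    logical, allocatable :: is_prime(:)
    integer :: i, j
    allocate(is_prime(n), source=.true.)
    is_prime(1) = .false.
    do i = 2, int(sqrt(real(n)))
       if (is_prime(i)) then
          do j = i*i, n, i
             is_prime(j) = .false.
          end do
       end if
    end do
    np = count(is_prime)
  end function count_logical

  ! packed bitfield: one bit per number inside int64 words,
  ! accessed with the BTEST/IBCLR intrinsics
  integer function count_bitfield(n) result(np)
    integer, intent(in) :: n
    integer(int64), allocatable :: bits(:)
    integer :: i, j
    allocate(bits((n-1)/64 + 1), source=-1_int64)  ! all bits set: assume prime
    call clearbit(bits, 1)
    do i = 2, int(sqrt(real(n)))
       if (getbit(bits, i)) then
          do j = i*i, n, i
             call clearbit(bits, j)
          end do
       end if
    end do
    np = 0
    do i = 1, n
       if (getbit(bits, i)) np = np + 1
    end do
  end function count_bitfield

  logical function getbit(bits, k)
    integer(int64), intent(in) :: bits(:)
    integer, intent(in) :: k
    getbit = btest(bits((k-1)/64 + 1), mod(k-1, 64))
  end function getbit

  subroutine clearbit(bits, k)
    integer(int64), intent(inout) :: bits(:)
    integer, intent(in) :: k
    bits((k-1)/64 + 1) = ibclr(bits((k-1)/64 + 1), mod(k-1, 64))
  end subroutine clearbit
end program sieve_compare
```

The bitfield version touches 1/8 the memory of an 8-bit-integer array, but every read and clear costs an index computation plus a bit operation, which is one plausible reason the byte-per-number variants above come out faster.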

solution_1: One bit per number (stored in int64 variables), not object-oriented, shown in the video

gfortran-7 -march=native -O3 -o benchmark-gcc-7 PrimesFortran.f08
8905.2

gfortran-8 -march=native -O3 -o benchmark-gcc-8 PrimesFortran.f08
11210

gfortran-9 -march=native -O3 -o benchmark-gcc-9 PrimesFortran.f08
11211

gfortran-10 -march=native -O3 -o benchmark-gcc-10 PrimesFortran.f08
11323

gfortran-11 -march=native -O3 -o benchmark-gcc-11 PrimesFortran.f08
12417

Note: ifort -v reports ifort version 2021.4.0. ifort does not recognize the .f08 suffix, so it was changed to .f90.
ifort -Ofast -march=core-avx2 -o benchmark-ifort-21.04 PrimesFortran.f90
6097

ifx -Ofast -march=core-avx2 -o benchmark-ifx-21.04-beta PrimesFortran.f90
7283

flang -march=native -O3 -o benchmark-flang-12.0 PrimesFortran.f90
6659

Object-oriented with bit array, fully meets the ‘faithful’ criteria of the comparison

gfortran-7 -march=native -O3 -oo-bitarray-gcc-7 prime-bitarray.f90
12552

gfortran-8 -march=native -O3 -oo-bitarray-gcc-8 prime-bitarray.f90
8199

gfortran-9 -march=native -O3 -oo-bitarray-gcc-9 prime-bitarray.f90
8138

gfortran-10 -march=native -O3 -oo-bitarray-gcc-10 prime-bitarray.f90
7613

gfortran-11 -march=native -O3 -oo-bitarray-gcc-11 prime-bitarray.f90
12619

ifort -march=core-avx2 -O3 -oo-bitarray-ifort-12.04 prime-bitarray.f90
2591.0

ifx -march=core-avx2 -O3 -oo-bitarray-ifx-12.04 prime-bitarray.f90
2547

flang -march=native -O3 -oo-bitarray-flang-12.0 prime-bitarray.f90
1412

Object-oriented, one 8-bit Integer per number

gfortran-7 -march=native -O3 -oo-8bit-gcc-7 prime-8bit.f90
18155

gfortran-8 -march=native -O3 -oo-8bit-gcc-8 prime-8bit.f90
16597

gfortran-9 -march=native -O3 -oo-8bit-gcc-9 prime-8bit.f90
17870

gfortran-10 -march=native -O3 -oo-8bit-gcc-10 prime-8bit.f90
18029

gfortran-11 -march=native -O3 -oo-8bit-gcc-11 prime-8bit.f90
18247

ifort -march=core-avx2 -O3 -oo-8bit-ifort-12.4 prime-8bit.f90
18086

ifx -march=core-avx2 -O3 -oo-8bit-ifort-12.4 prime-8bit.f90
18361

flang -march=native -O3 -oo-8bit-flang-12.0 prime-8bit.f90
9651
(Note: The values varied strongly, between approx. 4000 and approx. 11000)

Object-oriented, one Logical per number

gfortran-7 -march=native -O3 -oo-logical-gcc-7 prime-logical-array.f90
12833

gfortran-8 -march=native -O3 -oo-logical-gcc-8 prime-logical-array.f90
12764

gfortran-9 -march=native -O3 -oo-logical-gcc-9 prime-logical-array.f90
12441

gfortran-10 -march=native -O3 -oo-logical-gcc-10 prime-logical-array.f90
12453

gfortran-11 -march=native -O3 -oo-logical-gcc-11 prime-logical-array.f90
12635.8

ifort -march=core-avx2 -O3 -oo-logical-ifort-12.4 prime-logical-array.f90
13174

ifx -march=core-avx2 -O3 -oo-logical-ifort-12.4 prime-logical-array.f90
13385

flang -march=native -O3 -oo-logical-flang-12.0 prime-logical-array.f90
5573
(Note: strong fluctuations of the results between ~3800 and ~8300)

2 Likes

I guess it is a typo and should be an 8-bit integer.

Still, as you noticed a few posts above, the best C++ solution does most of the calculations during compilation (constant expressions), so it is a sort of cheating. If this is also the case for the other 2 or 3 leaders (Lisp, D, maybe Rust), that would explain the many-orders-of-magnitude difference between those and the rest of the competitors. One would need to sum compilation and execution times to make a fair comparison.

2 Likes

Yes almost all the leading contenders in this “drag race” from Zig to C++ achieve their performance via compile-time computing.

See this thread.

CONSTEXPR as much as possible has been a mantra with C++ and its derivatives such as D, Rust, Zig, etc. Folks such as Stroustrup and De Rios et al. have presented a fair bit on the benefits of this, especially in systems programming. The results here are but one example where their efforts show what is viable with the approach: compute once (during compilation) but consume voraciously and efficiently, i.e., use the values widely and with great speed during runtime execution.
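Fortran's closest analogue to constexpr is the constant (initialization) expression, which is far more limited: it cannot call user-defined functions, so a full compile-time sieve is not expressible this way in standard Fortran. A tiny illustrative sketch of what it can do (my own example, not from the drag race): since Fortran 2003, named-constant arrays may be built with implied-do array constructors, so the table below is evaluated entirely by the compiler.

```fortran
program compile_time_table
  implicit none
  integer :: i
  ! a constant (initialization) expression: the implied-do array
  ! constructor is evaluated at compile time, not at run time
  integer, parameter :: squares(10) = [(i*i, i = 1, 10)]
  print "(10(i0,:,1x))", squares   ! 1 4 9 16 25 36 49 64 81 100
end program compile_time_table
```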

2 Likes

That was indeed a typo. I have corrected it.

1 Like

Well, I actually consider these tech demonstrations, so “cheating” has to be in air quotes.
The comparison as a whole is rather an educational game than a scientific analysis.
The entertainment value is even more in focus in the videos, where important information like the compiler flags is not given.

1 Like

It actually shows that it is difficult for a human to beat a well-optimizing compiler or runtime compiler. Of course, in theory assembler can do everything machine code can do and is thus unbeatable. So the human factor is the limit.

Edit: Corrected my previous awful spelling.

1 Like

I wrote an OpenMP version of the code some time ago, but there is confusion about what is a conforming solution. I am happy to provide this example if it is useful.

2 Likes

Here is a code inspired by APL that I am sure is slow, but the main program is concise.

module apl_mod
implicit none
interface operator(.i.)
   module procedure irange
end interface
!
interface operator(.d.)
   module procedure drop
end interface
!
interface operator(.o.)
   module procedure outer_product
end interface
!
interface operator (.ni.)
   module procedure not_in
end interface
!
contains
pure function irange(i) result(v)
! return integers from 1 to i
integer, intent(in) :: i
integer             :: v(i)
integer             :: j
do j=1,i
   v(j) = j
end do
end function irange
!
pure function drop(v) result(dv)
! return v(:) without the first element
integer, intent(in) :: v(:)
integer             :: dv(size(v)-1)
if (size(v) > 1) dv = v(2:)
end function drop
!
pure function outer_product(v) result(m)
! multiplication outer product
integer, intent(in) :: v(:)
integer             :: m(size(v),size(v))
integer             :: i,j
do i=1,size(v)
   do j=1,size(v)
      m(i,j) = v(i)*v(j)
   end do
end do
end function outer_product
!
pure function not_in(a,b) result(c)
! return in c the elements of a(:) not in b(:)
integer, intent(in)  :: a(:),b(:)
integer, allocatable :: c(:)
logical              :: amask(size(a))
integer              :: i
do i=1,size(a)
   amask(i) = .not. any(b == a(i)) 
end do
c = pack(a,amask)
end function not_in
end module apl_mod
!
program main
use apl_mod
implicit none
associate (t => .d. (.i. 1000) )
   print "(20(1x,i0))",t .ni. [.o. t]
end associate
end program main
output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997

3 Likes

@CRquantum ,

By “champion”, I presume you hope to find Fortran codes prove to be the fastest in a series of performance benchmarks.

Please note that this is generally not possible now.

The best you can expect is on an apples-to-apples basis, Fortran codes will show somewhat similar performance - see here.

But the reality is other programming paradigms, particularly the C++ derivatives and Julia - will be able to bring to bear a host of other performance-enhancing aspects that will place them far ahead of Fortran, especially in microbenchmarks and also big application studies that call for large teams and massive $$ investments. Fortran simply doesn’t have the resources and the infrastructure now to bring in such aspects, often not even the will to do so (perhaps rightly so!).

What you notice here with the prime number challenge is basically “CONSTEXPR” / compile-time computing, with the same outcome: at best, a Fortran solution might appear somewhere in the lower half of the leaderboard, but nowhere near the top. You will be continuously disappointed if you hope to see Fortran in the top half.

2 Likes

To be honest, bit-wise manipulation of data is not really a typical application of Fortran. I have actually been rather surprised by how similar the results of the gfortran and (non-constexpr) g++ codes were.
I did not even know about the existence of the bit-manipulation intrinsics in Fortran before. (I do not know my g++ results by heart.)

2 Likes

But the reality is other programming paradigms, particularly the C++ derivatives and Julia - will be able to bring to bear a host of other performance-enhancing aspects that will place them far ahead of Fortran…

And if Fortran be not risen, then is our preaching vain, and your faith is also vain.

1 Like

And posting the truth shall set one free from all vanity!

1 Like

I could not agree more. There was a university ranking almost a decade ago (that may still be around) that allowed custom weighting of ranking factors. We used to challenge ourselves to bring the worst-ranked graduate school in the US above the best schools just by varying the weights. This benchmark and other similar rankings of languages are no different. Like what Steve Kargl said in some other post, people always find what they are looking for in these benchmarks by playing with the factors involved and the problems to solve.

3 Likes

I don’t think this is the right conclusion to draw from this. If the answer here is that Fortran is a bad language for bit manipulation, that matters. Bit manipulation isn’t used everywhere, but it is a useful technique for extracting maximum performance from modern hardware. If that isn’t a domain Fortran considers relevant, Fortran isn’t a language that should be considered relevant for anyone who wants top tier performance.

2 Likes

That is what I am talking about. Being able to learn scuba diving in graduate school is great. Is that highly impactful on students’ futures? Probably not.

3 Likes

I agree with you. I think my statement from before is somewhat based on personal prejudices. I have been programming Fortran for a few years in academia without doing any bit manipulation, but this really should not be generalized.

As I wrote in the previous post, I have actually been surprised by how capable Fortran is with respect to the provided intrinsics and the performance.

1 Like

Thanks @Jweber for your reply.
Uhm, may I ask, what ‘bit-wise manipulation’ do you mean?

I have a small piece in my nuclear path-integral Monte Carlo code where we use some bit-wise manipulations such as xor, ibits and shiftl to flip and manipulate the spins of particles.

E.g., with two particles, each can be spin up or spin down, so one can use 0 for spin down and 1 for spin up.
In binary, the representation of the two particles can then be 00, 01, 10 or 11.
Some operators, like the spin-exchange operator, exchange the spins of the two particles; others, like the tensor operator, flip spins. So one can use things like xor, ibits, shiftl, etc., on the binary representation to mimic those operators (exchange 0 and 1, change 0 to 1 and 1 to 0, etc.). If bit-wise manipulations include things like xor, ibits and shiftl, and if in Fortran they are slow, that would be really bad (particularly if such bit-wise manipulations are heavy in the code). :sweat_smile:

do nstate=1,nbasis   
        ...
        ! fteff,op3 
        spn=invspin(nstate) ! spn, 0:15.
        ispn=invispin(nstate) 
        signi=2*ibits(spn,i-1,1)-1
        signj=2*ibits(spn,j-1,1)-1    
        newspni=xor(spn,shiftl(1,i-1))
        newspnj=xor(spn,shiftl(1,j-1))
        newspn=xor(newspni,shiftl(1,j-1))
        signij=signi*signj
        c(1)=rsq(1)-rsq(2)*signij+(signi+signj)*r2cir1
        c(2)=signj*r3r1+signij*r3cir2
        c(3)=signi*r3r1+signij*r3cir2
        c(4)=signij*rsq(3) 
        cfteff(:)=fteff*c(:)
        cfteffiex(:)=fteffiex*c(:)   
        cwtnew(newspn,ispn)=cfteff(1)+cwtnew(newspn,ispn)    
        cwtnew(newspni,ispn)=cfteff(2)+cwtnew(newspni,ispn)
        cwtnew(newspnj,ispn)=cfteff(3)+cwtnew(newspnj,ispn)    
        cwtnew(spn,ispn)=cfteff(4)+cwtnew(spn,ispn)        
        ...     
enddo 
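For readers unfamiliar with these intrinsics, here is a minimal self-contained sketch (the program and variable names are mine, not from the code above) of the spin bookkeeping just described: two spins packed into the low bits of an integer, ibits to read one spin, and ieor with shiftl to flip one. Note that `xor` in the snippet above is a compiler extension; the standard intrinsic is `ieor`, and `shiftl` is standard since Fortran 2008. These intrinsics typically compile to single machine instructions, so there is no inherent performance penalty.

```fortran
program spin_bits
  implicit none
  integer :: spn, i, si, flipped
  spn = 1                            ! binary 01: particle 1 up, particle 2 down
  do i = 1, 2
     si = 2*ibits(spn, i-1, 1) - 1   ! read one spin: +1 for up, -1 for down
     print "(a,i0,a,i2)", "particle ", i, ": sign ", si
  end do
  flipped = ieor(spn, shiftl(1, 0))  ! flip particle 1: 01 -> 00
  print "(b2.2)", flipped
  flipped = ieor(spn, shiftl(1, 1))  ! flip particle 2: 01 -> 11
  print "(b2.2)", flipped
end program spin_bits
```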
1 Like