That was indeed a typo. I have corrected it.
Well, I actually consider the tech demonstrations, so “cheating” has to be in airquotes.
The comparison as a whole is rather an educational game than a schieftific analysis.
The entertainment value is even more in focus in the videos where important information like compiler flags are not given.
It actually shows that it is difficult for a human to beat a well optimizing compiler or runtime compiler. Of course, in theory assembler can do everything machine code can do and thus is unbeatable. So the human factor is the limit.
Edit: Corrected my previous awful spelling.
I wrote an OpenMP version of the code some time ago, but there is confusion about what is a conforming solution. I am happy to provide this example if it is useful.
Here is a code inspired by APL that I am sure is slow, but the main program is concise.
module apl_mod implicit none interface operator(.i.) module procedure irange end interface ! interface operator(.d.) module procedure drop end interface ! interface operator(.o.) module procedure outer_product end interface ! interface operator (.ni.) module procedure not_in end interface ! contains pure function irange(i) result(v) ! return integers from 1 to i integer, intent(in) :: i integer :: v(i) integer :: j do j=1,i v(j) = j end do end function irange ! pure function drop(v) result(dv) ! return v(:) without the first element integer, intent(in) :: v(:) integer :: dv(size(v)-1) if (size(v) > 1) dv = v(2:) end function drop ! pure function outer_product(v) result(m) ! multiplication outer product integer, intent(in) :: v(:) integer :: m(size(v),size(v)) integer :: i,j do i=1,size(v) do j=1,size(v) m(i,j) = v(i)*v(j) end do end do end function outer_product ! pure function not_in(a,b) result(c) ! return in c the elements of a(:) not in b(:) integer, intent(in) :: a(:),b(:) integer, allocatable :: c(:) logical :: amask(size(a)) integer :: i do i=1,size(a) amask(i) = .not. any(b == a(i)) end do c = pack(a,amask) end function not_in end module apl_mod ! program main use apl_mod implicit none associate (t => .d. (.i. 1000) ) print "(20(1x,i0))",t .ni. [.o. t] end associate end program main
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997
By “champion”, I presume you hope to find Fortran codes prove to be the fastest in a series of performance benchmarks.
Please note that is generally not possible at all now.
The best you can expect is on an apples-to-apples basis, Fortran codes will show somewhat similar performance - see here.
But the reality is other programming paradigms, particularly the C++ derivatives and Julia - will be able to bring to bear a host of other performance-enhancing aspects that will place them far ahead of Fortran, especially in microbenchmarks and also big application studies that call for large teams and massive $$ investments. Fortran simply doesn’t have the resources and the infrastructure now to bring in such aspects, often not even the will to do so (perhaps rightly so!).
What you notice here with prime number challenge is basically “CONSTEXPR” / compile-time computing with same outcome: at best, a Fortran solution might appear somewhere on the lower half of the leaderboard but nowhere near the top. You will be continuously disappointed if you hope to see Fortran in the top half.
To be honest, bit-wise manipulation of data is not really a typical application of Fortran. I have actually rather been surprised how similar the results between how similar the results of gfortran and (non-contexpr) g++ code were.
I did not even know about tve existence of the bit-manipulation intrinsics in Fortran before (I do not know my g++ results by heart.)
But the reality is other programming paradigms, particularly the C++ derivatives and Julia - will be able to bring to bear a host of other performance-enhancing aspects that will place them far ahead of Fortran…
And if Fortran be not risen, then is our preaching vain, and your faith is also vain.
And posting the truth shall set one free from all vanity!
I could not agree more. There was a university ranking almost a decade ago (that may still be around) that allowed custom weighting of ranking factors. We used to challenge ourselves to bring the worst-ranked graduate school in the US above the best schools just by varying the weights. This benchmark and other similar rankings of languages are no different. Like what Steve Kargl said in some other post, people always find what they are looking for in these benchmarks by playing with the factors involved and the problems to solve.
I don’t think this is the right conclusion to draw from this. If the answer here is that Fortran is a bad language for bit manipulation, that matters. Bit manipulation isn’t used everywhere, but it is a useful technique for extracting maximum performance from modern hardware. If that isn’t a domain Fortran considers relevant, Fortran isn’t a language that should be considered relevant for anyone who wants top tier performance.
That is what I am talking about. Being able to learn scuba diving in graduate school is great. Is that highly impactful on students’ future? probably not.
I agree with you. I think my statement from before is a bit based on personal prejudices. I have been programing Fortran for a few years in academia without doing any bit manipulation, but this really should not be generalized.
As I wrote in the previous post i have been actually surprised how capable Fortran has been with respect to the provided intrinsics and the performance.
Thanks @Jweber for your reply.
Uhm, may I ask, what ‘bit-wise manipulation’ do you mean?
I have a small piece in my nuclear path-integral Monte Carlo code where we use some perhaps bit-wise manipulation like
shiftl to flip and manipulate the spin of particles.
E.g., two particles, can be spin up or spin down. So can use 0 as spin down, 1 as spin up.
In binary number the representation of the two particles can be 00, 01, 10, 11.
Some operators like spin exchange operator can exchange spin of the two particles, some operators like tensor operator can flip spins of the particles. So anyway, you know can use things like
shiftl, etc, on the binary representation to mimic those operators (exchange 0 and 1, or change 0 to 1, 1 to 0, etc). So if bit-wise manipulations includes things like
shiftl etc, and if in Fortran they are slow it will be really bad (particularly if in the code such bit-wise manipulations are heavy).
do nstate=1,nbasis ... ! fteff,op3 spn=invspin(nstate) ! spn, 0:15. ispn=invispin(nstate) signi=2*ibits(spn,i-1,1)-1 signj=2*ibits(spn,j-1,1)-1 newspni=xor(spn,shiftl(1,i-1)) newspnj=xor(spn,shiftl(1,j-1)) newspn=xor(newspni,shiftl(1,j-1)) signij=signi*signj c(1)=rsq(1)-rsq(2)*signij+(signi+signj)*r2cir1 c(2)=signj*r3r1+signij*r3cir2 c(3)=signi*r3r1+signij*r3cir2 c(4)=signij*rsq(3) cfteff(:)=fteff*c(:) cfteffiex(:)=fteffiex*c(:) cwtnew(newspn,ispn)=cfteff(1)+cwtnew(newspn,ispn) cwtnew(newspni,ispn)=cfteff(2)+cwtnew(newspni,ispn) cwtnew(newspnj,ispn)=cfteff(3)+cwtnew(newspnj,ispn) cwtnew(spn,ispn)=cfteff(4)+cwtnew(spn,ispn) ... enddo
A short list of places bit manipulation is useful:
- elementary functions (exp, log, trig, etc)
- special functions (bessel etc)
- data manipulation (filtering out missing values)
- reading/writing compressed files
- number theory
- cryptography (popcnt was invented by the NSA for a reason)
- chess engines (bitmasks are great)
They’re definitely not necessary for all users to use, but when you want maximum performance, they’re often where you end up.
Sorry @FortranFan if I may ask a stupid question,
uhm, simply speaking, why Fortran cannot be on the top half?
Or, why C++ and Julia can outperform Fortran in such benchmarks?
Or, what prevents Fortran from getting similar benchmark in this competition?
I don’t think we disagree on the use of bit manipulation. It is a question of how frequently a high-level programmer needs it. The first two (and perhaps even more) are not even on the list of concerns of most Fortran programmers (and perhaps HPC programmers in general). Bessel, exp, log, … are all Fortran intrinsics and the compiler developer’s task.
Thank you for the insight in the application of bit-manipulation in the context of physical simulations I did not know before.
I really did not think the performance of Fortran in the comparison was bad (I do not think the compile-time solutions should be used as a comparison as they only fork for values ore arrays that are in principle fixed from the beginning. There will be no change in the set of primes below 1e+6 ).
You can have fast code using Fortran, but you can also have fast code using other languages.
Thanks @Jweber .
Well, actually I won’t be bothered too much if some language is indeed faster than Fortran, LOL.
Bit operation seems a basic operation which is very close to hardware level. I hope Fortran can be very good at it. If C or Julia is indeed faster than Fortran in some operations like bit operations, I will be curious to know what is the mechanism behind that
I mean, e.g., if some language outperforms Fortran in applications requires heavy I/O, I will not be too surprised since I/O is not what Fortran is emphasized too much at. But bit operations seems fundamental enough and if Fortran does not perform at top level I will be a little bit pissed, LOL.
Perhaps for this 79 language test, to prevent compile-time cheating, a way to do can be the following,
- Using a huge positive integer to make sure the
compile-time computingbecome useless.
- Within that number, find as many as prime number per unit time as possible.
- time limit is 10 seconds.
Anyway, the point is just want to make sure the
compile-time computing become useless or too time-consuming to do (so no compiler will actually do
compile-time computing ), then all the computations are basically performed in real-time.
However, in real code, it is really a good idea to do cheating as much as possible.
Besides compiling-time computing, another example can be tabulating some time consuming functions then interpolate it.
E.g., f(x) is very time consuming to calculate, and we cannot afford to calculate it in real-time (perhaps it is in a very deep and big loop). But we know the range of x, say 0 to 100, and we know f(x) is smooth. So we can use 10^4 points to tabulate f(x) from 0 to 100. Then in the loop f(x) is really calculate by Lagrange 4-point interpolation (based on the tabulated points) which is very cheap.