Languages Benchmark Visualization

The following site is visualizing benchmark tests of the Benjamin Dicken’s Languages project:

Fortran performs well. However, I have questions about the “Naive Fibonacci” test. I don’t see why Fortran would be 50 times faster than C and any other language (in the “legacy run”). In general, Fortran has a similar speed to C and C++.

We just have a recursive function: languages/fibonacci/fortran/fibonacci.f90 at main · bddicken/languages · GitHub

Would the compiler make a special optimization for it? But the difference is so big with other languages that it seems bizarre. Have you any idea on what is going on?

1 Like
module benchmark
  implicit none
  private
  public :: run, format_results, benchmark_result_t  ! Make benchmark_result_t public

  type :: benchmark_result_t
    integer :: runs
    real(8) :: mean_ms
    real(8) :: std_dev_ms
    real(8) :: min_ms
    real(8) :: max_ms
    integer(8) :: result
  end type benchmark_result_t

contains

  subroutine run(f, run_ms, result)
    implicit none
    interface
      integer(8) function f()
      end function f
    end interface
    procedure(f), pointer :: func_ptr
    integer, intent(in) :: run_ms
    type(benchmark_result_t), intent(out) :: result
    integer(8) :: start_time, end_time, elapsed_time, total_elapsed_time
    integer(8) :: count_rate
    integer :: count
    real(8) :: elapsed_times(1000000), mean, variance, std_dev, min_time, max_time
    logical :: print_status
    integer(8) :: last_status_t

    ! Check for run_ms being zero
    if (run_ms == 0) then
      result%runs = 0
      result%mean_ms = 0.0
      result%std_dev_ms = 0.0
      result%min_ms = 0.0
      result%max_ms = 0.0
      result%result = 0
      return
    end if

    func_ptr => f
    total_elapsed_time = 0
    count = 0
    min_time = 1.0e12
    max_time = 0.0
    print_status = (run_ms > 1)
    call system_clock(count_rate=count_rate)  ! Get the count rate
    last_status_t = 0

    if (print_status) then
      write(0, '(A)', advance='no') "."
      flush(0)
    end if

    do while (total_elapsed_time < run_ms * 1.0e6)
      if (print_status .and. (total_elapsed_time - last_status_t) > 1.0e9) then
        last_status_t = total_elapsed_time
        write(0, '(A)', advance='no') "."
        flush(0)
      end if
      call system_clock(start_time, count_rate=count_rate)  ! Use nanosecond precision
      result%result = func_ptr()
      call system_clock(end_time, count_rate=count_rate)    ! Use nanosecond precision
      elapsed_time = end_time - start_time
      if (elapsed_time == 0) cycle  ! Skip zero elapsed time measurements
      if (count < size(elapsed_times)) then
        elapsed_times(count + 1) = elapsed_time / 1.0e6
      else
        write(0,*) "Error: Exceeded maximum number of iterations"
        exit
      end if
      total_elapsed_time = total_elapsed_time + elapsed_time
      count = count + 1
      if (elapsed_times(count) < min_time) min_time = elapsed_times(count)
      if (elapsed_times(count) > max_time) max_time = elapsed_times(count)
    end do

    if (print_status) then
      write(0, '(A)') ""
    end if

    mean = sum(elapsed_times(1:count)) / count
    variance = sum((elapsed_times(1:count) - mean)**2) / count
    std_dev = sqrt(variance)

    result%runs = count
    result%mean_ms = mean
    result%std_dev_ms = std_dev
    result%min_ms = min_time
    result%max_ms = max_time
  end subroutine run

  subroutine format_results(benchmark_result, result_str)
      implicit none
      type(benchmark_result_t), intent(in) :: benchmark_result
      character(len=:), allocatable, intent(out) :: result_str
      character(len=256) :: temp_str

      write(*, '(A)') 'mean_ms,std_dev_ms,min_ms,max_ms,runs,result'
      write(temp_str, '(f0.6,",",f0.6,",",f0.6,",",f0.6,",",i0,",",i0)') &
        benchmark_result%mean_ms, benchmark_result%std_dev_ms, benchmark_result%min_ms, benchmark_result%max_ms, benchmark_result%runs, benchmark_result%result

      result_str = trim(adjustl(temp_str))
  end subroutine format_results

end module benchmark

module fibonacci_module
    implicit none
    private  ! Everything is private by default...
    public :: fibonacci  ! Except what we explicitly expose

contains

    recursive function fib_internal(n) result(f)
        integer(8), intent(in) :: n
        integer(8) :: f

        if (n == 0) then
            f = 0
        elseif (n == 1) then
            f = 1
        else
            f = fib_internal(n - 1) + fib_internal(n - 2)
        end if
    end function fib_internal

    function fibonacci(n) result(f)
        integer(8), intent(in) :: n
        integer(8) :: f

        f = fib_internal(n)
    end function fibonacci

end module fibonacci_module

program main
    use benchmark
    use fibonacci_module
    implicit none
    integer(8) :: n
    integer(4) :: run_ms, warmup_ms
    character(len=256) :: arg
    type(benchmark_result_t) :: warmup_result, benchmark_result
    character(len = :), allocatable :: result_str

    call get_command_argument(1, arg)
    read(arg, *) run_ms                 ! Convert the command-line argument to integer
    call get_command_argument(2, arg)
    read(arg, *) warmup_ms              ! Convert the command-line argument to integer
    call get_command_argument(3, arg)
    read(arg, *) n                      ! Convert the command-line argument to integer

    call run(fibonacci_benchmark, warmup_ms, warmup_result)
    call run(fibonacci_benchmark, run_ms, benchmark_result)

    call format_results(benchmark_result, result_str)
    write(*, '(A)') trim(adjustl(result_str))

contains

    integer(8) function fibonacci_benchmark()
        implicit none
        integer(8) :: result

        result = fibonacci(n)
        fibonacci_benchmark = result
    end function fibonacci_benchmark

end program main

I compiled it with gfortran -O3 –free-line-length-none main.f90 and ran with: ./a.out 10000 2000 40 which I believe is how they obtained the benchmarks and got:

..
..........
mean_ms,std_dev_ms,min_ms,max_ms,runs,result
.859421,.134473,.705390,2.684598,11636,102334155

which I think is even better than what they have up on the website?

Running on a:

Architecture:           x86_64
  CPU op-mode(s):       32-bit, 64-bit
  Address sizes:        46 bits physical, 48 bits virtual
  Byte Order:           Little Endian
CPU(s):                 40
  On-line CPU(s) list:  0-39
Vendor ID:              GenuineIntel
  Model name:           Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
    CPU family:         6
    Model:              79
    Thread(s) per core: 2
    Core(s) per socket: 20
    Socket(s):          1
    Stepping:           1
    CPU max MHz:        3600.0000
    CPU min MHz:        1200.0000
    BogoMIPS:           4397.29
    Flags:              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr ss
                        e sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xt
                        opology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtp
                        r pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_
                        lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stib
                        p fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xs
                        aveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
Caches (sum of all):
  L1d:                  640 KiB (20 instances)
  L1i:                  640 KiB (20 instances)
  L2:                   5 MiB (20 instances)
  L3:                   50 MiB (1 instance)
NUMA:
  NUMA node(s):         1
  NUMA node0 CPU(s):    0-39

nvfortran segfaults with that code…

flang runs it fine:

flang version 21.1.3
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/bin
Build config: +assertions
..
.........
mean_ms,std_dev_ms,min_ms,max_ms,runs,result
621.264175,.061052,621.222717,621.468567,17,102334155

gfortran 14.2:

..
..........
mean_ms,std_dev_ms,min_ms,max_ms,runs,result
.973129,.001484,.972427,1.058354,10277,102334155

and ifx seems to hang?

1 Like

GFortran 15.2.0

$ gfortran -ffree-line-length-none -O3 main.f90
$ ./a.out 10000 2000 40
mean_ms,std_dev_ms,min_ms,max_ms,runs,result
.696978,.019519,.691482,1.107574,14348,102334155
$ gfortran -ffree-line-length-none -O2 main.f90
$ ./a.out 10000 2000 40
mean_ms,std_dev_ms,min_ms,max_ms,runs,result
.696559,.017689,.691498,1.049809,14357,102334155
$ gfortran -ffree-line-length-none -O1 main.f90
$ ./a.out 10000 2000 40
432.099777,.941244,430.768768,434.440277,24,102334155

With -O2 and -O3, I wonder if the program is really recomputing the fibonacci function (it is always called with the same n=40 argument, so the function always returns the same value). I have already encountered such problems making benchmarks with optimisation on.

I think that in the website the person says something along the lines that “if the compiler figured this out and is doing it this way, then that is the result”

Yes, we could simultaneously think that it is quite unfair but that the Fortran compiler is just really clever, refusing to make useless computations…

1 Like