Benchmark: The birthday problem

The birthday problem

Introduction to the problem to be solved

There is a planet whose years have 100 days. In this planet there are 5 people, so,
calculates in extent all possible cases. What is the probability that two people
were born on the same day?

The data structure

This is the vector v. Its size is equal to the number of people in the planet.
Each item in the vector represents the day each person was born. So

    -------------------------------
v:  |  1  |  4  |  1  |  7  |  6  |
    -------------------------------

Person 1 was born the first day of the year, person 2 was born de fourth day, the third
person the first day of the year (same day as the first person) …

The numbers in the vector are in the range 1..n_days

The process

It consists in testing all combinations in the vector and test if there are two or more
people who were born the same day.

Results

Execution time using $ time from command line. Fortran code was compiled using gfortran.

  • GNU Fortran recursive 39.48s
  • GNU Fortran iterative 54.69s
  • Intel Fortran recursive 2m 14s
  • Intel Fortran iterative 2m 54s
  • Julia recursive 4m 55.69s
  • Julia iterative 7m 25.34s

Source code

I assume that you are aware that this is a classic problem in probabilities that doesn’t require more than five calculations to solve. https://en.wikipedia.org/wiki/Birthday_problem

1 Like

Yes, it is a well known problem, but I just wanted to test the code working with many loops and checks. The results are just to compare how fast are languages, not to solve the problem.

1 Like

@Samuel_GF thanks for this submission!

The classic problem only gives you probabilities, while it is my understanding that the program above gives the actual numbers of coincidences for a given set of birthdays. So I think this is a great candidate for our benchmark repository, I created an issue for this: https://github.com/fortran-lang/benchmarks/issues/8.

@kargl is right that you should specify the exact compiler versions, flags used and the hardware details. In our benchmarks repository we will eventually do this automatically.

1 Like

I haven’t read the code carefully, but tried running the codes on my very old Mac
(2012) <-- because this is too old, I cannot install BigSur (the next macOS)… XD

  1. Because the Julia version seems to be using an array of Float64 for v, while Fortran uses an array of integer(8) for v, I first changed the former to an array of Int64, and also total / 100 to div(total, 100) (because / is a floating-point division similar to Python3…). Then the loop indices i and j become Int64 (rather than Float64), and so the code becomes closer to Fortran. This change gives 10 % speed up on my mac, but the gap in timing still remains… (<-- gfortran-10 -O2 is about 130 sec, gfortran-10 -O3 43 sec, while julia-1.4 -O3 is about 300 sec.)

(Btw, I cannot understand why the difference of O2 and O3 for gfortran is so large…)

  1. The Fortran version is using a fixed-size array for v on the stack (with length = 5 specified as a parameter), while Julia is using a dynamic array allocated on the heap (with length = 5). To fill this gap, using StaticArrays etc might be useful, but not very sure atm…
1 Like

Thank you for your reply. I am going to I improve the benchmark with all your suggestions and ideas today.

1 Like

Next iteration

New results

- GNU Fortran 0m 42s
- Julia       1m 32s     !! Faster than before

Changes in Julia code

After receive help and tips from Julia discourse forum these changes have been implemented:

  • Renamed function add1_recursive() to next_iteration()
  • Vector v now is a static array
  • No bounds checked for loops of same_day() function
  • No bounds checked for loops of next_iteration() function
  • Added a goto statement to take a shortcut
  • Added return nothing at the end of next_iteration()
  • Removed old progress system and added new @showprogress
  • Removed parameter n_people in function same_day() to avoid mistakes

Here you have the new Julia code.

Changes in Fortran code

  • The subroutine add1_recursive() has been renamed as next_iteration()

This is the Fortran code.

Information about my system

  • Julia 1.4.1
  • GNU Fortran 9.3.0
  • CPU Model: 6.78.3 “Intel® Core™ i3-6006U CPU @ 2.00GHz”
  • Memory Size: 3 GB + 768 MB
  • Ubuntu 20.04.1 LTS (Focal Fossa)
  • Linux kernel 5.4.0-42-generic
4 Likes

Awesome, this is a great example that shows why I prefer Fortran: just straightforward code using arrays, bounds checked in Debug mode and unchecked/fast in Release mode. Compared to Julia you have to worry about stuff like @inbounds (from the documentation my understanding is that it is not actually checked anymore in Debug build, since there doesn’t seem to be a way to trigger a Debug build that checks this).

4 Likes

A battle is taking place in the Julia discourse https://discourse.julialang.org/t/the-birthday-paradox-help-to-be-faster-than-fortran/44809/10

There are bright minds working in awesome solutions to get better results with Julia, but Fortran is still faster (20 seconds Julia vs 16 seconds Fortran).

3 Likes

@Samuel_GF, I presume you are @profesor_s at the Julialang discourse site? And that you may be conducting something like a “round-robin” effort and working up the “faithful” on both sides!!?

First, since your use of $ time command option on Linux is not a portable approach to measure performance (e.g., there is no reasonable equivalent on Windows even if Windows Powershell has some similar functionality), it is unclear what, if any, general conclusions can be drawn from the effort.

More importantly though, it’s unclear what is being attempted as measurement? Juila JIT compiler with LLVM dependency vs GCC backend? and how these two setup their IO instructions? It doesn’t quite look like Julia compared to Fortran, particularly with the latter having multiple compiler implementations that are faster in certain instructions than gfortran.

Increasingly though, note language performance comparisons, when two language offer similar semantics and syntax toward something (say array containers for floating-point objects), may be meaningless. But now, if a language has consciously decided not to include certain safeguards with facilities (e.g., use of pointer architecture to mimic multidimensional arrays in C++), it may make sense to compare what the “workaround” costs relative to a native implementation like arrays in Fortran. Or, alternately since Fortran has postponed any native type system like bit strings to manage arbitrary sequences of bits, it may make sense to how the cost of workaround options in Fortran versus native support of bit arrays in some other language in, say, an image rasterization application example.

What often matters are the aspects that cannot be measured readily: how easy or productive can the effort be in terms of the most precious commodity i.e., developer time, how full-featured yet advancing is the ecosystem including libraries and tooling to support the developers, how maintainable and extensible can the effort be in terms of future generations, future application needs and future computing architectures. Fortran does well in some areas, not so in others - that is what matters.

4 Likes