Black-Scholes option pricing benchmark

The Black-Scholes price of a European call option can be computed in Fortran as

pure elemental function call_price(s,k,r,t,vol) result(price)
! Black-Scholes price of a European call option
real(kind=dp), intent(in) :: s     ! stock price
real(kind=dp), intent(in) :: k     ! strike price
real(kind=dp), intent(in) :: r     ! annual interest rate -- 0.02 means 2%
real(kind=dp), intent(in) :: t     ! time to expiration in years
real(kind=dp), intent(in) :: vol   ! annualized volatility -- 0.30 means 30%
real(kind=dp)             :: price ! call price
real(kind=dp)             :: d1,d2,vol_sqrt_t
vol_sqrt_t = vol*sqrt(t)
d1 = (log(s/k) + (r + 0.5_dp*vol**2)*t)/vol_sqrt_t
d2 = d1 - vol_sqrt_t
price = s*cumnorm(d1) - k*exp(-r*t)*cumnorm(d2) 
end function call_price
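For reference, the function above evaluates the standard Black-Scholes call formula. Here N is the standard normal cumulative distribution function (what cumnorm computes), S is the stock price s, K the strike k, r the rate, T the time to expiration t, and σ the volatility vol. The second line works through the benchmark inputs (Example 15.6 in Hull: S = 42, K = 40, r = 0.1, T = 0.5, σ = 0.2):

```latex
C = S\,N(d_1) - K e^{-rT} N(d_2), \qquad
d_1 = \frac{\ln(S/K) + \left(r + \tfrac{1}{2}\sigma^2\right) T}{\sigma\sqrt{T}}, \qquad
d_2 = d_1 - \sigma\sqrt{T}

d_1 = \frac{\ln(42/40) + (0.1 + 0.02)(0.5)}{0.2\sqrt{0.5}} \approx \frac{0.1088}{0.1414} \approx 0.7693, \qquad
d_2 \approx 0.6278, \qquad C \approx 4.76
```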

Calling the function 10^8 times took 10.2 s with gfortran and 3.4 s with Intel Fortran on Windows 10 (specifically gfortran 12.0.0 20210718 from equation.com and Intel Fortran Version 2021.1 Build 20201112_000000), both with the -O3 option. The full code is here. Suggested speedups are welcome.
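One thing to watch when timing a loop like this: call_price is pure and receives the same constant arguments on every iteration, so an optimizing compiler may be able to evaluate it once and hoist it out of the loop. In that case the timing would not reflect 10^8 evaluations. Below is a minimal sketch of a harness that perturbs the stock price slightly on each iteration and accumulates the results. The module name bs_bench, the 1e-9 perturbation, and the use of system_clock (wall-clock time, rather than the cpu_time used in the thread) are assumptions for illustration. The erfc-based cumnorm is a stand-in for the original one.

```fortran
! Hypothetical benchmark module: the input changes every iteration, so the
! pure call_price cannot simply be hoisted out of the loop as an invariant.
module bs_bench
use, intrinsic :: iso_fortran_env, only: real64, int64
implicit none
private
public :: dp, call_price, run_benchmark
integer, parameter :: dp = real64 ! stand-in for the dp kind from kind.f90
contains

pure elemental function cumnorm(x) result(p)
! standard normal CDF from the erfc intrinsic (stand-in for the original cumnorm)
real(kind=dp), intent(in) :: x
real(kind=dp) :: p
p = 0.5_dp*erfc(-x/sqrt(2.0_dp))
end function cumnorm

pure elemental function call_price(s,k,r,t,vol) result(price)
! Black-Scholes price of a European call option
real(kind=dp), intent(in) :: s, k, r, t, vol
real(kind=dp) :: price, d1, d2, vol_sqrt_t
vol_sqrt_t = vol*sqrt(t)
d1 = (log(s/k) + (r + 0.5_dp*vol**2)*t)/vol_sqrt_t
d2 = d1 - vol_sqrt_t
price = s*cumnorm(d1) - k*exp(-r*t)*cumnorm(d2)
end function call_price

subroutine run_benchmark(niter, avg_price, seconds)
integer, intent(in) :: niter
real(kind=dp), intent(out) :: avg_price ! mean of all computed prices
real(kind=dp), intent(out) :: seconds   ! elapsed wall-clock time
integer :: iter
integer(kind=int64) :: c0, c1, rate
real(kind=dp) :: total, s
total = 0.0_dp
call system_clock(c0, rate)
do iter = 1, niter
   ! perturb the stock price by at most 1e-6 so each call sees new input
   s = 42.0_dp + 1.0e-9_dp*mod(iter, 1000)
   total = total + call_price(s, 40.0_dp, 0.1_dp, 0.5_dp, 0.2_dp)
end do
call system_clock(c1)
avg_price = total/niter ! printing this keeps the loop's work observable
seconds = real(c1 - c0, dp)/real(rate, dp)
end subroutine run_benchmark
end module bs_bench
```

Because the perturbation is tiny, the average price stays at about 4.7594, so the printed result can still be checked against Hull's 4.76. Dividing seconds by niter gives a time per call that is comparable across different iteration counts.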

A trading firm may make markets in hundreds of thousands of options and must recalculate the option prices whenever the underlying stock prices move, so quickly calculating option prices and sensitivities (partial derivatives of the option price with respect to the stock price, volatility, etc.) is important to it.
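As an illustration of the sensitivities mentioned above, here is a minimal sketch of two of them for the call option: delta, ∂C/∂S = N(d1), and vega, ∂C/∂σ = S φ(d1) √T, where φ is the standard normal density. These are the standard no-dividend Black-Scholes results. The module and function names are hypothetical, and N is computed from the erfc intrinsic rather than from the original cumnorm. Because the functions are elemental, they can be applied to whole arrays of options in one call.

```fortran
! Hypothetical module: analytic delta and vega of a European call under
! Black-Scholes (no dividends), as elemental functions usable on arrays.
module bs_greeks
use, intrinsic :: iso_fortran_env, only: real64
implicit none
private
public :: dp, call_delta, call_vega
integer, parameter :: dp = real64 ! stand-in for the dp kind from kind.f90
real(kind=dp), parameter :: pi = 3.141592653589793_dp
contains

pure elemental function d1_term(s,k,r,t,vol) result(d1)
! the d1 term of the Black-Scholes formula
real(kind=dp), intent(in) :: s, k, r, t, vol
real(kind=dp) :: d1
d1 = (log(s/k) + (r + 0.5_dp*vol**2)*t)/(vol*sqrt(t))
end function d1_term

pure elemental function call_delta(s,k,r,t,vol) result(delta)
! dC/dS = N(d1), with N(x) = 0.5*erfc(-x/sqrt(2))
real(kind=dp), intent(in) :: s, k, r, t, vol
real(kind=dp) :: delta
delta = 0.5_dp*erfc(-d1_term(s,k,r,t,vol)/sqrt(2.0_dp))
end function call_delta

pure elemental function call_vega(s,k,r,t,vol) result(vega)
! dC/dvol = s*phi(d1)*sqrt(t), where phi is the standard normal density;
! this is the price change per unit of vol (1.00, i.e. 100 percentage points)
real(kind=dp), intent(in) :: s, k, r, t, vol
real(kind=dp) :: vega, d1
d1 = d1_term(s,k,r,t,vol)
vega = s*exp(-0.5_dp*d1**2)/sqrt(2.0_dp*pi)*sqrt(t)
end function call_vega
end module bs_greeks
```

For the Hull inputs, call_delta(42.0_dp, 40.0_dp, 0.1_dp, 0.5_dp, 0.2_dp) is N(d1) ≈ 0.779. A call such as call_delta(42.0_dp, [38.0_dp, 40.0_dp, 42.0_dp], 0.1_dp, 0.5_dp, 0.2_dp) returns the deltas for three strikes at once. A central finite difference of call_price is a cheap way to check analytic derivatives like these.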

I also have my own BSM function in Fortran. I use the erf function to construct cumnorm. I am not sure about the performance difference, but using erf makes my code much more concise.
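A minimal sketch of the erf approach, assuming the identity N(x) = ½ erfc(−x/√2) = ½ (1 + erf(x/√2)). It uses the Fortran 2008 erfc intrinsic rather than erf, so the left tail does not lose accuracy to cancellation. The module name is hypothetical and dp stands in for the kind defined in kind.f90. The poster's actual code is not shown in the thread, so this is not their implementation.

```fortran
! Minimal sketch (hypothetical module name): cumnorm built from the
! erfc intrinsic, together with the call_price function from above.
module black_scholes_erf
use, intrinsic :: iso_fortran_env, only: real64
implicit none
private
public :: dp, cumnorm, call_price
integer, parameter :: dp = real64 ! stand-in for the dp kind from kind.f90
contains

pure elemental function cumnorm(x) result(p)
! N(x) = 0.5*erfc(-x/sqrt(2)); preferred over 0.5*(1 + erf(x/sqrt(2)))
! because it stays accurate for large negative x
real(kind=dp), intent(in) :: x
real(kind=dp) :: p
p = 0.5_dp*erfc(-x/sqrt(2.0_dp))
end function cumnorm

pure elemental function call_price(s,k,r,t,vol) result(price)
! Black-Scholes price of a European call option (same formula as above)
real(kind=dp), intent(in) :: s, k, r, t, vol
real(kind=dp) :: price, d1, d2, vol_sqrt_t
vol_sqrt_t = vol*sqrt(t)
d1 = (log(s/k) + (r + 0.5_dp*vol**2)*t)/vol_sqrt_t
d2 = d1 - vol_sqrt_t
price = s*cumnorm(d1) - k*exp(-r*t)*cumnorm(d2)
end function call_price
end module black_scholes_erf
```

With the Hull inputs used in the benchmark, call_price(s=42.0_dp, k=40.0_dp, r=0.1_dp, t=0.5_dp, vol=0.2_dp) should give about 4.7594, consistent with the c printed by the compilers in this thread. Whether erfc is faster or slower than the original cumnorm would have to be measured.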

There is no big difference between GFortran and ifort under Ubuntu 21.04 (Intel(R) Core™ i3-3220 CPU @ 3.30GHz):

$ ifort -O3 -stand:f18 kind.f90 black_scholes.f90 xxblack_scholes.f90 && ./a.out
 c =   4.75942232590152     
 time elapsed =    6.57875600000000     
$ gfortran-11 -O3 -std=f2018 kind.f90 black_scholes.f90 xxblack_scholes.f90 && ./a.out
 c =   4.7594223259015216     
 time elapsed =    6.4030679999999993 
$ gfortran-11 --version
GNU Fortran (Ubuntu 11.1.0-1ubuntu1~21.04) 11.1
[...]
$ ifort --version
ifort (IFORT) 2021.4.0 20210910  

Nice. On Apple M1 I am getting 1.8s with both GFortran 11.0.1:

+ gfortran -O3 -march=native -ffast-math -funroll-loops -c kind.f90 -o kind.o
+ gfortran -O3 -march=native -ffast-math -funroll-loops -c black_scholes.f90 -o black_scholes.o
+ gfortran -O3 -march=native -ffast-math -funroll-loops -c xxblack_scholes.f90 -o xxblack_scholes.o
+ gfortran -O3 -march=native -ffast-math -funroll-loops -o xxblack_scholes xxblack_scholes.o black_scholes.o kind.o
+ ./xxblack_scholes
 c =   4.7594223259015269     

real	0m1.853s
user	0m1.846s
sys	0m0.004s

and the latest LFortran master:

+ lfortran --fast -c kind.f90 -o kind.o
+ lfortran --fast -c black_scholes.f90 -o black_scholes.o
+ lfortran --fast -c xxblack_scholes.f90 -o xxblack_scholes.o
+ lfortran --fast -o xxblack_scholes xxblack_scholes.o black_scholes.o kind.o
+ ./xxblack_scholes
c =     4.75942232590152159

real	0m1.808s
user	0m1.801s
sys	0m0.004s

I had to apply a tiny patch, because we still have to implement cpu_time, and also as a workaround for the LLVM issue "Calling f(g(x)) in a loop can run out of stack" (#573 in lfortran/lfortran on GitLab), which I really need to fix soon:

diff --git a/xxblack_scholes.f90 b/xxblack_scholes.f90
index 27dc187..45866a3 100644
--- a/xxblack_scholes.f90
+++ b/xxblack_scholes.f90
@@ -8,13 +8,15 @@ implicit none
 integer, parameter :: niter = 10**8
 integer            :: iter
 real(kind=dp)      :: c,t1,t2
-call cpu_time(t1)
+real(dp) s,k,r,t,vol
+!call cpu_time(t1)
 ! Example 15.6 p360 of Options, Futures, and other Derivatives (2015), 9th edition,
 ! by John C. Hull
+s=42;k=40;r=0.1_dp;t=0.5_dp;vol=0.2_dp
 do iter=1,niter
-   c = call_price(s=42.0_dp,k=40.0_dp,r=0.1_dp,t=0.5_dp,vol=0.2_dp) ! Hull gets 4.76
+   c = call_price(s, k, r, t, vol) ! Hull gets 4.76
 end do
 print*,"c =",c
-call cpu_time(t2)
-print*,"time elapsed = ",t2-t1
+!call cpu_time(t2)
+!print*,"time elapsed = ",t2-t1
 end program xxblack_scholes

The patch is really minimal, and I am very happy about that! It gives the same answer as GFortran. I am also happy about the performance. The credit goes to LLVM: we don’t do any optimizations ourselves yet, but we will in the coming months. LLVM compiles code slowly compared to what is possible (although LFortran still compiles this benchmark in 0.224 s versus 0.342 s for GFortran, both with optimizations on), but the performance of the final executable is excellent. The LLVM developers have done an A+ job. So has the GFortran team, who implemented all of this from scratch, and on this benchmark the two are essentially equivalent.

Update: I implemented cpu_time() in !1491 and !1492, so now the diff is just:

diff --git a/xxblack_scholes.f90 b/xxblack_scholes.f90
index 27dc187..e5df36c 100644
--- a/xxblack_scholes.f90
+++ b/xxblack_scholes.f90
@@ -8,11 +8,13 @@ implicit none
 integer, parameter :: niter = 10**8
 integer            :: iter
 real(kind=dp)      :: c,t1,t2
+real(dp) s,k,r,t,vol
 call cpu_time(t1)
 ! Example 15.6 p360 of Options, Futures, and other Derivatives (2015), 9th edition,
 ! by John C. Hull
+s=42;k=40;r=0.1_dp;t=0.5_dp;vol=0.2_dp
 do iter=1,niter
-   c = call_price(s=42.0_dp,k=40.0_dp,r=0.1_dp,t=0.5_dp,vol=0.2_dp) ! Hull gets 4.76
+   c = call_price(s, k, r, t, vol) ! Hull gets 4.76
 end do
 print*,"c =",c
 call cpu_time(t2)

And with fixes for keyword arguments (!1498) and an alloca bug (!1494), this benchmark now works without any modifications!

$ lfortran --fast -c kind.f90 -o kind.o
$ lfortran --fast -c black_scholes.f90 -o black_scholes.o
$ lfortran --fast -c xxblack_scholes.f90 -o xxblack_scholes.o
$ lfortran --fast -o xxblack_scholes xxblack_scholes.o black_scholes.o kind.o
$ ./xxblack_scholes
c =     4.75942232590152159
time elapsed =      1.83060800000000001

@Beliavsky, give it a shot. You have to use the latest LFortran master.

Thanks for the update. Previously, on WSL2, I followed the instructions

git clone https://gitlab.com/lfortran/examples/mvp_demo.git
cd mvp_demo
conda create -n mvp_demo -c conda-forge fpm=0.4.0 lfortran=0.14.0
conda activate mvp_demo
fpm run --all --compiler=lfortran
fpm test --compiler=lfortran

to install. Which repository should I clone to get master? I tried https://gitlab.com/lfortran/lfortran.git, but after doing that I still cannot compile cpu_time(). lfortran --version says

LFortran version: 0.14.0
Platform: Linux
Default target: x86_64-unknown-linux-gnu

You can download the development version from the LFortran Download page and follow the instructions in the LFortran Installation documentation. So:

conda create -n lf python cmake llvmdev=11.1.0
conda activate lf
wget https://lfortran.github.io/tarballs/dev/lfortran-0.14.0-678-g95c94e7e.tar.gz
tar xzf lfortran-0.14.0-678-g95c94e7e.tar.gz
cd lfortran-0.14.0-678-g95c94e7e
cmake -DWITH_LLVM=yes -DCMAKE_INSTALL_PREFIX=`pwd`/inst .
make -j8
make install

This installs lfortran into inst/bin in the current directory.

I just tested it, and it works. The only difference from the online instructions is pinning the llvmdev=11.1.0 dependency, as we don’t yet compile with newer LLVM versions.

Thanks, that works.

I get run times of 3.95 s for LFortran and 3.02 s for gfortran. The script I used is

exec=a.out
FC="../lfortran --fast"
FC="gfortran -O3" # comment out to use lfortran
rm -f *.o $exec
$FC -c kind.f90
$FC -c black_scholes.f90
$FC -c xxblack_scholes.f90
$FC -o $exec xxblack_scholes.o black_scholes.o kind.o
./$exec

I am surprised that gfortran is much faster here on WSL2 than on plain Windows, where the program takes 10 s. Another program compiled with gfortran takes 1.0 s on Windows and 0.67 s on WSL2. I wonder whether programs compiled with gfortran are generally faster on WSL2 than on Windows.


In my experience, for some code gfortran on plain Windows can be slower than gfortran on Linux by a factor of 6 or so. I do not know why; perhaps gfortran somehow does not take full advantage of the Windows SDK when generating and linking the .exe file?
However, gfortran on WSL2 should run at basically the same speed as on Linux; after all, WSL2 is basically Linux.

Perfect, thanks for trying it out! So on your computer you get LFortran 3.95s, Intel 3.4s, GFortran 3.02s?

Very cool.


Using all compilers on WSL 2, I get

lfortran 3.99 s
gfortran 3.05 s
ifort 3.27 s
flang 3.38 s

The script used is

#!/bin/bash

exec=a.out
declare compilers=("../lfortran --fast" "gfortran -O3" "ifort -O3" "flang -O3")
declare sources=(kind.f90 black_scholes.f90 xxblack_scholes.f90)
declare objs=(kind.o black_scholes.o xxblack_scholes.o)
 
for FC in "${compilers[@]}"; do # loop over compilers
  echo && echo $FC
  $FC --version  
  rm -f *.o $exec # cleanup
  for src in "${sources[@]}"; do
      $FC -c $src
  done
  $FC -o $exec "${objs[@]}"
  ./$exec
done

giving output

../lfortran --fast
LFortran version: 0.14.0-678-g95c94e7e
Platform: Linux
Default target: x86_64-unknown-linux-gnu
c =     4.75942232590152159
time elapsed =      3.98533099999999996

gfortran -O3
GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

 c =   4.7594223259015216     
 time elapsed =    3.0451200000000003     

ifort -O3
ifort (IFORT) 2021.2.0 20210228
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

 c =   4.75942232590152     
 time elapsed =    3.26698100000000     

flang -O3
clang version 7.0.1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
 c =    4.759422325901522     
 time elapsed =     3.376587867736816

Are there other compilers to try on WSL 2?

Bash question: what is the recommended way to set objs given sources? I just copied and pasted the list and replaced .f90 with .o.

Very cool. Would you mind posting the benchmark to the fortran-lang/benchmarks repository on GitHub? Just create a directory for it and put in the source code and build scripts. Let’s start collecting such benchmarks, and we’ll eventually create some nice infrastructure to run them with various compilers and options.

Can you please also try gfortran -O3 -march=native -ffast-math -funroll-loops? Those are the options I typically use. I didn’t see any difference on this particular benchmark, but I see differences on other benchmarks.
