Julia: Fast as Fortran, Beautiful as Python

Hello and welcome.

You seem to be attempting to time the SUM intrinsic. My experience with it is that Intel’s SUM() is about 3 times faster that NAG’s and GNU’s (slightly different versions, I use ifort 2021.2.0, NAG 7.0 and gfortran 8.3.1). Perhaps you need to repeat the timing and clear caches in between.

It is also possible to write Fortran that does it faster still on a multi-core machine (another 3x) and with a more accurate result.

1 Like

Is it possible to use C interoperability to compile source files with two different Fortran compilers to create an executable? This could be useful if you know that one compiler is faster for part of the code or if you have object code for a numerical library such as NAG or IMSL that is compiler-specific (but would calling the library from another compiler violate the license?).

You would need two different runtime libraries to be linked in and I/O, multithreading etc. could be a nightmare. But for some simple things, you could get away with it. NAG Library products are compiler specific but sometimes the documentation says that they can be used with user code compiled with a different compiler. I would not recommend it. If you are unhappy with your vendor’s performance, write them a note!

Hi

The Fortran and C++ could also be rewritten to use openmp fairly easily.

Jane and I are writing some additional parallel examples at the moment.

Cheers

Ian

Hi Themos The intention was to look at the run time of a small example that does initialisation and summation in a variety of languages and with a variety of compilers. The intention was not to test the sum intrinsic, but rather to look at the differences between an interpreted language and some commonly used compiled languages.

3 Likes

@cmaapic thanks for your posts and welcome to the forum! I’ve been reading your compiler comparisons papers for a long time.

Reading Concise and beautiful algorithms written in Julia | Hacker News and browsing the associated code at GitHub, I wonder how the corresponding modern Fortran algorithms would look. Having adopted a ? ternary, the next step for Fortran is Greek letters and more mathematical symbols. Just kidding.

1 Like

It does look beautiful, although some of the unicode does not render correctly in my Firefox, so it becomes unreadable. The bigger issue is just like with Python, I struggle to understand what the code does. Take this example: GitHub - mossr/BeautifulAlgorithms.jl: Concise and beautiful algorithms written in Julia

using Distributions

function thompson_sampling(𝛂, 𝛃, apply; T=100)
    for t in 1:T
        𝛉 = rand.(Beta.(𝛂, 𝛃))
        x = argmax(𝛉)
        r = apply(x)
        𝛂[x], 𝛃[x] = (𝛂[x] + r, 𝛃[x] + 1 - r)
    end
    return Beta.(𝛂, 𝛃)
end

Due to the lack of types, I have no immediate idea how to use this function. You have to read the code to “guess” that alpha and beta are probably 1D arrays (?) and apply is a callback. What argument does apply accept? Well, it’s the return value of argmax, so it’s probably an integer. It took me a few minutes looking through the code to figure it out.

What is nice about Fortran is that you have to declare the interface for apply, so you would know right away what it accepts. The price that you pay is that it is more verbose.

7 Likes

It is also far from obvious what the “.(“ is all about or why there is a “;” after apply. They definitely made some odd syntax choices.

What we need is a more interactive Fortran with some some kind of generics/type inference/multiple dispatch that isn’t incredibly verbose. Then I don’t see the need for Julia anymore. :grinning:

2 Likes

The use of . for broadcasting is far from unique to Julia (it’s also used by numpy and matlab), and IMO it does by far the best job of it. Having an easy way of vectorizing code is way better than having to define scalar and vector (and matrix and tensor) versions of everything.

1 Like

Fortran has intrinsic and user-defined ELEMENTAL functions for that.

3 Likes

From what I’ve seen there’s a key difference which is that you have to decide whether a function is elemental or not when you write it. This means that for example, you can’t broadcast matrix multiplication without re-writing BLAS. (also, you need different names for the elemental and non-elemental versions)

1 Like

So, does Julia vectorize a compiled external library at the request of the user, or does it recompile the library every time to achieve the goal?

It’s somewhere in-between

using SpecialFunctions # for error function, erf

v = rand(10)
println(erf.(log.(v)))

will be lowered to Base.materialize(Base.broadcasted(erf, Base.broadcasted(log, v))) which basically turns into

y = similar(v)
for i in 1:10
    y[i] = erf(log(v[i]))
end

If log and erf inline (in the particular they don’t since they are too expensive), Julia might make further optimizations like unrolling the loop, vectorizing the functions, or removing bounds checks.

3 Likes

Please see this example where the key computation, the sine of a radian, was moved to an external library, cordic_sine: Simple summation 8x slower than in Julia - #44 by FortranFan.

Once that was done, note in all the subsequent scenarios, the ratio of compute times with Fortran vs Julia were around unity, meaning no real difference i.e., the specific advantages with an enthusiast-driven macro implementation toward vectorization and specific “fast” trig functions leading to the original 8x difference became inapplicable.

2 Likes

Sometimes the same function or operator has a different meaning when acting on matrices than elementwise.

julia> A = rand(2,2)
2×2 Matrix{Float64}:
 0.198773   0.747052
 0.0135511  0.777073

julia> exp.(A)
2×2 Matrix{Float64}:
 1.21991  2.11077
 1.01364  2.1751

julia> exp(A)
2×2 Matrix{Float64}:
 1.22747    1.23599
 0.0224203  2.18427

julia> A * A
2×2 Matrix{Float64}:
 0.0496342  0.729008
 0.0132238  0.613966

julia> A .* A
2×2 Matrix{Float64}:
 0.0395108    0.558087
 0.000183634  0.603843

julia> A ^ 3
2×2 Matrix{Float64}:
 0.0197449  0.603572
 0.0109485  0.486976

julia> A .^ 3
2×2 Matrix{Float64}:
 0.00785369  0.41692
 2.48845e-6  0.46923
1 Like

I’m not sure what you mean. exp.(A) and exp(A) do something different (elementwise vs matrix exponential). The dot is used to syntactically indicate whether the function is applied to the entire argument, or applied elementwise.

Just like having type information listed makes it easier to read code, this is also informative, i.e. we know that foo.(A) is elementwise and foo(A) probably isn’t (unless someone defined foo(A) to explicitly loop over A).

1 Like

I’m lost here by the combative tone your comments are striking.

In Julia, we can define A::AbstractMatrix{Float64} * B::AbstractMatrix{Float64} to be dgemm and A::Float64 * B::Float64 to be scalar multiplication, and then automatically A::AbstractMatrix{Float64} .* B::AbstractMatrix{Float64} is elementwise multiplication.

Thus we do need to define two different functions, but the “ELEMENTAL” behavior comes for free by just defining the elemental (scalar) version of the function.
Then syntactically, someone reading our code knows whether our function is applying elementwise, or applying to the entire argument.
Similarly, one knows syntactically if it is fusing with other ., so one can look at code and tell how many array temporaries are being created.

Then, if the same operator has a different meaning when applied to a certain container type – like a matrix – we are still free to add a definition for that container type, without needing to use other operators/function names.

My argument would only be specious if we had to explicitly define exp.() and .*.
The . is syntax. All we have to define is exp(::Float64) to get both exp(::Float64) (no dot) and exp.(::AbstractArray{Float64}) (elementwise on a matrix) or exp.((A::AbstractArray .- B::AbstractArray) .^ 2) (elementwise exponential of squared differences, without allocating any temporaries).

2 Likes

I wanted to see if I could implement what you are saying just to see how far Fortran is away from that. Take this with a grain of salt as a goofy example. You would typically use intrinsics and elemental functions as noted earlier.

program test
use, intrinsic :: iso_fortran_env

interface operator(.elem.)
    procedure elem
end interface
interface exp
    procedure exp_mat
    procedure exp_elem
end interface
interface operator(.star.)
    procedure star_mat
    procedure star_elem
end interface

type elem_ptr
    real(real64), pointer :: mat(:, :)
end type

real(real64) :: A(2, 2)
A = reshape([0.198773, 0.747052, &
             0.0135511, 0.777073], shape=[2, 2], order=[2, 1])
call print_mat(A)
call print_mat(exp(.elem.A))
call print_mat(exp(A))
call print_mat(A.star.A)
call print_mat(A.star..elem.A)

contains

function elem(x) result(y)
    real(real64), target, intent(in) :: x(:, :)
    type(elem_ptr) :: y
    y%mat => x
end function

function exp_mat(x) result(y)
    real(real64), intent(in) :: x(:, :)
    real(real64) :: y(size(x, 1), size(x, 2))
    real(real64) :: x_k(size(x, 1), size(x, 2))
    integer :: i, j, k, fact
    y = 0.
    do concurrent(i=1:size(x, 1), j=1:size(x, 2))
        if(i == j) y(i, j) = 1.
    enddo
    fact = 1
    x_k = x
    do k = 1, 8
        fact = fact*k
        y = y + x_k/fact
        x_k = matmul(x_k, x)
    enddo
end function

function exp_elem(x) result(y)
    type(elem_ptr), intent(in) :: x
    real(real64) :: y(size(x%mat, 1), size(x%mat, 2))
    integer :: i, j
    do concurrent(i=1:size(x%mat, 1), j=1:size(x%mat, 2))
        y(i, j) = exp(x%mat(i, j))
    enddo
end function

function star_mat(a, b) result(c)
    real(real64), intent(in) :: a(:, :), b(:, :)
    real(real64) :: c(size(a, 1), size(b, 2))
    c = matmul(a, b)
end function

function star_elem(a, b) result(c)
    real(real64), intent(in) :: a(:, :)
    type(elem_ptr), intent(in) :: b
    real(real64) :: c(size(a, 1), size(a, 2))
    c = a*b%mat
end function

subroutine print_mat(mat)
    real(real64), intent(in) :: mat(:, :)
    integer(int32) :: i, j, m, n
    write(*, '(i0,"x",i0," Matrix{Float64}")') size(mat, 1), size(mat, 2)
    do i = 1, size(mat, 1)
        write(*, '(22f15.7)', advance='no')(mat(i, j), j=1, size(mat, 2))
        write(*, *)
    enddo
end subroutine

end program
1 Like

Note that, in exp case (and in any other), one could do:

a = 1.0
A = ones(2,2)

exp(a) # scalar
exp(A) # exp of matrix

v = [ a, a, a ]
exp.(a) # scalar, element wise
exp.(A) # scalar, element wise

V = [ A, A, A ] # vector of matices
exp.(V) # Matrix exp on each element of V

You can do this with essentially any function in Julia, even if it is function defined in a library. In this case what is needed is one exp method defined for scalars and other for matrices. (Because of generics, for many other functions only one method is needed at all)

4 Likes