Is there a good Fortran package for SDEs, DDEs, and ODEs?

Would you be willing to update the code in the benchmark to

condition1(u, t, integrator) = (t - 0.5) * (t - 0.5005)
cb1 = ContinuousCallback(condition1, nothing; rootfind=true, save_positions=(true, false), abstol=0.001)

const Mask2 = Ref(false)   # const binding keeps access to the flag type-stable
function condition2(u, t, integrator)
    if Mask2[] == false
        if u[1] < 0
            -1.0
        else
            u[2]
        end
    else
        -1.0
    end
end
function affect2(integrator)
    Mask2[] = true   # disarm the callback once it has fired
end
cb2 = ContinuousCallback(condition2, affect2; rootfind=true, save_positions=(true, false), abstol=1e-9)

const Mask3 = Ref(false)
function condition3(u, t, integrator)
    if Mask3[] == false
        if u[1] > 0
            -1.0
        else
            u[2]
        end
    else
        -1.0
    end
end
function affect3(integrator)
    integrator.u = X0   # reset the state to the initial condition X0 from the benchmark script
    Mask3[] = true
end
cb3 = ContinuousCallback(condition3, affect3; rootfind=true, save_positions=(true, false), abstol=1e-9)

and update the README to reflect the conclusions? (It ranges from a 2x difference to the same speed, depending on the Julia solver you use.) When benchmarking against competitors, I think it is important to make fair comparisons. (IMO, using the StaticArray version would be better too, but that is less of a big deal.)

Yes, I will update the script to use static arrays, as this definitely improves the performance of the Julia code. The rest of your changes hardly have any effect, or even a negative one, as I showed in the numbers earlier. Of course, we should all do fair comparisons; that's why I pointed out the issues in your comparisons earlier. I am planning to expand my benchmarking codes to include some more differential equation problems, and once this new code is added with new benchmarking numbers, I will update my results (with and without event callbacks) in my repo. So stay tuned. And you may find some comfort in knowing that for the Lorenz system, Julia's Tsit5() solver seems to be at least 2x faster than FLINT's DOP54 (which is Dormand-Prince 5th order, or RK45 in MATLAB) in my preliminary testing, but more to come later.

1 Like

The changes to the callbacks, as I showed above, are performance-critical. The original code is type-unstable because it refers to non-const global variables. The new code does not, and as a result is 3x faster.

May I ask a perhaps irrelevant and naive question?
It seems that many times, if I need to increase Julia's speed, I need to use packages like StaticArrays or some macros. Is it possible to write plain native Julia code that reaches the speed of StaticArrays?
I guess StaticArrays is written in Julia, but it uses many macros to make operations on small arrays fast. This optimization is not easy for regular users, so it is wrapped up as a package.
I wish Julia's compiler could do more of this optimization automatically.
In Fortran I usually do not need to find packages to boost the performance of basic operations, because the compiler usually does a good job.

I feel the main strength of DifferentialEquations.jl is that it is a package of many different algorithms.
I feel that if those algorithms were properly written in Fortran, the performance should be similar to Julia's.
Many speed differences come from differences in the algorithms. Perhaps, to some extent, a fair comparison would be to use the same algorithm and compare Julia vs Fortran.

Anyway, good wishes to Julia.

PS. Is the Julia source code for all the algorithms in DifferentialEquations.jl publicly available? I checked GitHub - SciML/DifferentialEquations.jl: Multi-language suite for high-performance solvers of differential equations and scientific machine learning (SciML) components, but it seems I cannot find the source code there.

A StaticArray is just a Tuple with some arithmetic defined on it for convenience. Arrays are mutable objects that store their length as runtime data, so they will never be zero-overhead. Julia is giving you more safety here, so it takes a little bit of work to make that safety have zero overhead.
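
A small sketch of the difference (assuming the StaticArrays.jl and BenchmarkTools.jl packages; allocation counts are what I would expect, timings will vary):

julia> using StaticArrays, BenchmarkTools

julia> v = [1.0, 2.0, 3.0];        # Vector: heap-allocated, its length is runtime data

julia> s = SVector(1.0, 2.0, 3.0); # SVector: the size is part of the type, like a tuple

julia> @btime $v + $v;             # allocates a new Vector for the result

julia> @btime $s + $s;             # zero allocations; the fixed-length loop can be unrolled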

All the algorithms in DifferentialEquations.jl are public; they are just spread across a few repos. Check out GitHub - SciML/OrdinaryDiffEq.jl: High performance differential equation solvers for ordinary differential equations, including neural ordinary differential equations (neural ODEs) and scientific machine learning (SciML) and GitHub - SciML/DiffEqBase.jl: The lightweight Base library for shared types and functionality for defining differential equation and scientific machine learning (SciML) problems for more of the code. Also, @which can be very useful to see where specific methods are defined.
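
For example, in the REPL (the printed definition site will depend on your installed version):

julia> using OrdinaryDiffEq

julia> @which Tsit5()   # prints the file and line where this constructor method is defined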

1 Like

You could relatively easily rewrite the basic solvers in Fortran. That said, Fortran isn't powerful enough to easily deal with things like sparsity with different structures, different input types (e.g. DoubleFloat, Float16), symbolic simplification, and all the fancy stuff like that. To rival the Julia ecosystem, you would have to implement a good type system with generics and dispatch, a symbolic algebra library, and basically the rest of a more versatile programming language (Greenspun's tenth rule - Wikipedia).
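
As a concrete illustration of what generics and dispatch buy you: the same solver code can run with a different number type just by changing the problem's element type. A minimal sketch (assuming the OrdinaryDiffEq.jl and DoubleFloats.jl packages are installed):

using OrdinaryDiffEq, DoubleFloats

f(u, p, t) = -u   # toy ODE, just to show the generic interface
prob = ODEProblem(f, Double64(1), (Double64(0), Double64(1)))
sol = solve(prob, Tsit5())   # the same generic Tsit5 code now runs in extended precision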

1 Like

Read this note I have: Mutability · JuliaNotes.jl

It was written from the perspective of a Fortran user, discusses the differences from the user's point of view, and at the end shows how to write your own “static arrays”. Fortran indeed makes life easier for new users, particularly by optimizing things which, in Julia, (still) allocate a new intermediate array at every iteration:

julia> function f!(x)
           for i in 1:1000
               y = [ x[1], x[2] ] # this allocates a new intermediate array at every iteration
               x .= x .+ y
           end
           return x
       end
f! (generic function with 1 method)

julia> @btime f!([1.0,2.0])
  27.807 μs (1001 allocations: 93.84 KiB)
2-element Vector{Float64}:
 1.0715086071862673e301
 2.1430172143725346e301

This kind of thing is a frequent cause of very poor first experiences with Julia for people coming from other languages (and one place where a StaticArray, a tuple, a custom struct, or simply not computing the intermediate array at all are the solutions; see the sketch below). That specific case, I think, will get optimized away at some point in the development of the language (because it is possible to prove that the intermediate array is local to the scope of the loop and, thus, can be stack-allocated). But Julia does not do miracles; you have to pave your way to writing good code in it, as in other languages. I have said this many times, though: in Fortran there is less chance that you get very poor performance to start with. The dynamic character of Julia allows one to write very slow code.
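
For instance, a minimal variant of the function above that builds a tuple instead of the intermediate array (the name g! is mine; only the input vector itself should allocate):

julia> function g!(x)
           for i in 1:1000
               y = (x[1], x[2]) # a tuple is stack-allocated: no new array here
               x .= x .+ y
           end
           return x
       end

julia> @btime g!([1.0,2.0])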

Now, I would kindly suggest that you ask these things on the Julia forum, because this is getting somewhat off-topic here. (It may also be a suggestion for the forum managers to split the thread into an “off-topic” section; I think that would be better for future forum readers, particularly because the OP is very relevant in general for all users.)

4 Likes

I have enabled static arrays and some of the changes (Ref values) you suggested for the event callbacks in my FLINT repo here.

As I mentioned earlier, for this problem (the Arenstorf orbit), with or without event callbacks, I am seeing that the Julia DiffEq solvers are slower than FLINT. See my latest results on GitHub.

You (or anyone interested) are welcome to optimize the Julia test script used for these tests and submit your changes to my repo directly.

Thanks so much for being so responsive about this! I've made a PR (Make mask2 and mask3 const by oscardssmith · Pull Request #4 · princemahajan/FLINT · GitHub) to change the Ref variables to const, which is what gets rid of the type instability in the code (and thus provides the speedup).

2 Likes

No problem; we all want to see the results more than the claims. I doubt adding const will make it match the FLINT numbers, as the difference seems to be significant for this problem, but I will accept it into the main branch. (I think I had removed const originally because it gives those redefinition warnings on repeated runs.)

It makes a really big difference because, without const, globals in Julia are type-unstable, which means you have to do dynamic dispatch, and everything just gets much slower.
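
A minimal illustration of the difference (the names here are invented for the example):

julia> mask = Ref(false)          # non-const global: the binding's type can change, so access is type-unstable

julia> const cmask = Ref(false)   # const binding: the type is fixed, but cmask[] can still be mutated

julia> f_slow() = mask[] ? 1.0 : -1.0

julia> f_fast() = cmask[] ? 1.0 : -1.0

julia> @code_warntype f_slow()  # shows an Any-typed global access, forcing dynamic dispatch

julia> @code_warntype f_fast()  # fully inferred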

I get that. What I meant was that it still won't be enough to make it faster than the FLINT numbers (again, for this problem).

3 Likes

You can check the latest benchmarks. I have even wrapped the entire Julia script in a main() function to call from the REPL directly, instead of including the script each time, as the former method is guaranteed to (JIT-)compile everything on the first run, so later timed runs don't pay the compilation cost.
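
Schematically, the pattern looks like this (the problem below is a toy stand-in, not the actual benchmark):

using OrdinaryDiffEq, BenchmarkTools

function main()
    f(u, p, t) = -u
    prob = ODEProblem(f, 1.0, (0.0, 1.0))
    solve(prob, Tsit5())
end

main()          # the first call pays the one-time JIT compilation cost
@btime main()   # later calls measure only the solve itself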

3 Likes

Thanks for updating the benchmarks! One last question: the README says you are benchmarking against DifferentialEquations 2.0. If so, you should definitely update to the most recent version (6.19).

1 Like

The README is probably not up to date. These latest tests used DiffEq v6.16. The main thing that improved the Julia code's performance was running it interactively from the REPL instead of including the script every time. Still, the FLINT solvers can be up to ~5x faster when event detection is on, and I think there is still potential for more optimizations in the FLINT code.

3 Likes

Thank you, it is really nice! @prince_mahajan

I just checked: using the Intel API + Visual Studio 2017, I just included the test files plus all the f90 files in your src repo, and it compiles and runs! Nice!

One question: is it possible to provide a Makefile, so that we can just take your test f90 files plus all the src f90 files, do a make, and run?
With the Makefile, we could also see the structure/dependencies a little more easily. Once the Makefile works on Linux, I can more easily make one for Windows, too.

Thank you very much indeed!

CMake is not necessary, but it will generate makefiles for you; I find it easier this way. A CMakeLists file is included in the package.

Yes, you can always just include the f90 files directly in your project and compile it that way.

Btw, if you find any bugs (and I am sure there are!) please submit them on the repo.

1 Like

Thank you very much!
I am currently not very familiar with CMake, but anyway, I think it is easy to write a Makefile, too.

Just to make sure: if I were to write the FLINT Makefile, it would not depend on any other external library, right?

E.g., I have a Makefile which compiles all the .f90 or .f files in the folder:

#$Id: Makefile.nucmatop,v 1.4 2013/12/11 19:36:53 nuclear Exp $
MPI=false
MPI=true

ifeq ($(MPI),true)
   FC=mpiifort 
   MPIFILE=mympi
   EXEC=v6pimcmpi
#OMP= #-openmp -lpthread -openmp-report
FFLAGS= -O3 -xHost -traceback -heap-arrays  # -fast # -O2
F77FLAGS=$(FFLAGS)  -r8 # intel fortran
#LDFLAGS=-L/opt/intel/mkl/lib/intel64
#LIBS=-lmkl_intel_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_thread -lmkl_lapack95_lp64 -lmkl_core -liomp5 -lpthread #-check all -warn all
#MATRIX=matrixlapack
else
   FC=gfortran
   MPIFILE=nompi
   EXEC=v6pimc
#OMP= #-openmp -lpthread -openmp-report
FFLAGS= -O3 # -O2
F77FLAGS=$(FFLAGS) -fdefault-real-8 -fdefault-double-8 # gfortran only.
#MATRIX=matrixlapack
#LDFLAGS=-L/usr/local/lib
#LIBS=-llapack -lrefblas -ltmglib
endif



.SUFFIXES:
.SUFFIXES: .o .f .f90

.f90.o:
	$(FC) $(FFLAGS) -c $<

%.o: %.mod

OBJECTS=\
he4v6pimc.o\
av18pot.o\
v6stepcalc.o\
brut.o\
cheft.o\
chorizos.o\
estimator.o\
estimatorristra.o\
lattice.o\
lj.o\
math.o\
$(MPIFILE).o\
pot.o\
ran.o\
step.o\
v6pot.o\
volume.o\
wavefunction.o
   
he4v6pimc: $(OBJECTS)
	$(FC) -o ./$(EXEC) $(OBJECTS) $(LIBS) 

clean:
	rm -f $(EXEC) *\.mod *\.o *~ 

he4v6pimc.o: ran.o brut.o $(MPIFILE).o step.o\
	v6stepcalc.o estimator.o estimatorristra.o wavefunction.o math.o he4v6pimc.f90
	$(FC) $(FFLAGS) -c he4v6pimc.f90

av18pot.o: av18pot.f
	$(FC) $(F77FLAGS) -c av18pot.f

v6stepcalc.o: $(MPIFILE).o math.o brut.o v6pot.o v6stepcalc.f90
	$(FC) $(FFLAGS) -c v6stepcalc.f90
	
brut.o: $(MPIFILE).o ran.o math.o cheft.o v6pot.o brut.f90
	$(FC) $(FFLAGS) -c brut.f90

cheft.o: cheft.f90
	$(FC) $(FFLAGS) -c cheft.f90

chorizos.o: $(MPIFILE).o chorizos.f90
	$(FC) $(FFLAGS) -c chorizos.f90

estimator.o: $(MPIFILE).o estimator.f90
	$(FC) $(FFLAGS) -c estimator.f90
	
estimatorristra.o: $(MPIFILE).o estimator.o estimatorristra.f90
	$(FC) $(FFLAGS) -c estimatorristra.f90

lattice.o: lattice.f90
	$(FC) $(FFLAGS) -c lattice.f90
	
lj.o: volume.o lj.f90
	$(FC) $(FFLAGS) -c lj.f90	
	
math.o: math.f90
	$(FC) $(FFLAGS) -c math.f90		
	
$(MPIFILE).o: $(MPIFILE).f90
	$(FC) $(FFLAGS) -c $(MPIFILE).f90

pot.o: pot.f
	$(FC) $(F77FLAGS) -c pot.f

ran.o: ran.f90
	$(FC) $(FFLAGS) -c ran.f90

step.o: chorizos.o math.o wavefunction.o v6stepcalc.o ran.o brut.o $(MPIFILE).o\
  estimator.o estimatorristra.o lj.o step.f90
	$(FC) $(FFLAGS) -c step.f90

v6pot.o: cheft.o $(MPIFILE).o v6pot.f90
	$(FC) $(FFLAGS) -c v6pot.f90

volume.o: volume.f90
	$(FC) $(FFLAGS) -c volume.f90

wavefunction.o: $(MPIFILE).o math.o lj.o v6stepcalc.o brut.o estimator.o wavefunction.f90
	$(FC) $(FFLAGS) -c wavefunction.f90

For FLINT, it is similar, right?
I just need to check your src + test files and find the dependencies, and then I can write a similar Makefile, right? Sorry, another lazy question.

Btw, your code looks nice! I can learn a lot from your code! Thank you very much!

Yes, there is no dependency at all. Just include its f90 files in your Makefile.

I can probably upload my Visual Studio project or the MinGW makefiles created by CMake to GitHub when I get back to my computer.

Yes, I tried to document the ODE algorithms well; I'm glad you find them helpful.

1 Like

Thank you very much!
When you have time, if you could add a Makefile, that would be great!
Something like the one in @certik's repo.

But anyway, I think I can figure it out. E.g., instead of checking the code and doing it manually, I can use an IDE called Code::Blocks, which can generate a Makefile showing the dependencies between the files.
Anyway, great job!
Thank you both, @prince_mahajan and @oscardssmith!