The counter-intuitive rise of Python in scientific computing

ivanpribec · December 8, 2020, 1:38pm

I found this blog post interesting, even though I do not fully agree with the authors’ conclusion. I don’t think it is the agility of Python as a programming language itself, but the developed infrastructure of libraries, online documentation and learning resources, which enabled Bob to quickly find the Python kdtree in the first place. There is in fact a kdtree library available for Fortran. According to some resources it even delivers performance similar to mature C and C++ libraries. However, it lacks the nice webpage, proper unit testing, documentation, and user examples, which are needed so it could stand out.

Flipping the coin of the debate, we could also say it is the lack of Fortran developer resources (and developers themselves), particularly for “classic” computer science algorithms like searching, sorting, heaps, stacks and queues, tree construction, etc., which obstructs adoption of Fortran in some domains of computational science.

The previous weekend I followed a Python quadtree tutorial and was quickly able to adapt it to Fortran (code is available here). Usage of the tree is not any more complicated than in Python:

integer, parameter :: n = 140
real(wp) :: points(n,2)
type(qtree) :: tree
integer, allocatable :: idxs(:)

! Square quadtree box with center at (0.5,0.5) and width 1
call tree%init(0.5_wp,0.5_wp,1.0_wp,1.0_wp)

! Populate with random points
call random_number(points)
call tree%populate(points)

! Query indexes in rectangle centered at (0.75,0.75) and width 0.6
idxs = tree%query(boundaries(0.75_wp,0.75_wp,0.6_wp,0.6_wp))

! Query indexes in circle with radius 0.3 centered at (0.75,0.75)
idxs = tree%query_radius([0.75_wp,0.75_wp],0.3_wp)

Sample output:

A few months ago I also followed a gist by Jake Vanderplas and created a Fortran Balltree. Some errors remains in the query function ), but at the time the build process worked nicely:

I was planning to polish these some more and make them available in the future as an fpm package. It would be nice to find some collaborators!

awvwgk · December 8, 2020, 8:28pm

Sadly, the linked kdtree library seems to be not actively maintained anymore.

I can think of at least one of my projects which could make good use of such a library in production and would be certainly interested in helping out.

certik · December 8, 2020, 8:39pm

Part of the advantage of Python is that it is interactive. We will fix that for Fortran with LFortran.

Finally, Python has more mature tooling, allowing rapid prototyping. I strongly believe that such modern tooling can be implemented for Fortran also. But currently it does not exist, and that is why Python is no doubt easier to use, even for me.

But I think if we keep our current direction and momentum, this can be fixed. Python had realized its full potential. Fortran has not realized its full potential. If we can realize Fortran’s full potential, I think it will be a better language for scientific computing than Python.

ivanpribec · December 8, 2020, 11:24pm

The author of this code is Matthew Kennel. According to the publication where it was introduced (back in 2004!) (https://arxiv.org/pdf/physics/0408067.pdf) it is provided under terms of the Academic Free License. I have done some refactoring of it before in a private fork.

The Scipy library has a useful kd-tree which is described briefly in their Nature Methods paper (see Fig. 3 in https://www.nature.com/articles/s41592-019-0686-2). The latest version is actually written in C++:

The benchmark is to construct a tree of n = 10000 uniformly distributed points, and query the nearest neighbors for r = 1000 random vectors in m = 3, 8, and 16 dimensions. The exact benchmark setup can be found here: https://github.com/scipy/scipy/blob/10c8c8de491a607378d6dca7602b0a3ed440a4b4/benchmarks/benchmarks/spatial.py#L110-L139

I have repeated this test with the kdtree2 code from Matthew Kennel:

program kdtree_benchmark
  use kdtree2_module
  implicit none

  type(kdtree2), pointer :: tree
  real(kdkind), dimension(:,:), allocatable :: data, queries
  type(kdtree2_result) :: results(1)

  real :: t0, t1, t2
  integer, parameter :: m(3) = [3, 8, 16]
  integer, parameter :: n = 10000, r = 1000
  integer :: idxs(r), idx
  integer :: i

  do i = 1, 3

    write(*,*) "Dimension = ", m(i)

    allocate(data(m(i),n))
    call random_number(data)

    allocate(queries(m(i),r))
    call random_number(queries)

    ! Populate tree
    call cpu_time(t0)
    tree => kdtree2_create(data,sort=.true.,rearrange=.true.)
    call cpu_time(t1)

    ! Query random vectors
    do idx = 1, r
      call kdtree2_n_nearest(tp=tree,qv=queries(:,idx),nn=1,results=results)
      idxs(idx) = results(1)%idx
    end do
    call cpu_time(t2)

    write(*,*) "Tree build (s): ", t1 - t0
    write(*,*) "Tree query (s): ", t2 - t1

    call kdtree2_destroy(tree)  
    deallocate(data)
    deallocate(queries)

  end do

end program

Running the program on a >5 year old ThinkPad T530 with an Intel® Core™ i5-3320M CPU:

$ gfortran -O3 src/kdtree2_precision_module.f90 src/kdtree2_priority_queue_module.f90 src/kdtree2_module.f90 tests/time.f90 -o time
$ ./time
 Dimension =            3
 Tree build (s):    3.41600017E-03
 Tree query (s):    1.05800014E-03
 Dimension =            8
 Tree build (s):    4.14900016E-03
 Tree query (s):    1.39079997E-02
 Dimension =           16
 Tree build (s):    5.63300028E-03
 Tree query (s):   0.218334988

The 16 year old Fortran code delivers similar performance to the latest Scipy version!

Had Bob from the blog post known about it, he could have told those Python youngsters…

certik · December 9, 2020, 12:18am

Can you try compiling with gfortran -O3 -march=native -funroll-loops -ffast-math and post the timing results?

ivanpribec · December 9, 2020, 12:25am

$ ./time
 Dimension =            3
 Tree build (s):    5.34399971E-03
 Tree query (s):    1.52799953E-03
 Dimension =            8
 Tree build (s):    3.08299996E-03
 Tree query (s):    1.24450000E-02
 Dimension =           16
 Tree build (s):    4.65999916E-03
 Tree query (s):   0.195482999

Some small improvements for higher dimensions, but it also makes the 3d tree results worse. Apart from the Euclidean distance calculation there is not much floating point math involved in the tree. But I am sure there is some space for further performance tuning.

I also did not use the same random number seed, so that might also affect the results.

urbanjost · December 9, 2020, 1:56am

iI was intrigued with timing this using the current default parameters
in the fpm Fortran package manager so I made a Q&D setup for fpm
and ran it (on my very old laptop but with a new version of Red Hat 8)
with ifort, nvfortran, and gfortran. I must say the results were basically
the opposite of what I expected so I thought I would share them. I did
not finish setting it up for fpm so this is not a polished setup in
that regard but I think more than one thread doing I/O in ifort might
mean it is doing double duty (I did not check into the cause but I did
not alter the code so I do not think it is something I did).

It was interesting to see what effort it took taking a small project I
was not familiar with and re-packaging it to the fpm format. At least
for a small project it was not bad at all except not quite being sure
what to do with the documentation.

The LICENSE was very open (included for the original ktree package) so
for the curious:

#!/bin/bash
########################################################
SKIP(){
# fpm package at
wget http://urbanjost.altervista.org/REMOVE/ktree.tgz
tar xvfz ktree.tgz
cd ktree
}
########################################################
(
exec 2>&1
fpm run --compiler ifort
fpm run --release --compiler ifort
fpm run --compiler nvfortran
fpm run --release --compiler nvfortran
fpm run --compiler gfortran
fpm run --release --compiler gfortran
)| tee $(basename $0).out
########################################################
exit
########################################################

ifort

+ build/ifort_debug/app/ktree
Dimension =            3
Dimension =            3
Tree build (s):   1.3705999E-02
Tree query (s):   3.7389994E-03
Dimension =            8
Tree build (s):   1.3826996E-02
Tree query (s):   3.7319958E-03
Dimension =            8
Tree build (s):   2.1205008E-02
Tree query (s):   6.4843997E-02
Dimension =           16
Tree build (s):   2.1082014E-02
Tree query (s):   6.5044999E-02
Dimension =           16
Tree build (s):   3.3169001E-02
Tree query (s):    1.363189
Tree build (s):   3.3977002E-02
Tree query (s):    1.368700
+ build/ifort_release/app/ktree
Dimension =            3
Tree build (s):   5.2239895E-03
Tree query (s):   2.0000041E-03
Dimension =            8
Dimension =            3
Tree build (s):   5.2229986E-03
Tree query (s):   2.0080060E-03
Dimension =            8
Tree build (s):   7.9360008E-03
Tree query (s):   3.4193993E-02
Dimension =           16
Tree build (s):   7.9390034E-03
Tree query (s):   3.4123003E-02
Dimension =           16
Tree build (s):   1.3113007E-02
Tree query (s):   0.6742060
Tree build (s):   1.3014004E-02
Tree query (s):   0.6697240

nfortran

+ build/nvfortran_debug/app/ktree
Dimension =             3
Tree build (s):    3.7317991E-02
Tree query (s):    8.9461803E-03
Dimension =             8
Tree build (s):    5.5178881E-02
Tree query (s):    0.1370029
Dimension =            16
Tree build (s):    8.2104921E-02
Tree query (s):     2.857805
+ build/nvfortran_release/app/ktree
Dimension =             3
Tree build (s):    1.2266159E-02
Tree query (s):    4.1868687E-03
Dimension =             8
Tree build (s):    1.7192841E-02
Tree query (s):    6.0570002E-02
Dimension =            16
Tree build (s):    2.4964094E-02
Tree query (s):     1.122507

gfortran

+ build/ifort_debug/app/ktree
+ build/gfortran_debug/app/ktree
Dimension =            3
Tree build (s):    1.19459992E-02
Tree query (s):    3.73700075E-03
Dimension =            8
Tree build (s):    1.78890005E-02
Tree query (s):    6.33680001E-02
Dimension =           16
Tree build (s):    2.55430043E-02
Tree query (s):    1.31747389
+ build/gfortran_release/app/ktree
Dimension =            3
Tree build (s):    5.34000061E-03
Tree query (s):    1.76099967E-03
Dimension =            8
Tree build (s):    7.67700002E-03
Tree query (s):    2.68570017E-02
Dimension =           16
Tree build (s):    1.12170018E-02
Tree query (s):   0.501192987

I wanted to compare these times with the original parameters
in the Makefile but did not.
~
~
~
~
~
~
~

ivanpribec · December 9, 2020, 2:25am

The double output with ifort is kind of strange indeed. nvfortran also seems to be struggling.

Does opposite of what you expected mean faster?

urbanjost · December 9, 2020, 3:15am

Well, I typically find nvfortran the fastest with code in this category; I find ifort the most reliable for compiling up standard code (maybe this is not) but looked like I was hitting some kind of bug, …

On a different note It emphasized to me there is a need for allowing for custom build options with Fortran fpm but it proved that an fpm setup can be created relatively easily with small projects and is then theoretically easily distributed and used by others; albeit I did not actually make it into a proper repository on something like github
and do not intend to on my own.

Note I have had problems with larger projects with nvfortran and am frustrated by it not supporting real128 as many packages I have have general code in them with a version for integer, real, doubleprecision, … so in this case it built fine out of the box but was slow, which was a particular surprise.

Also note I actually started with a version that might not be the same as the one you used, as I had some bizarre download error with my old Firefox version with your link but found what appears to be the same code in the CFD repository. It said it was an unaltered copy. Don’t want to get off on too many tangents but if you have wget or even a browser you can grab that quick fpm setup and see if I did anything obviously wrong.

I was more interested in comparing your times to the base fpm parameters initially; but got distracted by the results and just thought they might be interesting to others. Since the hardware is very different comparing my times to yours is “Apples to Oranges” so if I want to do the same compile you did but with fpm that currently means I have to make a quick hack of fpm or do the build outside of fpm; but that might change soon and is an entirely different topic. Think how easy it would be for me to run your tests with your code on my hardware if it was a fpm package on a git server! Just a few lines of commands!

So I was actually just diverting myself on a lark with a different purpose than what really is being discussed here but within a few minutes I had a result I thought might be of interest. On second thought it appears too divergent! Seems like a nice project that would make a great fpm package though.

In a past lifetime this library would have been particularly pertinent to me but not now, but it woke up some dormant brain cells and really caught my attention. I hope you complete your proposed project!

billlong · December 9, 2020, 12:45pm

Fortran has been through the “latest shiny object” process before, with Java and C++. HR departments and managers want to hire programmers who are already trained. C++ was at one time the default language taught to college students. (Before that, Pascal, and then C.) C++ was replaced by Java. Now it is Python. The issue is not which language is best, but rather which language the colleges and universities teach. CS departments generally don’t teach Fortran at all (most of the faculty do not know the language).

zerothi · December 11, 2020, 2:58pm

We really have to be realistic here. @ivanpribec I don’t agree that fortran is as easy to program in as Python. There are pitfalls and in particular, users have to compile the code.

Consider person A, he develops some code in fortran and finds some weird behaviour. He wants to plot some of the data structures while running to get an idea of the flow.

If you haven’t already setup your own program to be linkable with X plotting library of fortran, then this is a nightmare to begin with (yeah, fpm will help), but problem is that users have to deal with things such as 1) adding use statements, 2) adding temporary variables, 3) fixing build scripts to add libraries for plotting, 4) be sure about types etc. passing down correct stuff.

Yeah, “block” is your best friend in this case!

But really, in Python, any body can just do 3 lines of code to get a functional plot.
Agreed that documentation, infrastructure around is a big plus in the community, but to say that Python and fortran are equally agile is simply not true.

What we surely should aim for is a thriving community, increasing package repositories and improving documentation whole-wide to make it fresh and shiny again. But even then. I still think that users will be more “productive” in scripting languages.

Not to bash, just to add to the thread

Also, being a supporter at an HPC site, I can safely say that the majority of scientists are not interested in learning a language deep enough to be fluent without having documentation in a tab nearby. They care about their end-goal; the problem at hand. They don’t even care about performance (as long as it takes some days).

ivanpribec · December 11, 2020, 3:56pm

Finally some responses which return to the source of the thread

I don’t think I made any statements about Fortran being easier than Python. Only that in the case of the story in the blog post, it was not the agility of Python that lead to triumph over Fortran for the specific problem, but the well developed community infrastructure. Of course you could argue that Python being easy and more accessible (and as @billlong mentioned, being taught at universities) is partially the reason why more libraries are developed in it.

I think there are pitfalls to using Python too. Being a Python user myself, I know I have struggled a lot with pip, conda, virtual environments, incompatibility between Python 2 and 3, wrong python paths, Python classes and decorators, etc. While Python can be used interactively in the interpreter or in a Jupyter notebook, I usually find myself putting things in a script, and running from a shell. This is probably just a consequence of not being bothered to configure an IDE, and being so used to the shell <-> editor cycle from Fortran.

Concerning use statements, Python has these too -> import. While it works great for pre-installed pacakages, I always need a documentation tab open to figure out how the folder structure relates to the import names in my own libraries. I’ve also had problems with absolute and relative import paths, when interacting with Python users doing developmnet in PyCharms on Windows.

While the dynamic typing in Python makes the language feel simple, I have had cases before where it introduced subtle bugs. A simple example that comes to my mind is

a = np.array([1,2,3])
b = a

where the array b is now only a pointer to a. Modifying elements of b will also modify elements of a. In my experience, explaining this to a beginner can be challenging. A second example is passing the wrong dummy variables in a function call. In recent codes I also see Python users specifying type hints which brings us half-way back to static typing like in Fortran.

But I agree that for the average scientific user, which is just interested in solving a problem with some existing algorithm implementation, it will be easier to accomplish in Python.

certik · December 11, 2020, 4:12pm

@zerothi, I think part of your comment mirrors what I wrote above: Fortran is currently not as easy to use as Python.

The part where we might differ is in what “might be” in the future: I think if we develop all the tools and compilers, Fortran might be as easy to use as Python. I understand if you are skeptical.

urbanjost · December 11, 2020, 5:47pm

Having used a Fortran-based routine called INSPECT() that when passed an array lets you interact with it using matlab-like commands both from text and stdin that allows for plotting; as well as writing the output to a process, as a NAMELIST file, and as a Matlab input file I know that it is not the language per-se but the easily available tools that can make a language feel simple. You can make system calls at the prompt so you can call other simple utilties that compare the NAMELIST file to previous dumps and give a statistical summary of the data as well. All Fortran except a binding to low-level graphics. Easier to use than any other utility I have personally tried with a built-in user manual for when I forget. It can also be called as a stand-alone program
that can replay the commands entered when called from the program if journaling was turned on. I mention it because I know python-only programmers that dump their data and call inspect(1). They do not care what language it was written in, just that it is easy to use. Python is in several ways a proof of how it is possible to make C/C++/Fortran easier to use as it is a scripting language that depends on those other languages for much of it’s functionality.

vmagnin · December 11, 2020, 6:08pm

Personally, I don’t use Python for scientific computing. Perhaps because I learned Python 15 years after Fortran… But ten years after learning Python, I have finally never tried things like scipy or numpy, which are probably wonderful tools. I love Fortran because it’s made for crunching numbers and it’s perennial (no problem to run a 25 years old code). When the CPU temperature is rising, I know it is for good reasons.

But I understand that Python is attractive and I love it too : you can do so much by writing so few lines… The syntax is sobre (like a Google search engine homepage…) And, the standard library is huge: https://docs.python.org/3/library/index.html
Nothing to install, just type “import this_library”… I am using Python for scripts, sometimes big scripts like in gtk-fortran (regular expressions…). And of course, an interpreter is cool when you want to quickly try some commands…

Of course, as stated by Ivan, you can also have a lot of troubles with some Python features…

Finally, I think the Fortran standard library could be a killer feature, then fpm, then a Fortran interpreter.

FortranFan · December 12, 2020, 3:09pm

I personally do think Fortran language itself also requires an urgent boost to serve well the scientific and technical computing arenas. And this is in addition to everything needed in its “ecosystem”: standard library, interpreter and other enhancements in tooling, etc.

The notion of Turing-complete may be too abstract and even too basic any longer for any practical utility with modern languages. Nonetheless, one can think of an even more abstract notion of feature-completeness that can be more practical and which can resonate intuitively with those coding-in-anger regularly in scientific and technical domains. This notion is decidedly dependent on the time period, the state-of-art in computing at a given instance in time. On this count, Fortran has perennially fallen short of the needs of practitioners since ANSI FORTRAN publication back in 1978. And the deficit continues to this day.

I had previously noted 6 items in this thread which are crucial for immediate survival of the language as a legitimate contender in the scientific and technical computing space.

However I do believe there is a list of miscellaneous items in the syntactic sugar category that can make coding in Fortran far more safe, straightforward, and appealing and enjoyable, especially for scientists and engineers who should be and want to be domain experts in their respective fields, and not language masters:

Why are we still required to include implicit none in every scope in Fortran? Particularly in INTERFACE .. END INTERFACE constructs that has its own scope and which made it into the language much after the legacy with IMPLICIT statements. This is especially pernicious, particularly with any respect to diversity and inclusivity which is all the rage these days with social justice. Because the common refrain is “I don’t mind adding that statement” in all my codes, so nor should you - this is inherently disrespectful. Well, there is a vast group of scientists and engineers who are either unable or unwilling to follow the discipline in terms of this verbosity. And the cost of any indiscipline can be prohibitively expensive, particularly in commercial environments. And continuously excluding the inclinations of those wanting to advance to a modern state where the desire to be explicit with types is the default “value proposition” from language design considerations is forcefully ignored not only belittles any concept of type safety but disrespects the most precious commodity, the time of practitioners.
There are so many domains now where 100% of the computing is performed using IEEE floating-point arithmetic. Then, why are we still required to go through a long preamble with one’s own KIND module that sets the desired floating-point KINDs for use with IEEE formats. Fortran can do with built-in aliases, say ieee_fpb32, ieee_fpb64, etc. (or something less verbose) that coders can immediately employ toward floating-point computations with an intuitive grasp and confidence in the types.
Same applies to other intrinsic types and their kinds, especially INTEGER and CHARACTER with `c_char’ kind.
And of course, a built-in string type. Sure, initial experimentation in a standard-library years after ISO_VARYING_STRING is a decent reset, but it is the base language itself that needs this facility to be included intrinsically.

The rate of change with all of these features in standard Fortran is either too slow or zero. Fortran is headed to science museum archives unless things change for the better at a much faster pace.

Beliavsky · December 12, 2020, 4:42pm

FortranFan:

Why are we still required to include implicit none in every scope in Fortran? Particularly in INTERFACE .. END INTERFACE constructs that has its own scope and which made it into the language much after the legacy with IMPLICIT statements. This is especially pernicious, particularly with any respect to diversity and inclusivity which is all the rage these days with social justice. Because the common refrain is “I don’t mind adding that statement” in all my codes, so nor should you - this is inherently disrespectful. Well, there is a vast group of scientists and engineers who are either unable or unwilling to follow the discipline in terms of this verbosity. And the cost of any indiscipline can be prohibitively expensive, particularly in commercial environments.

FortranFan’s attempt to make IMPLICIT NONE a political issue is strained.

Do people agree with me that the standards committee should make implicit typing obsolescent? Every compiler will retain it, but people who insisted on implicit typing would have to use a std=legacy option.

Beliavsky · December 12, 2020, 5:22pm

I’m not suggesting that gfortran or any other compiler remove the ability to compile code with implicit typing, just that users who still want this feature should use a compiler option that acknowledges their code is not standard-conforming. A large fraction would already be using the std=legacy option for reasons Steve Kargl described.

Sometimes I forget to use IMPLICIT NONE and get syntax errors that puzzle me. I doubt I am the only one. Making IMPLICIT NONE the default would save people writing new code a little time.

ivanpribec · December 12, 2020, 5:55pm

At least for new codes such errors can be prevented on the level of tooling.

As an example using the new Fortran package manager:

~/fortran$ fpm new test_fpm
 + mkdir -p test_fpm
 + cd test_fpm
 + mkdir -p test_fpm/src
 + mkdir -p test_fpm/test
 + mkdir -p test_fpm/app
 + git init test_fpm
Initialized empty Git repository in /home/fortran/test_fpm/.git/
~/fortran$ cd test_fpm/
~/fortran/test_fpm$ cat src/test_fpm.f90 
module test_fpm
  implicit none
  private

  public :: say_hello
contains
  subroutine say_hello
    print *, "Hello, test_fpm!"
  end subroutine say_hello
end module test_fpm

Or using code completion upon pressing tab (in my case using Sublime Text):

implicit_none1

implicit_none2

The code completion templates can be customized up to ones preferences.

FortranFan · December 12, 2020, 6:35pm

It’s NOT political, but social - a fundamental difference, please make an effort to understand - see this also: [J3] nailed to the church door. The social considerations underpin everything, increasingly the functioning of ISO and ANSI and their committees also.

Progress on any effort is almost always smacked down with “we’ve always done it this way, that’s how it should remain, we don’t mind continuing with it, we don’t think it’s trouble for us” and then tacitly conveying to take it or leave it. And people will leave.

The resistance to making implicit none the default in Fortran is but a small (and some will think trivial) example yet it’s a classic illustration of which one can see the whole picture of also what holds the language back.

Converse to the cliched expression “you’d me at hello,” countless scientists and engineers are lost to Fortran at implicit none itself, often the first section in an introduction to the language. And like with this story, many a battle is lost.

Topic		Replies	Views
A Modern Fortran Scientific Programming Ecosystem Announcements	8	1866	October 17, 2022
Discussion of Fortran on Hacker News	29	2735	January 6, 2024
Scientific Fortran Lectures Advocacy	5	970	June 5, 2024
Parallel Programming with Coarrays in Fortran (blog post)	5	763	April 1, 2024
New Fortran programmer saying hi (and some thoughts on advocacy)	18	1075	June 2, 2024

The counter-intuitive rise of Python in scientific computing

ifort

nfortran

gfortran

Related topics