Ten computer codes that transformed science
From Fortran to arXiv.org, these advances in programming and platforms sent biology, climate science and physics into warp speed.
by Jeffrey M. Perkel
Nature
January 20, 2021

Now in its eighth decade, Fortran is still widely used in climate
modelling, fluid dynamics, computational chemistry — any discipline
that involves complex linear algebra and requires powerful computers
to crunch numbers quickly. The resulting code is fast, and there are
still plenty of programmers who know how to write it. Vintage Fortran
code bases are still alive and kicking in labs and on supercomputers
worldwide. “Old-time programmers knew what they were doing,” says
Frank Giraldo, an applied mathematician and climate modeller at the
Naval Postgraduate School in Monterey, California. “They were very
mindful of memory, because they had so little of it.”

The ten codes are

Language pioneer: the Fortran compiler (1957)
Signal processor: fast Fourier transform (1965)
Molecular cataloguers: biological databases (1965)
Forecast leader: the general circulation model (1969)
Number cruncher: BLAS (1979)
Microscopy must-have: NIH Image (1987)
Sequence searcher: BLAST (1990)
Preprint powerhouse: arXiv.org (1991)
Data explorer: IPython Notebook (2011)
Fast learner: AlexNet (2012)

Several of the codes through 1990 were probably written in Fortran.

4 Likes

Only one remark. The article statement "Fortran, developed by John Backus and his team at IBM in San Jose, California" is inaccurate. As Backus himself said:

[…] the real work on the compiler got under way in our third location on the fifth floor of 15 East 56th Street in New York.
[J Backus, The History of Fortran I, II, and III]

Indeed, even stylistically, Fortran has the feel and look of an art deco building: massive, efficient, stylish, down to business.

In the Nature article, there is a poll at the end: we should all vote for Fortran and BLAS, because FFT is taking the lead :wink:

I wonder whether it wouldn’t be a good move to rebrand modern Fortran: "behold, the new Empire (or something awe-inspiring) is here: with advanced OO, modular, and parallel features that dwarf any other HPC competitor, solid as a rock, and fully compatible with all the most powerful codes written since the dawn of computing."

5 Likes

This was attempted with F, an effort led by Walt Brainerd (who passed away in 2020, sadly), and promoted by him in a 1997 article, "Portability and Power with the F Programming Language". It was not that successful. I think Dick Hendrickson, an F collaborator, once wrote on comp.lang.fortran that everyone agreed a Fortran subset was a good idea but could not agree on which subset. ELF90 from Lahey was a similar effort.

I used F and ELF90 to force myself to move to modern Fortran from FORTRAN 77, which I used in graduate school.

Maybe it is destined not to work. That said, to be honest, F and ELF sound more like de-branding than re-branding to me: hardly catchy names… Furthermore, when I started coding, F90 was already around. I must admit that, while I have always considered F77 annoyingly limited, I felt that, in comparison to the other languages, F90 was still underwhelming, an I-wish-but-I-can’t sort of language. Only with Fortran 2003 did I get the impression that the language spread its wings. If coarrays, teams, and failed images work as promised, I think the language starts to be not just comparable in features to C++, with lots of legacy code to lean on, but outright cool. With these premises (and the libraries you are developing), the new Fortran may gain ground.
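As a taste of those parallel features, here is a minimal coarray sketch (my own example, not from the post above); with gfortran it builds with -fcoarray=single for a single image, or via the OpenCoarrays caf wrapper for real parallel runs:

    program image_sum
       implicit none
       integer :: me, total
       me = this_image()      ! index of this image, 1..num_images()
       total = me
       call co_sum(total)     ! Fortran 2018 collective: sum across all images
       if (me == 1) print *, 'sum of image indices over', num_images(), 'images =', total
    end program image_sum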

Great points, but please read this comment I made in another thread. As I list therein, the Fortran language standard still needs to include about half a dozen facilities to be viable for modern library and application development - note most of these are features that other languages finding applications in technical computing, such as Ada, have had for decades (at least since the 1990s). The half-dozen items are not bleeding-edge or novel or esoteric features, just things that have proved over the years to enable more expressive, safe, and efficient code in technical computing.

Without truly modern libraries, i.e., something more than a collection of subprograms, the fact is that Fortran will hardly be a choice in many domains of scientific and technical computing. It may find increasing usage in absolute terms while, relatively speaking, its share keeps sinking, even as computational science and technology play a tremendously growing role in all aspects of life and society.

Fortress, a language influenced by Fortran (see Fortress (programming language) - Wikipedia), has what I think is the best programming language name ever.

Fortress is a nice name for sure. In comparison to "java", "python", and "rust", that is. My personal perception, however, is that "fortress" conveys a feeling of being "impenetrable" and "on the defensive": more specifically, designed to fend off sieges from attacking armies. Even "Empire", admittedly a carefree and megalomaniac name ("Join me and together we can rule the galaxy etc. etc."), is a better name, since at least it is expansive in its aim.

Voted!
Fortran is now taking the lead! :1st_place_medal:

6 Likes

Surprised that Linux is not on the list, as it is at the heart of almost all research supercomputers.

4 Likes

Or rather UNIX and UNIX-like OSes. But as an OS, it is a general-purpose tool, not science-oriented.

I just voted for Fortran, the general circulation model (I’m professionally biased), and ipython/Jupyter.

Although the article doesn’t mention it, GCM development (numerical weather, ocean, and climate prediction) is one of the few areas of science (if not the only one) whose software is developed almost exclusively in Fortran, for existing and new projects alike.

I agree that Linux should have made the list. Although, yes, it is a general-purpose tool, I think it has enabled and accelerated science as much as, if not more than, IPython and Jupyter.

2 Likes

I’m somewhat surprised that (Common) Lisp isn’t mentioned as being transformational for science. Like Fortran, Lisp has been in use for decades, in very challenging areas of computational science, including symbolic AI. IIRC at least two computer algebra systems (Maxima, Axiom) have been implemented in it, along with one of the first automated theorem-proving systems (ACL2).

Pretty much all later languages have borrowed concepts from it, whether automated memory management via garbage collection, first-class functions, or meta-programming.

4 Likes

Very interesting. Thanks for posting. Just a few comments:

BLAS functionality has largely been built into Fortran at this point, either as array syntax or as language intrinsic functions.
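For instance (my own minimal sketch, not from the article), the array syntax and intrinsics line up roughly with the three BLAS levels:

    program blas_like
       implicit none
       real(8) :: a(3,3), b(3,3), c(3,3), x(3), y(3), alpha
       call random_number(a); call random_number(b)
       call random_number(x); call random_number(y)
       alpha = 2.0d0
       y = alpha*x + y               ! array syntax, level 1 (like DAXPY)
       print *, dot_product(x, y)    ! intrinsic, level 1 (like DDOT)
       y = matmul(a, x)              ! intrinsic, level 2 (like DGEMV)
       c = matmul(a, b)              ! intrinsic, level 3 (like DGEMM)
       print *, c(1,1)
    end program blas_like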

BLAST was cool in its day, but better alternatives are available now.

The pervasiveness of Jupyter notebooks has led to an unfortunate side effect: a lot of computer folk are no longer able to correctly spell the name of the planet Jupiter.

2 Likes

An unfortunate misconception that I see often, more dangerous than not correctly spelling "Jupiter": many scientists confuse interactive and exploratory science with reproducible science. They praise Jupyter notebooks for publishing reproducible science, but actually Jupyter notebooks are not particularly reproducible. In fact, they’re less reproducible than ordinary Python scripts. More on this by Owain Kenway here.

But for exploration and teaching I think Jupyter notebooks are fantastic.

4 Likes

In my experience, BLAS has always worked much better than the Fortran intrinsics. My memories are from some time ago, but as far as I can remember:

  1. matmul was significantly slower than DGEMM (I could not believe it);
  2. matmul crashed for certain sizes where DGEMM was fine;
  3. when multiplying submatrices, matmul used to create temporary arrays, whereas with DGEMM one can pass the top-left corner and the leading dimensions LDA/LDB and everything works like lightning (see the sketch after this list).

Since then, I avoid matmul any time I have matrices larger than 10x10.
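To make point 3 concrete, here is a minimal sketch of the submatrix case (my own example; it assumes a BLAS library linked at build time, e.g. gfortran example.f90 -lblas). DGEMM works on the m x k top-left block of a in place, because the leading dimension lda tells it the stride between columns; the commented matmul line operates on array sections, which the compiler may copy into temporaries first:

    program submatrix_gemm
       implicit none
       integer, parameter :: lda = 1000, m = 400, n = 300, k = 200
       real(8), allocatable :: a(:,:), b(:,:), c(:,:)
       external :: dgemm
       allocate(a(lda,lda), b(lda,lda), c(lda,lda))
       call random_number(a); call random_number(b); c = 0.0d0
       ! Multiply the m x k top-left block of a by the k x n top-left block of b:
       ! only the corner and the leading dimension are passed, no copies are made.
       call dgemm('N', 'N', m, n, k, 1.0d0, a, lda, b, lda, 0.0d0, c, lda)
       ! The matmul equivalent; the array sections may be copied to temporaries:
       ! c(1:m,1:n) = matmul(a(1:m,1:k), b(1:k,1:n))
       print *, c(1,1)
    end program submatrix_gemm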
1 Like

At least with gfortran there are some flags which can help you here:

-fexternal-blas

This option will make gfortran generate calls to BLAS functions for some matrix operations like MATMUL, instead of using our own algorithms, if the size of the matrices involved is larger than a given limit (see -fblas-matmul-limit). This may be profitable if an optimized vendor BLAS library is available. The BLAS library will have to be specified at link time.

-fblas-matmul-limit=n

Only significant when -fexternal-blas is in effect. Matrix multiplication of matrices with size larger than (or equal to) n will be performed by calls to BLAS functions, while others will be handled by gfortran internal algorithms. If the matrices involved are not square, the size comparison is performed using the geometric mean of the dimensions of the argument and result matrices.

The default value for n is 30.
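For example, assuming an optimized BLAS such as OpenBLAS is installed (substitute whichever library you actually link against), a build line could look like:

    gfortran -O2 -fexternal-blas -fblas-matmul-limit=50 main.f90 -lopenblas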

Edit: some information and timings are available in the Bugzilla thread: 51119 – MATMUL slow for large matrices

4 Likes

With enough talent and motivation, even assembly language is enough, but I am grateful for modern Fortran.

Tini Veltman (1931–2021): From Assembly Language to a Nobel Prize
by Stephen Wolfram
January 21, 2021

Any serious calculation in particle physics takes a lot of algebra. Maybe it doesn’t need to. But with the methods based on Feynman diagrams that we know so far, it does. And in fact it was these kinds of calculations that first led me to use computers for symbolic computation. That was in 1976, which by now is a long time ago. But actually the idea of doing Feynman diagram calculations by computer is even older.

So far as I know it all started from a single conversation on the terrace outside the cafeteria of the CERN particle physics lab near Geneva in 1962. Three physicists were involved. And out of that conversation there emerged three early systems for doing algebraic computation. One was written in Fortran. One was written in LISP. And one was written in assembly language.

I’ve told this story quite a few times, often adding “And which of those physicists do you suppose later won a Nobel Prize?” “Of course,” I explain, “it was the one who wrote their system in assembly language!”

That physicist was Martinus (Tini) Veltman, who died a couple of weeks ago, and who I knew for several decades. His system was called SCHOONSCHIP, and he wrote the first version of it in IBM 7000 series assembly language. A few years later he rewrote it in CDC 6000 series assembly language.

5 Likes

This article gets into technical details of that compiler.

THE FORTRAN I COMPILER
by David Padua
Computing in Science & Engineering, January/February 2000

The Fortran I compiler was the first demonstration that it is possible to automatically generate efficient machine code from high-level languages. It has thus been enormously influential. This article presents a brief description of the techniques used in the Fortran I compiler for the parsing of expressions, loop optimization, and register allocation.

1 Like