Fortran: An ideal language for writing "Templates" of numerical algorithms (?)

My research group is initiating a project that aims to provide templates for a class of numerical optimization algorithms. (Disclaimer: “numerical optimization” is an overwhelmingly vast area, and we will primarily work on only a quite special class of algorithms, namely those falling in the same category as Powell’s methods without using derivatives; the implementations of such algorithms are notoriously complicated and involving.)

The motivation is to provide standard templates and building blocks for the implementation of some classical/mature algorithms, paving the way for extending the existing methods and developing new ones. We are inspired by Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, where the numerical linear algebra (NLA) community develops templates for basic NLA algorithms such as CG. By doing so, we practice the principle that coding means not only to get things done but also to document human knowledge (see the blog by @everythingfunctional titled Communicating with Whom?). Note that our objective is not at all to develop yet another optimization package. Instead, we aim to open the black boxes of optimization packages, show the standard implementation of such boxes, and provide standard modules for building new boxes.

Let me elaborate a bit on the word “template”. Here, a template for an algorithm refers to a combination of the following.

  1. A piece of pseudocode for the algorithm.

  2. A modulized and standardized implementation of the algorithm that faithfully corresponds to the pseudocode. The implementation should be easily readable and as similar as possible to the mathematical formulation of the algorithm. The purpose is to illustrate how to implement the algorithm for general problems. The code should be ready to use (of course) and can solve general problems of moderate size, but the performance is not the top priority. Others should be able to code the algorithm following this illustration if they need to adapt the algorithm to their particular problems.

  3. Subroutines that can be used as building blocks for implementing similar algorithms. The subroutines must be reliable (of course) and standardized, yet need not be optimized. Others can use these subroutines for a quick prototype of algorithms that they want to study/develop. The subroutines should follow the principle of high cohesion and low coupling. Each of such subroutines should have a clearly defined objective that is sufficiently elementary and sufficiently general (e.g., solving an upper triangular linear system, finding the Lagrange interpolatant of a function on a given interpolation set, etc), so that others can easily replace them with a new version when needed.

Why are such templates relevant — and indeed important?

While there are tons of papers on numerical optimization algorithms being published, not all of them come with working open-source code. Some of them do, but the code is often dirty (sorry), unmodularized, and unstandardized. Authors of such papers/code should NOT be blamed, however, because writing research papers and possibly research-oriented code is substantially different from developing software, and we should NOT require one (group of people) to do all the jobs well.

As a consequence, researchers and users of these algorithms often find themselves doing the following things.

  1. Spend a considerable amount of time on dirty and not-guaranteed-to-be-correct code by others.

  2. Struggle to reproduce numerical results by others, being not sure whether the performance difference is due to the mathematics of the algorithms or due to tricks/mistakes in the implementation.

  3. Reinvent/rebuild wheels because no standardized code or templates exist for the building blocks of the algorithms under consideration.

I myself have wasted too much of my life on these unpleasant and unproductive things, and I hope my project will help others (researchers/users) to save their time.

Now, here comes the big question and I would like to hear your opinion — what language(s) should be used for coding the templates? See points 2 and 3 in the elaboration on the word “template” given above.

It seems to me that (modern) Fortran is ideal for this job, not because I am posting in the Fortran discourse but because the following features of the language make it perfect for illustrative implementations of numerical algorithms.

  1. The code is thankfully similar to the mathematical formulations — that’s the name of the language.

  2. It supports matrices, vectors, and their operations natively. No need to write loops for basic matrix-vector calculations. This makes the code simple and readable.

  3. It supports attribute specifications such as intent(in/out) and parameter. The code can tell the readers (recall readability) clearly which variables are in/outputs, local variables, and constants.

  4. It is strictly and statically typed. By reading the code, people can figure out what is the type, shape, and size of each variable.

Recall that the major purpose of the template is to illustrate how to implement the algorithm for general problems. The performance for particular/challenging problems is not the priority. Therefore, each of the abovementioned features is critical, yet the high efficiency of Fortran is only a plus but not included in the list.

There are other languages that can be candidates, particularly MATLAB, Python, and Julia. But MATLAB misses points 3&4, Python misses at least 4 (I am not sure about 3 because I barely know the language), and I know almost nothing about Julia.

Fortran has cons, of course.

  1. One of the features that I dream about the most is for Fortran to support functions with multiple outputs. If Fortran had such capability, then the code would be more readable and more faithfully correspond to the math formulations. Of course, multi-output functions can be implemented as subroutines, but that is normally not how we formulate the functions mathematically — recall that we want the code to be as similar as possible to the math formulations. Another possibility is to return a structure, but it is again likely to differ from how we write the functions in mathematics — much more often than not.

  2. People, including professionals and researchers, have a misconception that Fortran is out of date and difficult to learn/use. This is not the language’s fault, but it is a considerable disadvantage, an expulsive force for others to accept your work.

  3. To be frank, due to the lack of interactivity, Fortran is more difficult to debug than scripting languages such as MATLAB and Python. Surely, Fortran is not alone here. It applies to all compiled languages. Fortunately, LFortran @certik is solving this problem. When will we be able to use Fortran in a way as interactive as MATLAB? For example, the keyboard command in MATLAB is extremely useful when debugging. When a Fortran programmer talks about the difficulty of the language, probably (s)he does not mean it is hard to write — it is indeed easy — but means it is hard to debug. Debugging takes a major part of coding in any language. Note that debugging does not only mean spotting programming errors such as out-of-bound arrays. That part can mostly be done by compilers, and Fortran has no disadvantage regarding it. When coding numerical algorithms, most of the debugging labor is to answer questions like the following (see the list below; how to insert a sublist under a list here?). They are more about finding mathematical mistakes in the code than locating programming errors. This kind of debugging demands the interactivity of the language (indeed, IDE for the language). Traditional debugging tools may help, but not always. The keyboard-like functionality is a killer feature here. With keyboard, you can easily examine what your code is doing, modify it, and continue to run it without recompilation. Tools like gdb can do similar things but are not yet as flexible/powerful/friendly as keyboard.

  • Why does the code run without any error but produce a wrong result?
  • Why does the algorithm work in theory but not in computation?
  • Why does a new version of the code produce a significantly different result from the last version while the change seems to be insignificant?
  1. Related to 3, most potential users of the code are scared by the fact that Fortran needs to be compiled before use. True, it is stupid to criticize a compiled language in this way, but note that human beings are timid, lazy, and often stupid animals. An extra step before seeing the code actually run will hinder half of the users; if that extra step reminds people about difficulties and uncertainties, the other half will probably escape, leaving only the most courageous ones (like you). What is your most primitive memory about “compilation”? I bet that it includes the hundreds of errors that your compiler spat/vomited to you after you hit the “enter” key to compile your very first program in your very first coding course. That is a nightmare about coding for most normal human beings. Acknowledge or not, it is scary.

What is your opinion? The views of Fortran experts like you will be highly helpful for my project.

[Update: Indeed, what we are looking for is a lingua franca for numerical computation — a language that is simple, straightforward, and strict. Fortran was deemed in that way, but evolution is needed for it to continue to be regarded as such. See posts by @FortranFan under Fortran returns to top 20 TIOBE index, Eliminate implicit mapping, and Fortran and Neural Networks.]

One more thing: Due to the above cons, I feel obliged to accompany each template coded in Fortran with a MATLAB/Python translation. Even though the latter two languages cannot describe the algorithms as precisely/clearly/strictly as Fortran, most people may choose to read/try them rather than Fortran. But it raises one more question: What is the best way of translating a piece of modern-Fortran code to MATLAB/Python (and vice versa)? Given the high similarity between modern Fortran and MATLAB, shouldn’t there be an automated engine for doing so? Of course, the translated code should be readable — it is translated for human beings, not (only) for machines.

Thank you very much.

4 Likes

I think that functions with multiple outputs in Fortran can be implemented by using subroutines. In my experience, it is confusing sometimes because the number of arguments increases. But usually it works. I’m not sure how C programmers usually deal with multiple-outputs functions. I usually return multiple outputs through struct or pass pointer to update variable in-place, the second one is similar to subroutines in Fortran and the first one is also not elegant.
BTW, I couldn’t agree you more about the three boring and annoying things in implementing numerical algorithms.

1 Like

Hello @CYehLu ! Thank you for the comments. I have updated my post according to your response, in the list of cons.

Interesting read, thanks for sharing (I have used MATLAB for about 15 years and never used or stumbled upon keyboard).

People, including professionals and researchers, have a misconception that Fortran is out of date …

This is rapidly changing, thanks to the activities of Fortranners on this forum and others.

due to the lack of interactivity, Fortran is more difficult to debug than scripting languages such as MATLAB and Python.

Fortran compilers are excellent in debugging, in my opinion. It suffices to compile the codebase with the NAG compiler to find all problems, or pass it through multiple compilers to catch virtually anything with simple human-readable messages. I often take the latter approach with gfortran and Intel ifort (they are both free and affordable for an individual like me). Of course, having an IDE that could catch syntactic errors before the compilation step would be great. But I often find them useless, even the fancy IDE of MATLAB.

most potential users of the code are scared by the fact that Fortran needs to be compiled before use.

The compilation step is a bit of work because the user has to keep in mind different compiler flags. But it is not so difficult to write a simple Unix Bash or Windows Batch script to automate this step. It certainly takes 10-100x less time to write a Bash script than to download multi-GB MATLAB, Anaconda, or Julia installation files and spend half a day installing them. I just measured the installation time and download size of gfortran: ~40 seconds, ~100Mb.

The facts that I like about Fortran, in addition to what’s been so far mentioned, are,

  1. the absolutely beautiful human-readable (yet, scalable) parallelism syntax (Coarray),
  2. the strict exact platform-independent, architecture-independent, timeless syntax of Fortran that enables writing codes that last decades,
  3. the fact that achieving the fastest runtime is no hassle, no cost, and is separate from the codebase: The codebase is only for communicating mathematical ideas. Achieving speed is the compiler’s job, unlike other languages like Julia or C++ that tend to mix the two issues.

The Fortran standards committee should be commended for what they have achieved and have made of Fortran to last 7 decades and yet remain among the top choices for formula translation.

5 Likes

Thank you @shahmoradi for the detailed response.

I agree with your point. However, I meant something a bit different when talking about “debugging”, which I failed to clarify. I have edited my post and you may see my elaboration on debugging code of numerical algorithms.

For the type of debugging specified in my post, interactivity of this kind is a killer feature. You can easily examine what your code is doing, modify it, and continue to run it without recompilation. Try keyboard, and I believe you will be amused.

1 Like

I don’t think this is true, a Fortran function with several outputs is just called subroutine with syntax call f(a,b,x) instead of a=f(x). Alternatively, if you want something of the form a,b = f(x) you can use a derived type that contains a and b.

Fortran also has the possibility to overload functions and subroutines if you mean with ‘several outputs’ return values of different types.

1 Like

Thank you, @MarDie, for commenting.

However, both subroutines and derived types make the code differ considerably and unnaturally from the mathematical formulation, as mentioned in my post. This is a true disadvantage (sorry) when coding “templates”, which is supposed to be as close as possible to the mathematical formulas.

The idea of making the code as close as to the mathematical formulas is the same as what we mean by “Fortran” in the first place, isn’t it? So I am not sure why Fortran does not support multi-output functions yet. The MATLAB-style syntax [a, b] = f(x) would be ideal.

No, I did not mean to overload functions to support multiple types.

1 Like

Thank you for pointing out this. It is surely important and I have not thought about it before.

Statements like [a, b] = f(x) rely on a tuple, at least in Python. I assume that the MATLAB functionality is implemented in a similar way. Unfortunately, types like lists, tuples, or collections are not part of the Fortran language. Therefore, one needs use a derived type if a and b are of different types or an temporary array if a and b are of the same type. Note that stdlib already includes a list type: Stdlib linked list by ChetanKarwa · Pull Request #491 · fortran-lang/stdlib · GitHub.

1 Like

There is

f2matlab: converts Fortran 90 code to Matlab m-files, by benbarrowes. Accordingly, only basic data types and constructions are recommended.

Recent books with a mission similar to yours are

Algorithms for Optimization (2019) (uses Julia)
By Mykel J. Kochenderfer and Tim A. Wheeler
MIT Press

Introduction to Unconstrained Optimization with R (2019)
by Shashi Kant Mishra and Bhagwat Ram
Springer

1 Like

Thank you @Beliavsky for commenting.

Yes, I am aware of f2matlab. It may be a good point to start with. One of the limitations is that it requires the program to be in the same file. This is hardly true for the code I have in mind.

In fact, I am thinking about using OCaml. Attempts have been made with successes to some degree. I am just wondering whether there exist mature tools than some dirty scripts written by myself/my student.

I will have a look at the books you mention.

1 Like

“Statements like [a, b] = f(x) rely on a tuple, at least in Python.” You need a tuple if you want to return multiple arguments of different types. Fortran could define the syntax for use with multiple arguments pf the same type as syntactic sugar for:

    c = f(x)
    a = c(1)
    b = c(2)

Mind you I would find that less useful than a tuple result.

I would also like to be able to return multiple values from a function, see the discussion here: Syntax of interactive Fortran - #18 by certik

We should create a standalone issue at the incubator repository to discuss this more.

I can see only one disadvantage: what if you use such a function in an expression? A short answer is that you would not be able to. The advantage is that it does not require any kind of a tuple type.

A long answer is that you could if we introduce a tuple type that would be the result. However, I would not, it seems it would open all kinds performance and design issues. Would it still be useful without a tuple type?

So a function that returns multiple values would indeed be similar to a subroutine. However, it is much more readable — you know exactly what are the “in” and “out” arguments. And since you cannot use a subroutine in an expression, it’s not a limitation that a function with multiple output values also cannot be.

1 Like

Hi @certik , great to get supportive comments from you and thank you for that!

In MATLAB, if f returns multiple values, say three, like

[a1, a2, a3] = f(x),

then we have the flexibility of calling f with one or two outputs, namely

[a1, a2] = f(x),
a1 = f(x).

If the function is used in an expression, then the first output, i.e., a1 will play the role.

1 Like

In Matlab, can the results be named in the function call, so that if the function defines outputs c1 and c2, one can write

[c1=a1, c2=a2] = f(x)

? If a function can return a variable number of outputs, they may not match what the user intended. With a subroutine one can write

call f(x,c1=a1,c2=a2)

For clarity, I think it’s a good idea to use keyword arguments when calling a procedure with optional arguments.

In a functional language the tuple could be used in an expression as an argument to a procedure. So you could have

… g( f(x) )…

where g is a function of more than one argument. However it would have a large impact on the standard. You would have to define procedure arguments in terms of “tuples”, where in Fortran “tuples” are defined in terms of keyword and optional arguments. It is certainly possible to define the use of keyword arguments as equivalent to tuple constructors and it may be possible to do the same for optional arguments, though the default arguments of other languages are easier to map to tuple constructors.

@certik, @milancurcic , etc. in this case, I notice a bothersome inconsistency in your statements at the other thread. On one hand, I see several comments in other threads that call for a simple, minimal language in Fortran; on the other hand, like with the case here, there is a call and support for features that really complicate matters relative to its foundation and legacy since FORTRAN I even.

The fact is in Fortran, a function result has certain characteristics that includes type, kind, and rank (TKR).

Returning multiple “values” with possibly even different type/kind/rank comes across as a change that runs orthogonal to the current standard and also a needless complication.

Should the use case be something like an eigensolver, the “canonical” Fortrannesque approach, especially considering all the primitive-appearing intrinsic functions in the language and also prior comments re: avoidance of object-oriented (OO) design and seeking functional paradigm, will be for a math library (say in stdlib?) to just supply eigenvalue and eigenvector methods:

lam = eigenvalue(A)
..
c = eigenvector(A)

The case for multiple value function results in Fortran appears rather weak, based on the use cases that have been reported thus far.

First, I’m glad you liked the blog post. And your project sounds interesting and commendable. I’m excited to see how it goes.

For functions with multiple outputs, I’ve generally taken to using derived types, but totally understand how it can in some ways obscure “the math”. @rouson often says that “software abstractions should resemble blackboard abstractions” (you’ll have to ask him where the original quote comes from), so my question is:

How do you write a function with multiple outputs on the blackboard? Is it a, b = f(x), or something else? If that’s what you want, then I do like Python (and other languages) destructuring/pattern matching syntax. For many use cases it would be useful “syntactic sugar” for something like the following to be automatically transformed:

function f(x) result(a, b)
  real, intent(in) :: x
  real :: a, b

to

subroutine f(x, a, b)
  real, intent(in) :: x
  real, intent(out) :: a, b
end subroutine

and then

a, b = f(x)

to

call f(x, a, b)

but you do of course run into the problem, what happens with

z = g(f(x))

perhaps

block
  real :: a, b
  call f(x, a, b)
  z = g(a, b)
end block

seems a reasonable and straightforward transformation. Do you think it’s doable in LFortran @certik ? If so I think it would go a long way towards a proposal to the committee (I would certainly be in favor of it).

I’m curious, in Matlab what happens? In Python you can do it after-the-fact like

z = f(x) # returns a, b, c
a, y = z
b, c = y

Yes, that’s doable. But there are other issues, what about g(1, f(x), 3) would it be equivalent to g(1, a, b, 3)? If so, then we have essentially invented the *args syntax from Python.

I would initially not allow such functions in expressions. But if we want to allow it, then all the above must be resolved.

That would be very ineficient, you would have to diagonalize the matrix twice. You always compute c and lam at the same time.

I view the function result as a variable, that you declare in result(a). So it’s a natural extension to allow more than one such variable.

Things would get complicated if we introduces “tuples”, but if we don’t, then the change seems small and easy to understand for a newcomer.

1 Like