Using reserved words as variables

That may be true in isolated examples, but in general there are many useful features in fortran that were not available prior to f2003. Those features include such simple things as allocatable dummy arguments, allocatable scalars, allocatable components of derived types, and all of the C interop features. All of these affect how code is written and also the efficiency of the resulting code. And if you were concerned about writing portable code that used those features and ran on multiple compilers, then you still had to wait until 2005 or 2006 to use them freely. So making this kind of statement about 20+ year old code is less credible than making it about 10 year old code.
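For anyone who has not run into these features, here is a minimal sketch of three of them (an allocatable dummy argument, an allocatable scalar, and an allocatable component). The names `grid_t`, `make_work`, and `scale` are made up for illustration, and the C interop features are not shown.

```fortran
program alloc_features
   implicit none

   ! Derived type with an allocatable component
   type :: grid_t
      real, allocatable :: values(:)
   end type grid_t

   type(grid_t) :: g
   real, allocatable :: work(:)    ! allocated inside make_work
   real, allocatable :: scale      ! allocatable scalar

   call make_work(work, 5)

   allocate(scale)
   scale = 2.0

   allocate(g%values(size(work)))
   g%values = scale * work

   print *, size(work), size(g%values), scale

contains

   ! The callee decides the size because the dummy argument is allocatable.
   subroutine make_work(a, n)
      real, allocatable, intent(out) :: a(:)
      integer, intent(in) :: n
      integer :: i
      allocate(a(n))
      do i = 1, n
         a(i) = real(i)
      end do
   end subroutine make_work

end program alloc_features
```

Before these features were available, the usual workarounds were pointers or oversized work arrays passed down from the caller, which is the kind of difference in coding style and efficiency I mean.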

I wrote a lot of code in the 1980s, and while this was often the case, there were also many cases where, after a rewrite for a vector machine (a Cray, a Cyber 205, an ETA-10, a Convex, an FPS processor, etc.), that code would then also perform better on a scalar machine than it did originally. In hindsight, one could usually explain why it performed better, but in some cases it was simply a mystery.

However, there were, and still are, situations where machine-dependent code is necessary for optimal performance. For just one example, consider a matrix-matrix product written explicitly with do loops. On a Cray (and several other vector machines), the optimal loop order has a SAXPY-type innermost do loop, while on most scalar and even RISC machines, the optimal loop order has an SDOT-type innermost loop. A sophisticated compiler might recognize that loop structure and rearrange it if necessary, but that was not really typical. That is why programmers of that era relied so much on conditional compilation and preprocessors, and why it was so frustrating that the standards committee failed to incorporate that capability into fortran all through the 1980s and also failed even in the f90 revision. Even today, as you mentioned, with the various types and generations of vector hardware and GPUs available, there is still the need to write machine-dependent code.
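To make the two loop orders concrete, here is a small sketch. The routine names `matmul_axpy` and `matmul_dot` are my own, and the array size is arbitrary. Both routines compute the same product, and only the order of the loops differs.

```fortran
program loop_orders
   implicit none
   integer, parameter :: n = 64
   real :: a(n,n), b(n,n), c_axpy(n,n), c_dot(n,n)

   call random_number(a)
   call random_number(b)

   call matmul_axpy(a, b, c_axpy)
   call matmul_dot(a, b, c_dot)

   ! Both orders should agree with the intrinsic up to roundoff.
   print *, 'max diff (axpy form):', maxval(abs(c_axpy - matmul(a, b)))
   print *, 'max diff (dot form): ', maxval(abs(c_dot - matmul(a, b)))

contains

   ! SAXPY-type innermost loop: a column of c is updated by a scalar
   ! times a column of a, with unit stride through memory.
   subroutine matmul_axpy(a, b, c)
      real, intent(in)  :: a(:,:), b(:,:)
      real, intent(out) :: c(:,:)
      integer :: i, j, k
      c = 0.0
      do j = 1, size(b, 2)
         do k = 1, size(a, 2)
            do i = 1, size(a, 1)
               c(i,j) = c(i,j) + a(i,k) * b(k,j)
            end do
         end do
      end do
   end subroutine matmul_axpy

   ! SDOT-type innermost loop: each element of c is a dot product
   ! accumulated in a scalar that can stay in a register.
   subroutine matmul_dot(a, b, c)
      real, intent(in)  :: a(:,:), b(:,:)
      real, intent(out) :: c(:,:)
      integer :: i, j, k
      real :: s
      do j = 1, size(b, 2)
         do i = 1, size(a, 1)
            s = 0.0
            do k = 1, size(a, 2)
               s = s + a(i,k) * b(k,j)
            end do
            c(i,j) = s
         end do
      end do
   end subroutine matmul_dot

end program loop_orders
```

Which one runs faster depends on the hardware and on how much the compiler rearranges the loops, so I am not claiming anything about timings on a modern machine here.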

LAPACK is designed so that its efficiency derives from the underlying BLAS operations rather than from optimization of the high-level code itself. For example, if you compile the fortran reference LAPACK codes and then link to an efficient BLAS library (hand-coded assembler, or tuned fortran/C), you will get optimal performance, far beyond that available with the fortran reference BLAS. When it applies, that is a good general model for other programming tasks, but it doesn’t apply to every application. At one time, LAPACK did not even include high-level optimizations such as OpenMP directives, although I think that is no longer true for the latest LAPACK versions.

[edit: link added in reference to multithreaded LAPACK:
Does LAPACK/BLAS automatically use multi cores or threads? - #27 by JeffH]

I agree with this in general, but I thought OpenBLAS was written in C, not assembler, and I thought the tuning/optimization was done automatically, not manually. Whatever C does, I think that code could be written in fortran in principle, but fortran lacks a standard preprocessor, so such tuning tasks are easier in C for those superficial reasons.
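As a rough sketch of what I mean by the preprocessor issue: most fortran compilers will run a C-style preprocessor as an extension (for example, gfortran does this for files with a .F90 suffix or with the -cpp option), but none of it is part of the standard. The macro name `VECTOR_MACHINE` below is hypothetical, and the example is a matrix-vector product just to keep it short.

```fortran
program cond_compile
   implicit none
   integer, parameter :: n = 3
   real :: a(n,n), x(n), y(n)
   integer :: i, j

   a = 1.0
   x = [1.0, 2.0, 3.0]
   y = 0.0

#ifdef VECTOR_MACHINE
   ! SAXPY-type form: the inner loop updates the whole vector y
   do j = 1, n
      do i = 1, n
         y(i) = y(i) + a(i,j) * x(j)
      end do
   end do
#else
   ! SDOT-type form: the inner loop accumulates one element of y
   do i = 1, n
      do j = 1, n
         y(i) = y(i) + a(i,j) * x(j)
      end do
   end do
#endif

   print *, y
end program cond_compile
```

Compiling with -DVECTOR_MACHINE selects the first branch and compiling without it selects the second. The result is the same either way, and only the loop order changes.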
