C Is Not a Low-level Language

I found the perspective of the following article interesting:

C is Not a Low-Level Language: Your computer is not a fast PDP-11.
David Chisnall, ACM Queue, Volume 16, Issue 2, 2018

Also available as a PDF. A discussion on Hacker News is available here.

Given that most current Fortran compilers have a companion C processor, and that the two typically share parts of the back-end, is the abstract machine behind Fortran any closer to the hardware or not?

Anecdotally, at an OpenMP training event given by Michael Klemm (Director of the OpenMP Architecture Review Board, formerly employed at Intel, currently at AMD) I recall some discussions where Fortran was compared favourably to C as far as OpenMP is concerned. The gist was that the simpler language semantics of Fortran make it easier to guarantee that OpenMP directives will do the right thing (and easier for programmers too).

On the other hand, if I’m not mistaken, the NAG compiler actually transpiles Fortran to C, and calls a C compiler to produce the binary. Is this simply for improved portability and easier development (it’s easier to transpile Fortran to C than to write and maintain several assembly back-ends)?

I have exactly the same question. I once heard a colleague say that Fortran cannot be faster than C because Fortran codes are transpiled to C codes before being executed. This makes me wonder whether Fortran intrinsic functions/subroutines are written in C.

There are several StackOverflow threads which give (partial) answers:

As some of the answers summarize, it is highly compiler-dependent. Those of us who work exclusively with gfortran and Intel Fortran tend to forget there are several other compilers, including Cray, AMD, ARM, Absoft, etc., each of which might be optimized for the specific platform it runs on.

I would prefer that this thread not turn into a Fortran vs C discussion. The article above tries to highlight a different issue, which is that the design of the C language and modern hardware have diverged.

As far as I can understand, the Fortran standard has followed a philosophy of “abstracting away the machine”, as put succinctly by the title of the book by Mark Jones Lorenzo. But given that many Fortran compilers today have evolved in parallel with C compilers (or rather vice-versa), I was asking myself to what degree the criticisms from the article apply to Fortran too.

Appentra is a company which develops the Parallelware Analyzer, a program for static code analysis. In the knowledge section of their website they provide several guidelines for improving code performance.

One of the knowledge items is

PWR003: Explicitly declare pure functions

demonstrated by the following code example:

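/* GCC-compatible compilers define __GNUC__; others get an empty macro. */
#ifdef __GNUC__
  #define PURE __attribute__((const))
#else
  #define PURE
#endif

/* The result depends only on the arguments and there are no side effects. */
PURE int foo_pure(int a, int b) {
  return a + b;
}

/* Writes through b, so it has a side effect and cannot be marked PURE. */
int foo_impure(int a, int *b) {
  *b = a + 1;
  return a + *b;
}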

It is trivial to rewrite this in Fortran without the need for vendor-specific compiler attributes.

The intent and pure/impure attributes make it easy to specify argument intent and purity explicitly. I was positively surprised recently when reading the book Functional Programming in C++, which spoke highly of pure functions and how C++ has been evolving in this direction. So has Fortran.

I believe that many projects with both Fortran and C compilers compile both languages to a shared intermediate representation before generating machine code. That’s different from transpiling Fortran to C. In general, the implementation language of a compiler or interpreter does not dictate the speed of the resulting programs. C is faster than Python, but PyPy, implemented in a restricted subset of Python, often runs programs faster than CPython, which is implemented in C.

I read the article and I think it is good. The Hacker News discussion (linked above) is also very good.

I would say the Fortran language is closer to the math and numerical algorithms (and thus even further away from the hardware) than C. So in principle it should allow compilers to optimize it better due to the reasons mentioned in the article.

It seems to me, based on my subjective experience with Fortran compilers, that they do not take full advantage of the language, and also that the language itself is overly restrictive (so the optimizations done by -ffast-math are against the Fortran standard). I feel there is a huge potential for Fortran to be optimized really well. It’s one of our goals in LFortran to implement good optimizations before we lower to LLVM and/or C (we have both backends).
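To make the -ffast-math point concrete: floating-point addition is not associative, so a compiler that reorders a sum can change the result. GCC documents that -ffast-math permits this kind of reassociation. Below is a minimal C sketch of the effect. The function names are made up for illustration, and the reordering is done by hand here rather than by the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Sum left to right, exactly as written: ((x[0] + x[1]) + x[2]) + ... */
double sum_forward(const double *x, size_t n) {
  assert(x != NULL);
  volatile double s = 0.0;  /* volatile keeps the additions in source order */
  for (size_t i = 0; i < n; i++) {
    s += x[i];
  }
  return s;
}

/* The same terms summed right to left, standing in for a reordering that
   a compiler would be allowed to pick once reassociation is permitted. */
double sum_backward(const double *x, size_t n) {
  assert(x != NULL);
  volatile double s = 0.0;
  for (size_t i = n; i > 0; i--) {
    s += x[i - 1];
  }
  return s;
}
```

For the input {1e20, -1e20, 1.0} the forward sum keeps the final 1.0, while the backward sum absorbs it into -1e20 first and ends at 0.0. Both orders are mathematically equivalent, which is why a standard that fixes the evaluation semantics rules out such rewrites.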

C is sort of a high-level assembler. The Fortran compilers I have worked on were written largely in BLISS, but C is more common now. The libraries are often C but sometimes assembler is used. It is true that, nowadays, Fortran and C compilers from a given vendor share a code generator and optimizer, but the languages are very different – each has semantics incompatible with the other, so it isn’t the case that C is a least common denominator.

I do know that when Intel started work on an LLVM-based Fortran compiler, LLVM was hopelessly inadequate for Fortran and needed major structural rework to enable Fortran semantics. It’s a lot better today.

This reply from Bill Long in another thread also speaks of C as a substitute for assembler:

Steve, do you know what year Intel started working on the LLVM-based Fortran compiler? I would be curious to know.

I’m going to say 2016 (my last year there). It wasn’t a major effort until 2017 or so. I remember hearing complaints about how C-centric LLVM was and that it simply couldn’t represent some Fortran concepts at the time.

Thanks! I didn’t know Intel had been working on it since 2017. Do you know by any chance what kind of Fortran concepts were hard to represent in LLVM back then? I actually started prototyping LFortran around 2017 as well, just to see if it was possible. I didn’t see any major blockers, but I was new to compilers.

Sorry, I don’t recall. You could ask Lorri Menard, since it’s her project, and see what she says.

This is a nice article. For RISC architectures, C is still a relatively good representation, but Fortran compilers for ARM and RISC-V are perhaps less mature. For engineering and scientific computing, the benefits of code reusability and numerical correctness guarantees typically outweigh the gains from low-level performance optimizations, due to the long code lifetimes and relatively small developer teams. Heterogeneous accelerator hardware makes the link between programming language extensions for C and Fortran and the hardware less close, but eases adoption and code porting.

Just curious. Are there any references for this claim?

I don’t think this is true. High performance RISC cores still have out of order execution, pipelining, complicated memory models, multi-threading, vectorized instructions and the rest.

Ok, yes - memory models, out of order execution and pipelining are tricky. Vectorized instructions and multi-threading have a long history in Fortran due to their use for mathematical calculations, but they are also not present in a PDP-11.

These were just some comments in passing from Michael Klemm. Since I didn’t take any notes, I would not like to misinterpret his words or take them out of context. The materials from that workshop are available here: LRZ: OpenMP-Workshop. (You can find me in the group picture.)

In the meantime I’ve tried to find some evidence in support of my statement. Here are a few bullet points:

  • Historically, the OpenMP 1.0 standard was released in 1997 for Fortran (see slide 2). The C version came one year later. The original partners in the architecture board, such as Compaq / Digital, Hewlett-Packard, Intel, IBM, Sun Microsystems, etc., are all Fortran compiler vendors. One of the motivations for OpenMP was the lack of standardization in previous shared-memory parallelization frameworks, such as High-Performance Fortran. In some cases, like the Hitachi Fortran 90 compiler, OpenMP directives were mapped directly to proprietary directives.
  • Browsing through the OpenMP specification v5.1, I’ve found a few items where Fortran differs:
    • array sections, which are intrinsic to Fortran, but have to be specified judiciously for C/C++ (added only in OpenMP v4.0);
    • the workshare construct which is exclusive to Fortran and also supports array assignment, WHERE construct, and array intrinsic functions including MATMUL, DOT_PRODUCT, SUM, PRODUCT, MAXVAL, MINVAL, COUNT, ANY, ALL, SPREAD, PACK, UNPACK, RESHAPE, TRANSPOSE, EOSHIFT, CSHIFT, MINLOC, and MAXLOC;
    • the reduction clauses, where Fortran also supported max and min besides the common arithmetic, logical, and bitwise operators (implicit max and min reductions for C/C++ were first added in OpenMP v3.1).
  • In this blog post from 2018 - The Need for Speed Part 2: C++ vs. Fortran vs. C | Strange Attractors, the same algorithm had a slight performance advantage in Fortran + OpenMP compared with the C and Rcpp versions (in all cases gcc 4.9.3, released in 2015, was used). It could be that recent updates to gcc and/or Rcpp would change the result.
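As an illustration of the reduction item above, here is a minimal C sketch of a max reduction as it can be written since OpenMP 3.1. The function name array_max is made up for this example, and the code assumes a non-empty array.

```c
#include <assert.h>
#include <stddef.h>

/* Largest element of x[0..n-1]; assumes n > 0. */
double array_max(const double *x, int n) {
  assert(x != NULL && n > 0);
  double m = x[0];
  /* max as a reduction operator in C/C++ needs OpenMP 3.1 or later.
     Built without OpenMP support, the pragma is ignored and the loop
     simply runs serially with the same result. */
  #pragma omp parallel for reduction(max:m)
  for (int i = 1; i < n; i++) {
    if (x[i] > m) {
      m = x[i];
    }
  }
  return m;
}
```

In Fortran the same operation is the MAXVAL intrinsic on the whole array, which is one of the intrinsics listed for the workshare construct above.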

On the other hand, in the article “The Ongoing Evolution of OpenMP” there appears to be a slight bias towards C and C++, presumably because of the larger industry interest in those languages. The “tasking” concept appears to have originated from C/C++ prototype implementations of OpenMP. Also, the implementation of recent OpenMP standards has lagged behind in Fortran compilers, AFAIK. Even the OpenMP v5.0 standard, released in 2018, mostly targets Fortran 2003 as the base language.

Overall, the OpenMP architecture board has done a great job of designing a parallel framework which is independent of the base language design.

I’ve found one more example where I believe Fortran has a slight semantic advantage, although I haven’t attempted to confirm this. It is the saxpy example taken from the LRZ OpenMP Workshop, with offloading to an accelerator.

The C code might look like this:

void saxpy(float a, float* x, float* y, int sz) {
  /* x is only read on the device; y is read and then copied back. */
  #pragma omp target map(to:x[0:sz]) \
                     map(tofrom:y[0:sz])
  for (int i = 0; i < sz; i++) {
    y[i] = a * x[i] + y[i];
  }
}

The issue here is that the C compiler cannot automatically determine the size of the arrays behind the x and y pointers. Instead, the programmer must help the compiler explicitly using map clauses. Without the mapping, a compiler might decide not to offload if it cannot verify what is behind the pointer.

On the other hand, in the Fortran code

subroutine saxpy(a, x, y, n)
  use iso_fortran_env, only: real32
  integer :: n, i
  real(kind=real32) :: a
  real(kind=real32), dimension(n) :: x
  real(kind=real32), dimension(n) :: y
  !$omp target
  do i=1,n
    y(i) = a * x(i) + y(i)
  end do
  !$omp end target
end subroutine

the compiler can infer the array size directly from the dimension attribute. The direction of transfer is also identified automatically (x is not changed - to, y is changed - tofrom). I imagine a compiler could also check the direction of transfer against manually declared intent attributes for extra safety.

In essence, Fortran arrays (may) help the compiler to make the right choice.
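The pointer problem on the C side can be seen even without OpenMP. In this minimal sketch (the function name is made up for illustration), the caller knows the full extent of its array, but once the array is passed as a pointer the callee only sees an address. That is the information the map clauses in the C saxpy have to supply by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Inside the callee, x is just an address: sizeof reports the size of
   the pointer itself, not the size of the array the caller passed. */
size_t bytes_seen_by_callee(const float *x) {
  assert(x != NULL);
  return sizeof(x);
}
```

Called with a local array such as float a[100], the caller's sizeof(a) is 100 * sizeof(float), while the function returns sizeof(float *). A Fortran dummy argument declared with dimension(n) keeps its extent as part of the declaration.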