How should a Fortran compiler test suite identify non-standard code?

Fujitsu has released their compiler test suite for C, C++, and Fortran. It has 7034 .f90 and 920 .f source files. Looking at their first Fortran code

program main
integer*8 i8,j/-20/,k/20/
integer   kei/0/
do i8=j,k
  kei=kei+i8
end do
if (int(kei)==0 ) then
  print *,"OK"
else
  print *,"NG",int(kei)
endif
end

raises the question of how a test suite should handle non-standard code. Things like real*8 are non-standard but ubiquitous, so they belong in a test suite: a practical compiler needs to support them. Besides flagging a code as non-standard, compilers have options to check that it conforms to a particular standard (F90, F95, F2003, F2008, F2018). A test suite in which each source file is annotated with comments saying which standards it satisfies would be useful. This could be done automatically by compiling each source file with the various standard-checking options and seeing which ones fail. Doing this with multiple compilers would give a list of codes where compilers disagree about the interpretation of a standard.
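A minimal sketch of that automatic annotation, assuming gfortran is the compiler and that a non-zero exit status under -fsyntax-only -std=... means the file does not conform. The function names and the "! conforms-to:" comment format are made up for illustration, and gfortran has no -std=f90, so F90 is left out here.

```python
import subprocess
from typing import Callable, List

# Values accepted by gfortran's -std= option (no f90 level exists there).
STANDARDS: List[str] = ["f95", "f2003", "f2008", "f2018"]


def gfortran_accepts(path: str, std: str) -> bool:
    """Return True if gfortran accepts the file under -std=<std> (no code generated)."""
    result = subprocess.run(
        ["gfortran", "-fsyntax-only", f"-std={std}", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def annotate(path: str, accepts: Callable[[str, str], bool] = gfortran_accepts) -> str:
    """Build a Fortran comment line listing the standards the file passed.

    The comment format is hypothetical; `accepts` can be swapped for a
    wrapper around another compiler to compare verdicts.
    """
    passed = [std for std in STANDARDS if accepts(path, std)]
    if passed:
        return "! conforms-to: " + " ".join(passed)
    return "! conforms-to: none"


def disagreements(path: str, checkers: dict) -> List[str]:
    """List the standards on which the given compilers give different verdicts."""
    return [
        std
        for std in STANDARDS
        if len({check(path, std) for check in checkers.values()}) > 1
    ]
```

Passing the checker in as a parameter keeps the compiler-specific command in one place, so the same loop can be run over several compilers to find the files they disagree on.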

ETA: Looking at the code a bit more, it has (kind=8) in many places. This is so common that a practical compiler needs to support it, but the standard does not specify that 8 is a valid KIND number, and I believe the NAG compiler requires an option to support this KIND number. So automatically translating a compiler test suite into code that all compilers should accept by default (replacing hard-coded kind numbers with parameters set by SELECTED_REAL_KIND or SELECTED_INT_KIND) is also something that should be done.
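A rough sketch of such a translation for declarations only. It assumes that kind=8 was meant as "8 bytes of storage" (the gfortran and ifort convention), so it substitutes selected_int_kind(18) and selected_real_kind(15, 307). The function name is hypothetical, and the regex does not understand comments, string literals, literal suffixes like 1.0_8, or the shorthand real(8), which is indistinguishable from a call to the REAL intrinsic.

```python
import re

# Assumption: kind=8 in the original source meant an 8-byte type.
PORTABLE_KIND = {
    "integer": "selected_int_kind(18)",     # at least 18 decimal digits of range
    "real": "selected_real_kind(15, 307)",  # at least 15 digits, exponent range 307
}

# Matches only the explicit form, e.g. "integer(kind=8)" or "REAL (KIND = 8)".
KIND8 = re.compile(r"\b(integer|real)\s*\(\s*kind\s*=\s*8\s*\)", re.IGNORECASE)


def portable_kinds(line: str) -> str:
    """Replace hard-coded kind=8 in a declaration with a SELECTED_*_KIND call."""

    def repl(match: "re.Match[str]") -> str:
        type_name = match.group(1)
        return f"{type_name}(kind={PORTABLE_KIND[type_name.lower()]})"

    return KIND8.sub(repl, line)
```

For example, portable_kinds("real(kind=8) :: x") gives "real(kind=selected_real_kind(15, 307)) :: x". A real tool would want a proper Fortran parser rather than a regex.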


@Beliavsky’s Fujitsu code has some nonstandard initializations including
integer kei/0/
Neither ifort nor ifx complained even though I had used the -standard-semantics option. Gfortran and AMD flang gave warnings about the initializations but ran the program, printing OK. But g95 flagged the syntax error. I haven’t had access to a Fujitsu compiler for over 40 years, when I was surprised to be told by the computer centre running an IBM mainframe that the Fujitsu compiler was better than IBM’s own.

-standard-semantics doesn’t enable warnings - what it does is change the meaning (semantics) of certain usages to match what the current standard says. These are cases where earlier standards did not specify behavior, and the implementation was different. What you want instead is -std (/stand on Windows).

My view is that a test suite that is billed as testing Fortran should not include any non-standard usages. Obviously, a vendor would also want to test their extensions.


I think that one should take the “reference-output” files with a clove of garlic.

For example, Test 0001_0062 calls subroutine SUB4 on line 395 with an integer constant as the first argument. However, on lines 413 and 415 in the subroutine, that argument is set equal to the second and third arguments, respectively. This is an error that may or may not be caught at compile time or when the program is run. However, the reference output simply says “OK”.

Such behavior leads one to ask, “What is the test program supposed to do?” If the program has “undefined behavior”, and that defect is not detected, should any output be designated as “OK”?

Whether the code is standard or not, a compiler ideally should not produce an ICE (internal compiler error), so there is value in detecting such errors regardless of whether the input is standard. Several failures can share the same cause, so the number of ICEs is not necessarily significant, but the test suite produces 21 ICEs from gfortran and 40 from ifort.
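Counting ICEs over a test run can be sketched as below. This assumes the compiler's output contains the phrase "internal compiler error" when it crashes; the exact wording differs between compilers and versions, so the pattern is an assumption to adjust, and the function names are hypothetical.

```python
import re
from typing import Dict

# Assumed marker text; check each compiler's actual crash message.
ICE_PATTERN = re.compile(r"internal compiler error", re.IGNORECASE)


def is_ice(compiler_output: str) -> bool:
    """Return True if the captured compiler output looks like an ICE."""
    return ICE_PATTERN.search(compiler_output) is not None


def count_ices(outputs: Dict[str, str]) -> int:
    """Count the test files whose captured compiler output reports an ICE.

    `outputs` maps a source file name to the text the compiler printed.
    """
    return sum(1 for text in outputs.values() if is_ice(text))
```

This counts affected files, not distinct compiler bugs, so several hits may still come from one underlying cause.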


For invalid (not standards conforming) code, the test suite should do one of a few things.

  1. If it violates a constraint, require the compiler to diagnose the error at compile time and produce a meaningful error message (an ICE does not count)
  2. If it only violates normative text, but should be detectable at compile time in this case, expect the compiler to produce a meaningful error message at compile time, but not consider it a failing test if it does not (again, an ICE does not count)
  3. If it only violates normative text and isn’t really detectable at compile time, expect it to successfully compile the code, and expect it to produce a meaningful error message at run time, but not consider it a failing test if the program terminates normally (and definitely an ICE would not count)
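The three cases above can be sketched as a pass/fail rule. The category and outcome labels are made up for illustration, and one point is an assumption on my part: the list does not say what happens if a compiler rejects a case 3 program at compile time, so this sketch treats that as a failure because the list expects a successful compile.

```python
CATEGORIES = ("constraint", "detectable", "runtime")  # cases 1, 2, 3 in the list
COMPILE_OUTCOMES = ("diagnosed", "accepted", "ice")


def verdict(category: str, compile_outcome: str) -> str:
    """Return "pass" or "fail" for an invalid-code test.

    category: which of the three cases the test belongs to.
    compile_outcome: "diagnosed" (meaningful compile-time error),
        "accepted" (compiled without error), or "ice".
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    if compile_outcome not in COMPILE_OUTCOMES:
        raise ValueError(f"unknown compile outcome: {compile_outcome}")

    # An ICE never counts as a diagnosis, in any of the three cases.
    if compile_outcome == "ice":
        return "fail"
    if category == "constraint":
        # Case 1: the compiler is required to diagnose the error.
        return "pass" if compile_outcome == "diagnosed" else "fail"
    if category == "detectable":
        # Case 2: a diagnosis is expected, but its absence is not a failure.
        return "pass"
    # Case 3: compilation should succeed; a run-time message is expected,
    # but normal termination is not a failure. Rejecting the program at
    # compile time is treated as a failure here (assumption, see lead-in).
    return "pass" if compile_outcome == "accepted" else "fail"
```

For case 3 the run-time result does not change the verdict, since both a run-time error message and normal termination are acceptable, which is why it is not a parameter.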

In response to my suggesting that non-standard codes be marked as such, a Fujitsu employee replied,

Yes, as you say, separating non-standard tests will improve the value of this test suite.
We Fujitsu are now concentrating on publishing our internal test suite to the Clang/Flang/LLVM community. This still requires a lot of work, including removing Fujitsu compiler-specific features and internal information. Once the whole test suite is published and integrated to the LLVM test-suite, we or someone could work on identifying non-standards programs if the community wants it.

Compiling with flags checking standards, for example -fsyntax-only -std=f2018 -Werror in the case of Flang, could be the first step to identify non-standard syntax though the accuracy depends on the ability of compilers. It would be so helpful if you have any idea to identify them.