The Fortran Testing Framework, FTFramework, a collection of Python and Bash scripts by Agustin Forero to enable easy testing of Fortran compilers using the Bash Automated Testing System (BATS), may interest compiler writers.
Interesting. @rouson and I have been considering writing our own test suite to automate the annual ACM SIGPLAN Fortran Forum compiler survey. This looks like something we should look into, for inspiration if nothing else. Thanks for posting.
FWIW, I have used the compiler survey from the Fortran Forum to assemble my own collection of test programs, the goal being to see if particular language features were available with the compiler. See flibs - a collection of Fortran modules / SVN / [r428] /trunk/chkfeatures. A similar set but with emphasis on the diagnostics is at flibs - a collection of Fortran modules / SVN / [r428] /trunk/chkdiagnostics
Yes, see also this issue + discussion:
and this GSoC proposal:
This should be done (one way or another).
If one or more members from the Community can take the lead to put together a repository or a temporary “parking lot” for tests to be considered toward inclusion in a “testsuite for the standard”, I can post several dozen for initial consideration (it can get into hundreds if I can get better organized and pull from support incidents submitted with a commercial vendor): just a few samples of the kinds of tests I have assembled can be reviewed at the above GitHub issue thread:
This repository starts as a collection of reproducers from many of the Fortran compiler bug reports I have made over the years. It comprises the more recent bug reproducers for contemporary compilers. My initial goal was simply to gather them to a central canonical location, and make them accessible to a few interested people. I also wanted to be able to easily run the tests and clearly see which tests are passing and failing.
Note that the number of tests for a given compiler is a reflection of my level of engagement with the compiler, and should not be construed as an indication of its quality. Quality, from the perspective of my usage, is measured more by the number of failing tests.
The development of comprehensive test suites for specific Fortran 2003 and later features, such as deferred-length allocatable character variables or parameterized derived types, is a possible future goal. I’m fed up with Fortran compilers that claim support for a feature, but have a half-assed implementation that only works for simple usage and fails badly for more complex usage. A test suite that could thoroughly probe an implementation would help shed the light of reality on the claims.
Contributions of tests or suggestions for improving the usefulness of this project are most welcome – make a pull request.
Thanks to Brian Friesen, I have funding to contribute tests to flang. I’ll be glad to contribute my tests to a broader community effort as well. Before we get too far, it would be really great to reach some consensus regarding style, test infrastructure, file organization, etc. My initial thoughts are
Use the Vegetables unit testing framework so that the tests can be written in the form of a specification,
Keep test files orthogonal so that each file tests a relatively small, specific language feature or feature set (e.g., SUM or CO_SUM),
Include at least a one-line, FORD-style comment per program unit, derived type, and procedure or procedure interface, and
Group the files into subdirectories according to sections of the standard that the files aim to test.
Most importantly, all tests should report success or failure before terminating. (This is implied if one uses Vegetables.) I should be able to push some examples to a fork of flang next week for further discussion.
Hi Damian, yes, if you could make the tests available to a broader community in a compiler independent manner, that would be extremely helpful. Let me know how we can help.
While I agree with your other points, I’m not sure vegetables is the right tool for this job. While it does encourage writing tests in a specification style (which I highly recommend), and nicely reports the results, there are some aspects of this test suite for which it may not be well suited.
A test basically involves (trying to) compile a program, run it if it succeeds, and verify the behavior was correct. It may be that the code isn’t supposed to compile (i.e. verify the compiler enforces some constraint), should crash (i.e. a run-time error condition is detected), or executes to completion.
There is also a bootstrapping problem for vegetables. If a compiler doesn’t support all the features needed for vegetables, you can’t use it to compile the test suite runner. Also, vegetables isn’t really set up to be a parameterized test suite (i.e. selecting which compiler to use at run-time).
I think this is a case where a more purpose built framework would be appropriate. I think the use case is simple enough that it wouldn’t be that hard.
- Organize the tests in a hierarchical structure
- Specify whether they should compile, and if so whether they should execute to completion
- Allow designating what compiler to use when executing the suite
- Be written in something that is cross-platform (Python comes to mind as a good candidate)
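To make the list above concrete, here is a minimal Python sketch of how a test's expected outcome could be recorded and checked. Everything in it (`Expect`, `FortranTest`, `verdict`, `run_case`) is a hypothetical name invented for illustration, not part of any existing framework, and it assumes the compiler under test accepts `-o` to name the executable.

```python
import subprocess
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Expect(Enum):
    COMPILE_ERROR = "compile_error"  # the compiler must reject the source
    RUNTIME_ERROR = "runtime_error"  # must compile, then terminate abnormally
    RUN_OK = "run_ok"                # must compile and execute to completion


@dataclass(frozen=True)
class FortranTest:
    source: str     # path to the Fortran source file
    expect: Expect  # what a conforming compiler should do with it


def verdict(expect: Expect, compile_rc: int, run_rc: Optional[int]) -> bool:
    """Return True if the observed exit codes match the expectation.

    run_rc is None when the program was never executed.
    """
    if expect is Expect.COMPILE_ERROR:
        return compile_rc != 0
    if compile_rc != 0 or run_rc is None:
        return False  # the test needed an executable but did not get one
    if expect is Expect.RUNTIME_ERROR:
        return run_rc != 0
    return run_rc == 0


def run_case(case: FortranTest, compiler: str, exe: str = "./a.out") -> bool:
    """Compile with the compiler chosen at run time, run if possible, judge."""
    # Assumption: the compiler accepts "-o <file>" to name its output.
    compiled = subprocess.run([compiler, case.source, "-o", exe])
    run_rc = None
    if compiled.returncode == 0:
        run_rc = subprocess.run([exe]).returncode
    return verdict(case.expect, compiled.returncode, run_rc)
```

Keeping `verdict` separate from `run_case` means the pass/fail logic can be checked without any compiler installed, and the compiler is just a string passed in when the suite is executed.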
The biggest question that I have is, how do we allow a given compiler to designate that it needs certain flags to be used in order to pass a given test? Or that an executable must be executed in some special way? Coarray features immediately come to mind, but I believe there may be other special cases as well.
Once we have a framework, I have no doubt we will be able to collect lots of test cases from the community. I think @FortranFan’s idea of drawing from bug reports is probably a good idea too.
NAG has been developing a Test Suite for Fortran compilers since the 90s. We have almost 20000+ Fortran source files with just under 800k LOC. It is orchestrated with shell scripts.
There are some issues one needs to think carefully about at that scale.
Having a clear semantic scheme to manage command line options. A Rosetta stone document for the active compilers would be a great facility. Even better perhaps would be a new compiler wrapper with novel option-names that maps the intent of the option to specific compiler flags for each compiler under test.
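As a rough illustration of the wrapper idea, the sketch below maps compiler-neutral "intent" names to per-compiler flags. The intent names and the `build_command` helper are made up for this example. The flag strings for gfortran and ifort are the ones I believe are current, but they should be checked against each compiler's documentation before being relied on.

```python
# Hypothetical intent names mapped to per-compiler flags.
# Assumption: flag spellings are correct for recent releases; verify them.
INTENT_FLAGS = {
    "gfortran": {
        "std_f2018": ["-std=f2018"],
        "coarray": ["-fcoarray=single"],
        "runtime_checks": ["-fcheck=all"],
    },
    "ifort": {
        "std_f2018": ["-stand", "f18"],
        "coarray": ["-coarray"],
        "runtime_checks": ["-check", "all"],
    },
}


def build_command(compiler, intents, source, output="a.out"):
    """Translate intents into a full command line for one compiler."""
    if compiler not in INTENT_FLAGS:
        raise ValueError(f"no flag table for compiler {compiler!r}")
    table = INTENT_FLAGS[compiler]
    flags = []
    for intent in intents:
        if intent not in table:
            raise ValueError(f"{compiler} has no mapping for intent {intent!r}")
        flags.extend(table[intent])
    # Assumption: "-o <file>" names the executable for every listed compiler.
    return [compiler, *flags, source, "-o", output]
```

A test would then declare only the intents it needs (say `["std_f2018", "coarray"]`), and the table doubles as the Rosetta stone document.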
The ability to run as much of the test suite as possible in parallel. Some tests might deliberately exhaust the resources of the machine. The huge majority will be extremely light on resources. Separating those out will make for a sweeter experience.
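One way to separate the two groups, sketched in Python under the assumption that each test carries a hypothetical "heavy" marker: light tests go through a thread pool, and resource-hungry ones run one at a time afterwards. `run_suite` and the `run_one` callback are illustrative names only.

```python
from concurrent.futures import ThreadPoolExecutor


def run_suite(tests, run_one, max_workers=4):
    """Run light tests concurrently, then heavy tests one at a time.

    tests   : iterable of (name, is_heavy) pairs
    run_one : callable taking a test name and returning True on success
    """
    tests = list(tests)
    light = [name for name, is_heavy in tests if not is_heavy]
    heavy = [name for name, is_heavy in tests if is_heavy]

    results = {}
    # Threads are enough here if run_one mostly waits on a child process
    # (the compiler or the test executable).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for name, ok in zip(light, pool.map(run_one, light)):
            results[name] = ok
    # Heavy tests get the machine to themselves.
    for name in heavy:
        results[name] = run_one(name)
    return results
```

In a real suite `run_one` would compile and execute the named test. For a dry run, any function from name to bool will do.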
Some compilers under certain options will fail a huge number of tests simply because they don’t implement one or two features. You need a way of filtering out similar failures (or tests) so you can concentrate on the unexpected ones. The reporting must be amenable to further tooling.
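A small Python sketch of that kind of filtering, assuming each test result is tagged with the language features it exercises. The record layout, the feature tags and the `triage` and `report` names are all hypothetical. The JSON output is just one machine-readable choice.

```python
import json
from collections import Counter


def triage(results, unsupported):
    """Split failures into unexpected ones and ones explained by known gaps.

    results     : list of dicts with keys "name", "passed", "features"
    unsupported : features the compiler is already known not to implement
    """
    unexpected = []
    expected = Counter()
    for r in results:
        if r["passed"]:
            continue
        missing = sorted(set(r["features"]) & set(unsupported))
        if missing:
            # The failure is attributed to every known-missing feature used.
            for feature in missing:
                expected[feature] += 1
        else:
            unexpected.append(r["name"])
    return {
        "unexpected_failures": unexpected,
        "expected_failures_by_feature": dict(expected),
    }


def report(results, unsupported):
    """Serialize the triage as JSON so other tools can consume it."""
    return json.dumps(triage(results, unsupported), indent=2, sort_keys=True)
```

With this split, a compiler that lacks one feature shows a single count for that feature instead of thousands of individual failure lines, and the short unexpected list is what a person reads.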
Thanks for your input, Brad. I agree with your thoughts.
Regarding compiler flags, FWIW, most compilers eventually evolve away from using special flags to support standard features. For example, the Cray compiler used to require a special flag to support coarray features, but they eventually removed that requirement. I would hope that all compilers eventually evolve in that direction. At least given that gfortran and flang are open-source, we could potentially make it so. That might also influence the commercial vendors to follow our lead, but I recognize that this doesn’t help immediately.
@rouson that’s great news. Does the funding mandate that this be an internal component of Flang or can the suite be its own subproject, usable by Flang and other compilers like GFortran or LFortran? If it can be at least framed as a somewhat compiler-agnostic project, I think there will be a much greater buy-in from the community, and thus a higher-quality test suite.
We have a regression test suite for fpt with about 8000 files. This contains many non-standard constructs, e.g. the MPX, VMS, HP-UX and HP3000 extensions. This is perhaps important because ifort, CVF and gfortran (with switches) support many of them, and we would like to know where they won’t work. Parts of this could be made available (parts are proprietary). Please let me know if this would help.
Hi @Jcollins thanks for the post and welcome to the forum. Anything that you can make open source under some permissive license (MIT or BSD or Apache) would be a huge help.
I would like to create a standalone, compiler independent project probably at fortran-lang for such tests that the whole community can develop and maintain.
Agustin here. If you have any questions on the functionality of the framework, please do let me know, and I will be happy to answer them as soon as possible. Thank you.
Hi @gforero, thanks for the message and welcome to the forum. I might be interested in using your framework for LFortran pretty soon after we release MVP.
What a difference a year makes! I believe we could publish the tests separately, but the past year has taught me that writing a comprehensive standards-conformance test suite is a humungous endeavor.
We have focused strictly on semantics tests for parallel features of Fortran 2018. After devoting roughly .25 person-years of work, we’ve finished collective subroutines, synchronization, cobounds inquiry functions, and image enumeration (num_images()). Our tests total over 1400 lines of code, not counting blanks but counting special comments that have meaning in flang’s unit testing framework. By the time we finish just Fortran’s parallel features such as coarrays, event_type, atomic subroutines, failed images, critical blocks, locks, and possibly a bit more, we’ll likely blow past .5 person-years, and these are just tests that the compiler accepts a comprehensive range of standard-conforming syntax and rejects non-conforming syntax where the standard requires. We’re not even doing any runtime testing yet because LLVM flang currently parses Fortran 2018 syntax, but doesn’t yet produce executable files from Fortran 2018.
Here are three examples from my recent work with Kate Rasmussen, who started with Berkeley Lab earlier this month and is funded roughly half-time to work on tests for
Also, for a recent effort at developing reasonably comprehensive runtime tests for one small Fortran 2003 feature, type finalization, see the compiler_tests module in the reference-counter repository. @wyphan, @everythingfunctional, and I wrote those tests. I would estimate that was one person-week of work and it’s only testing runtime behavior for one feature that’s a small part of the standard. It’s not even testing compile-time behavior such as parsing and semantics-checking.
I’m not at all surprised that @themos commented earlier in this thread that the NAG test suite clocks in at 800k LOC and has been under development since the '90s. As a very rough estimate, I’d imagine that developing a comprehensive compiler standards-conformance test suite is O(10) person-years for someone who knows the standard well.
@kargl thanks for taking a look and for the great suggestion. I’ll discuss your idea with Kate.
My “chkfeatures” collection of programs in flibs.sf.net (flibs - a collection of Fortran modules download | SourceForge.net for the code) merely checks whether certain syntactical features from the various standards are supported, and it sometimes takes a lot of ingenuity to write a program that clearly demonstrates the feature. I try to check only one particular feature per program …