NAG has been developing a Test Suite for Fortran compilers since the 1990s. We have roughly 20,000 Fortran source files with just under 800k LOC. It is orchestrated with shell scripts.
There are some issues one needs to think carefully about at that scale.
- A clear semantic scheme for managing command-line options. A Rosetta Stone document mapping equivalent options across the active compilers would be a great help. Better still, perhaps, would be a new compiler wrapper with its own compiler-neutral option names, each mapping the intent of the option to the specific flags of each compiler under test.
- The ability to run as much of the test suite as possible in parallel. Some tests may deliberately exhaust the resources of the machine, while the vast majority are extremely light on resources. Separating the two groups makes for a much smoother experience.
- Under certain options, some compilers will fail a huge number of tests simply because they don’t implement one or two features. You need a way of filtering out such similar failures (or tests) so you can concentrate on the unexpected ones. The reporting must be amenable to further tooling, for example by being machine-readable.
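
To make the wrapper idea in the first point concrete, here is a minimal Python sketch of an intent-to-flag table. It is an illustration only, not how the NAG suite works. The intent names (`std-f2008`, `runtime-checks`) and the function `build_command` are made up for this example, and the flags listed are ones I believe are correct for recent releases of each compiler; check them against each compiler's manual before relying on them.

```python
# Hypothetical intent names mapped to per-compiler flags.
# Assumption: the flags match recent compiler releases; verify in each manual.
OPTION_TABLE = {
    "std-f2008": {
        "gfortran": ["-std=f2008"],
        "ifx": ["-stand", "f08"],
        "nagfor": ["-f2008"],
    },
    "runtime-checks": {
        "gfortran": ["-fcheck=all"],
        "ifx": ["-check", "all"],
        "nagfor": ["-C=all"],
    },
}


def build_command(compiler, intents, sources):
    """Translate compiler-neutral intents into a concrete command line."""
    flags = []
    for intent in intents:
        try:
            flags.extend(OPTION_TABLE[intent][compiler])
        except KeyError:
            # Fail loudly rather than silently dropping an option.
            raise ValueError(
                f"no mapping for intent {intent!r} with compiler {compiler!r}"
            ) from None
    return [compiler, *flags, *sources]
```

Calling `build_command("nagfor", ["runtime-checks"], ["t.f90"])` yields a list suitable for `subprocess.run`. Raising on a missing mapping is a deliberate choice here: a test that silently ran without the intended option would be reported as a pass or a fail for the wrong reason.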
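
For the second point, one simple way to separate the two kinds of test is to keep an explicit list of the resource-hungry ones, run everything else through a worker pool, and then run the heavy ones one at a time. The sketch below assumes exactly that; `run_suite`, `run_one` and `heavy` are hypothetical names, and `run_one` stands in for whatever compiles and runs a single test (typically a wrapper around `subprocess.run`, which is why threads are sufficient).

```python
from concurrent.futures import ThreadPoolExecutor


def run_suite(tests, run_one, heavy, workers=8):
    """Run light tests concurrently, then heavy tests one at a time.

    tests   : iterable of test names
    run_one : callable(name) -> result for a single test
    heavy   : set of names known to exhaust memory, disk or CPU
    """
    tests = list(tests)
    light = [t for t in tests if t not in heavy]
    serial = [t for t in tests if t in heavy]

    results = {}
    # Light tests share the pool; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, result in zip(light, pool.map(run_one, light)):
            results[name] = result

    # Heavy tests start only after the pool has drained, so each one
    # has the machine to itself.
    for name in serial:
        results[name] = run_one(name)
    return results
```

Maintaining the `heavy` set by hand is crude but predictable. Tagging tests in a manifest, or measuring resource use on a previous run, would be natural refinements.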
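
For the third point, a rough sketch of the filtering step: match each failing test's captured output against a table of known "feature not implemented" signatures, group the matches under a label, and leave the rest as unexpected. The label, the regular expression and the function name `triage` are all invented for illustration and are not taken from any real compiler's messages; a real table would be built from the diagnostics of the compilers under test.

```python
import re
from collections import defaultdict

# Hypothetical signature table: label -> pattern seen in the failure output.
KNOWN = {
    "coarrays-unsupported": re.compile(
        r"coarray.*not (yet )?(supported|implemented)", re.IGNORECASE
    ),
}


def triage(failures, known=KNOWN):
    """Split failures into known groups and unexpected ones.

    failures : dict mapping test name -> captured compiler/runtime output
    """
    grouped = defaultdict(list)
    unexpected = []
    for name, log in sorted(failures.items()):
        for label, pattern in known.items():
            if pattern.search(log):
                grouped[label].append(name)
                break
        else:
            # No known signature matched: this one needs a human look.
            unexpected.append(name)
    return {"known": dict(grouped), "unexpected": unexpected}
```

The result is a plain dictionary of lists and strings, so `json.dumps(triage(failures), indent=2)` gives a report that other tools can consume directly.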