I generally find that different codes require different techniques. If you have confirmed measured data or known reference values, a “regression” test comparing results to those values is invaluable, but it may only tell you that your values are wrong (or at a minimum have changed). Incorporating other tests such as unit tests can be much more valuable for locating the cause, especially if you don’t like debuggers. For regression tests that compare values in a file, it is nice to have floating-point values compared using some kind of tolerance measure. The simple numeric difference program numdiff is an example of a tolerant numeric difference test; there are several listed on the Fortran Wiki.
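As a rough illustration of what a tolerant comparison can look like inside a test, here is a minimal sketch. The module name `tolerant_compare`, the function `approx_equal`, and the default tolerances are all hypothetical choices made for this example; this is not how numdiff or any GPF module does it.

```fortran
module tolerant_compare
   ! Hypothetical helper for illustration only; not taken from numdiff or GPF.
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   private
   public :: dp, approx_equal
contains
   ! True when a and b agree within an absolute OR a relative tolerance.
   ! The defaults (rtol=1e-9, atol=0) are assumptions; pick what suits your data.
   elemental logical function approx_equal(a, b, rtol, atol) result(same)
      real(dp), intent(in) :: a, b
      real(dp), intent(in), optional :: rtol, atol
      real(dp) :: rel_tol, abs_tol
      rel_tol = 1.0e-9_dp
      abs_tol = 0.0_dp
      if (present(rtol)) rel_tol = rtol
      if (present(atol)) abs_tol = atol
      ! The relative part scales with the larger magnitude of the two values.
      same = abs(a - b) <= max(abs_tol, rel_tol*max(abs(a), abs(b)))
   end function approx_equal
end module tolerant_compare
```

In a regression test you would read the reference value and the new value and fail when `approx_equal(new, reference)` is false. The absolute tolerance matters when the reference is at or near zero, where a purely relative test rejects almost everything.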
Other testing-related packages are becoming available as fpm packages that you can easily use as a dependency. Lately I have been finding that if I set up my build as an fpm package I can very easily use routines for tolerant floating-point comparisons, unit testing, logging and reporting, and other current and emerging tools.
If you have fpm set up, that same repository (the GPF one) runs about 1,000 unit tests when you do an “fpm test”, as it includes a unit test framework and some modules that do statistics, floating-point compares, and assertion tests that might be useful examples.
I have been meaning to put my unit testing package out there, but there are already several others available so it has not been a big push. One of the reasons I have stuck with my own is that it allows an external process to be called. In my environment that process builds an sqlite3 file which is used to create automated reports, but if I put the same tests out on github the same calls just write a simple ASCII text report, which is what running “fpm test” on many of the GPF-related packages such as “M_strings” provides.
You can also look at the tests in other projects, such as the reference BLAS/LAPACK packages, which do a very nice job. If anyone knows of some existing publicly-available packages with good tests I think it would be useful to list them here. Some testing schemes as set up with Jenkins or github are interesting, as are some language and compiler test suites.
Confidence testing is invaluable in allowing you to make quick changes to your codes. Timing tests and/or profiling tools (GNU users can see gprof(1), for example) are invaluable for identifying bottlenecks, so if performance is an issue think about including performance tests as well. Even some (conditionally compiled) CPU usage and wallclock values can be useful, and they perform a valuable service when combined with unit tests, as in many cases this lets you catch changes that impact performance as soon as you introduce them.
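For the CPU and wallclock values, the standard intrinsics `cpu_time` and `system_clock` are usually enough to get started. The sketch below wraps them in a small timer; the module name `simple_timer` and the type `timer_t` are hypothetical names for this example, not part of any package mentioned here.

```fortran
module simple_timer
   ! Hypothetical timer for illustration; uses only standard intrinsics.
   use, intrinsic :: iso_fortran_env, only: dp => real64, int64
   implicit none
   private
   public :: dp, timer_t
   type :: timer_t
      integer(int64) :: count_start = 0_int64
      real(dp) :: cpu_start = 0.0_dp
   contains
      procedure :: start
      procedure :: wall_seconds
      procedure :: cpu_seconds
   end type timer_t
contains
   ! Record the current wallclock count and CPU time.
   subroutine start(self)
      class(timer_t), intent(inout) :: self
      call system_clock(self%count_start)
      call cpu_time(self%cpu_start)
   end subroutine start
   ! Wallclock seconds since start(); ignores clock wrap-around.
   function wall_seconds(self) result(t)
      class(timer_t), intent(in) :: self
      real(dp) :: t
      integer(int64) :: now, rate
      call system_clock(now, rate)
      t = 0.0_dp
      if (rate > 0) t = real(now - self%count_start, dp)/real(rate, dp)
   end function wall_seconds
   ! CPU seconds used since start().
   function cpu_seconds(self) result(t)
      class(timer_t), intent(in) :: self
      real(dp) :: t
      call cpu_time(t)
      t = t - self%cpu_start
   end function cpu_seconds
end module simple_timer
```

A unit test can call `start` before the routine under test and compare `wall_seconds()` or `cpu_seconds()` against a generous budget afterwards. Timings vary from run to run and machine to machine, so a loose threshold that only flags large regressions is probably the safer choice.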
So whether you grow your own or use a package, a combination of unit, regression, and timing tests can have big payoffs, the biggest being when you are working with code that you want to develop rapidly.
The easiest cases can be numeric libraries where you can do a regression test against known properties, like mathematical functions or steam table properties. I was involved in several libraries generating material properties, and in one case they were using a printed eight-inch-thick reference manual and spot-checking a few thousand values by EYE (which is what led to the numdiff(1) program I mentioned earlier, the first time they asked me to be the one to do the checks!).
Statistics and graphics are some of the more overlooked tools in unit testing in my experience, especially when random numbers and field measurements are involved (although some people have bit-repeatable pseudo-random number generators in their codes just so they can do solid regression testing, which can be a very good idea). Even without using other more accurate but sometimes costly methods, the human mind is amazingly good at picking up patterns in the data from a good old plot.
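To make the bit-repeatable generator idea concrete, here is a minimal sketch using the classic Park-Miller “minimal standard” multiplicative generator (multiplier 16807, modulus 2**31 - 1). I picked it only because it is tiny and uses pure integer arithmetic, so it gives the same sequence with any compiler; the module name `repeatable_rng` and type `rng_t` are hypothetical, and it is not a high-quality generator for production statistics.

```fortran
module repeatable_rng
   ! Hypothetical module for illustration: Park-Miller "minimal standard" LCG.
   use, intrinsic :: iso_fortran_env, only: dp => real64, int64
   implicit none
   private
   public :: dp, int64, rng_t
   integer(int64), parameter :: multiplier = 16807_int64
   integer(int64), parameter :: modulus = 2147483647_int64   ! 2**31 - 1
   type :: rng_t
      integer(int64) :: state = 1_int64
   contains
      procedure :: seed
      procedure :: next_int
      procedure :: next_real
   end type rng_t
contains
   ! Set the state, mapping any integer into the valid range 1 .. modulus-1.
   subroutine seed(self, seed_value)
      class(rng_t), intent(inout) :: self
      integer(int64), intent(in) :: seed_value
      self%state = modulo(seed_value - 1_int64, modulus - 1_int64) + 1_int64
   end subroutine seed
   ! Advance the state; 64-bit arithmetic keeps the product from overflowing.
   function next_int(self) result(k)
      class(rng_t), intent(inout) :: self
      integer(int64) :: k
      self%state = mod(multiplier*self%state, modulus)
      k = self%state
   end function next_int
   ! Uniform value strictly between 0 and 1.
   function next_real(self) result(x)
      class(rng_t), intent(inout) :: self
      real(dp) :: x
      x = real(self%next_int(), dp)/real(modulus, dp)
   end function next_real
end module repeatable_rng
```

Seeding with a fixed value at the start of a test means the “random” inputs are identical on every run, so results can be compared against stored reference values.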
I mention the GPF resources as actual examples you can pull using fpm in a few minutes, but make sure to look at the Fortran Wiki for a list of tools and ideas available. Maybe some of the upcoming talks will make it onto fortran-lang or the Wiki, but I guess we will both be tracking the FortranCon presentation.
@Beliavsky has some nice related lists in the Wiki and his github repository that you do not want to overlook.
I forgot to mention one of my favorites: prepare an input file for a program that reads one, and use a little program to randomize some of the input. Making sure your code responds well to that, producing good diagnostics for bad or questionable input, can be nearly as important as making sure it produces the right answers when given “correct” input.
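The randomizing step can be very small. The sketch below is one possible way to do it, not the program I actually use: a hypothetical function `mutate_line` (in a hypothetical module `fuzz_input`) that replaces one randomly chosen character of a line with a random printable ASCII character.

```fortran
module fuzz_input
   ! Hypothetical helper for illustration: damage one character of a line.
   implicit none
   private
   public :: mutate_line
contains
   ! Return a copy of line with one randomly chosen character replaced
   ! by a random printable ASCII character (codes 32 to 126).
   function mutate_line(line) result(fuzzed)
      character(len=*), intent(in) :: line
      character(len=len(line)) :: fuzzed
      real :: r(2)
      integer :: pos, code
      fuzzed = line
      if (len(line) == 0) return
      call random_number(r)
      ! min() guards against rounding pushing the index past the end.
      pos = min(len(line), 1 + int(r(1)*len(line)))
      code = min(126, 32 + int(r(2)*95.0))
      fuzzed(pos:pos) = achar(code)
   end function mutate_line
end module fuzz_input
```

A driver would copy a known-good input file line by line, pass a few of the lines through `mutate_line`, run the program on the result, and check that it stops with a sensible diagnostic instead of crashing or quietly producing numbers. Note that `random_number` is not repeatable across compilers, so log the mutated file whenever a run fails.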