Are there “smart” file comparison tools that will say whether two corresponding lines of output files are within a tolerance that the user specifies? For example, I was testing two versions of a program, with output written to separate files. Running fc on them gave
***** temp_fta.txt
At the return from NEWUOA Number of function values = 56
Least value of F = 6.304346362673683D+03 The corresponding X is:
1.106950D-01 9.162819D-01
***** TEMP_FTU.TXT
At the return from NEWUOA Number of function values = 56
Least value of F = 6.304346362673669D+03 The corresponding X is:
1.106950D-01 9.162819D-01
The values of F in the two files are close enough. I guess you could write a program in Fortran or another language that extracts the floating point values from the lines of a text file (they could appear anywhere in the line) and checks that they differ by less than a relative or absolute tolerance in order to compare files.
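A minimal Fortran sketch of that idea follows. It rests on several assumptions of my own: the numbers are blank-delimited tokens (not embedded in words), both lines have the same token layout, and two numbers "agree" when |x - y| <= max(atol, rtol*max(|x|,|y|)). The module name tolerant_compare and the routines parse_real, next_token and lines_match are hypothetical names chosen for illustration, not part of any existing library.

```fortran
module tolerant_compare
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   private
   public :: lines_match

contains

   ! Try to read a blank-free token as a real number (E or D exponents allowed).
   subroutine parse_real(token, x, ok)
      character(len=*), intent(in) :: token
      real(dp), intent(out) :: x
      logical, intent(out) :: ok
      character(len=16) :: fmt
      integer :: ios
      x = 0.0_dp
      ok = .false.
      if (scan(token, '0123456789') == 0) return   ! no digit: treat as text
      write (fmt, '(a,i0,a)') '(f', len(token), '.0)'
      read (token, fmt, iostat=ios) x
      ok = (ios == 0)
   end subroutine parse_real

   ! Return the next blank-delimited token at or after pos (empty if none).
   subroutine next_token(line, pos, token)
      character(len=*), intent(in) :: line
      integer, intent(inout) :: pos
      character(len=:), allocatable, intent(out) :: token
      integer :: n, first, last
      n = len_trim(line)
      first = verify(line(pos:n), ' ')
      if (first == 0) then
         token = ''
         pos = n + 1
         return
      end if
      first = pos + first - 1
      last = scan(line(first:n), ' ')
      if (last == 0) then
         last = n
      else
         last = first + last - 2
      end if
      token = line(first:last)
      pos = last + 1
   end subroutine next_token

   ! True if the lines have the same tokens, with numeric tokens compared
   ! within the given relative and absolute tolerances (assumed rule).
   logical function lines_match(a, b, rtol, atol)
      character(len=*), intent(in) :: a, b
      real(dp), intent(in) :: rtol, atol
      character(len=:), allocatable :: ta, tb
      real(dp) :: xa, xb
      logical :: na, nb
      integer :: pa, pb
      pa = 1
      pb = 1
      lines_match = .true.
      do
         call next_token(a, pa, ta)
         call next_token(b, pb, tb)
         if (len(ta) == 0 .and. len(tb) == 0) exit      ! both lines exhausted
         if (len(ta) == 0 .or. len(tb) == 0) then       ! different token counts
            lines_match = .false.
            exit
         end if
         call parse_real(ta, xa, na)
         call parse_real(tb, xb, nb)
         if (na .and. nb) then
            if (abs(xa - xb) > max(atol, rtol*max(abs(xa), abs(xb)))) lines_match = .false.
         else if (ta /= tb) then
            lines_match = .false.
         end if
         if (.not. lines_match) exit
      end do
   end function lines_match

end module tolerant_compare

program demo_tolerant_compare
   use, intrinsic :: iso_fortran_env, only: dp => real64
   use tolerant_compare, only: lines_match
   implicit none
   character(len=*), parameter :: a = &
      'Least value of F = 6.304346362673683D+03 The corresponding X is:'
   character(len=*), parameter :: b = &
      'Least value of F = 6.304346362673669D+03 The corresponding X is:'
   print *, 'rtol = 1e-12: ', lines_match(a, b, 1.0e-12_dp, 0.0_dp)
   print *, 'rtol = 1e-15: ', lines_match(a, b, 1.0e-15_dp, 0.0_dp)
end program demo_tolerant_compare
```

A full file comparison would read both files line by line and call lines_match on each pair. Tab-separated columns, or numbers glued to punctuation such as "F=6.3D+03", would need a smarter tokenizer than this one.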
I’ve never used this, but numdiff might be useful. Installed on my Linux PC (Ubuntu) with sudo apt install numdiff, it gives
$ numdiff -e D -r 1.0e-15 temp_fta.txt temp_ftu.txt # or "-e dD" etc
----------------
##2 #:6 <== 6.304346362673683D+03
##2 #:6 ==> 6.304346362673669D+03
@ Absolute error = 1.4000000000e-11, Relative error = 2.2206901707e-15
+++ File "temp_fta.txt" differs from file "temp_ftu.txt"
$ numdiff -e D -r 1.0e-12 temp_fta.txt temp_ftu.txt
+++ Files "temp_fta.txt" and "temp_ftu.txt" are equal
Here -e, -r, -a specify the exponent letter(s), relative tolerance, and absolute tolerance (please see man numdiff or numdiff -h for options).
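As a sanity check on the numbers reported above, the arithmetic can be worked by hand from the two printed values of F (the symbols F_a and F_u are my own labels for the values in temp_fta.txt and temp_ftu.txt):

```latex
\begin{align*}
|F_a - F_u| &= 6304.346362673683 - 6304.346362673669 = 1.4 \times 10^{-11}, \\
\frac{|F_a - F_u|}{|F_u|} &\approx \frac{1.4 \times 10^{-11}}{6.304346 \times 10^{3}} \approx 2.22 \times 10^{-15}.
\end{align*}
```

Since 1.0e-15 < 2.22e-15 < 1.0e-12, the difference is reported with -r 1.0e-15 but accepted with -r 1.0e-12, consistent with the two runs shown.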
In my case, I often use meld to see the difference of numbers visually. It highlights the last few digits that differ in the corresponding numbers in two files compared.
There is a single-file Fortran program called numdiff.f90, unrelated to the numdiff above (or the numdiff library), that assumes the two files are basically identical except for numeric values. It is also available as an fpm application and as part of urbanjost/general-purpose-fortran (General Purpose Fortran Cooperative) on GitHub. The documentation is available via numdiff --help. It is sometimes called “nd” instead of “numdiff”.
This might be off topic, but in case you want to do a raw string comparison you could also compute one of the various string metrics. I have used the Damerau-Levenshtein distance to compare molecule SMILES strings in the past. Since it gives a score, you can set a similarity threshold.
The only one I have in Fortran is that one. From M_strings
edit_distance.f90
pure elemental integer function edit_distance (a,b)
   character(len=*), intent(in) :: a, b
   integer :: len_a, len_b, i, j, cost
   ! matrix for calculating Levenshtein distance
   !integer :: matrix(0:len_trim(a), 0:len_trim(b)) ! not supported by all compilers yet
   integer,allocatable :: matrix(:,:)
   len_a = len_trim(a)
   len_b = len_trim(b)
   !-------------------------------------- ! required by older compilers instead of above declaration
   if(allocated(matrix))deallocate(matrix)
   allocate(matrix(0:len_a,0:len_b))
   !--------------------------------------
   ! first column and row: distance from the empty string
   matrix(:,0) = [(i,i=0,len_a)]
   matrix(0,:) = [(j,j=0,len_b)]
   do i = 1, len_a
      do j = 1, len_b
         ! cost is 0 if the characters match, 1 for a substitution
         cost=merge(0,1,a(i:i)==b(j:j))
         matrix(i,j) = min(matrix(i-1,j)+1, matrix(i,j-1)+1, matrix(i-1,j-1)+cost)
      enddo
   enddo
   edit_distance = matrix(len_a,len_b)
end function edit_distance
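The function above computes the plain Levenshtein distance (insertions, deletions, substitutions). The Damerau-Levenshtein distance mentioned earlier additionally counts a swap of two adjacent characters as one edit. Below is a minimal sketch of the restricted variant usually called "optimal string alignment", which is not identical to the unrestricted Damerau-Levenshtein distance. The module name osa_distance_m, the function names, and the normalization used for the similarity score (1 - distance/longest length) are my own assumptions for illustration and are not taken from M_strings.

```fortran
module osa_distance_m
   implicit none
   private
   public :: osa_distance, similarity

contains

   ! Optimal string alignment distance: Levenshtein edits plus
   ! transposition of two adjacent characters, each costing 1.
   pure integer function osa_distance(a, b) result(dist)
      character(len=*), intent(in) :: a, b
      integer, allocatable :: d(:, :)
      integer :: la, lb, i, j, cost
      la = len_trim(a)
      lb = len_trim(b)
      allocate (d(0:la, 0:lb))
      d(:, 0) = [(i, i=0, la)]
      d(0, :) = [(j, j=0, lb)]
      do j = 1, lb
         do i = 1, la
            cost = merge(0, 1, a(i:i) == b(j:j))
            d(i, j) = min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
            if (i > 1 .and. j > 1) then
               ! adjacent characters swapped between a and b
               if (a(i:i) == b(j - 1:j - 1) .and. a(i - 1:i - 1) == b(j:j)) then
                  d(i, j) = min(d(i, j), d(i - 2, j - 2) + 1)
               end if
            end if
         end do
      end do
      dist = d(la, lb)
   end function osa_distance

   ! Score in [0,1]; 1 means identical after trimming (assumed normalization).
   pure real function similarity(a, b) result(s)
      character(len=*), intent(in) :: a, b
      integer :: longest
      longest = max(len_trim(a), len_trim(b))
      if (longest == 0) then
         s = 1.0
      else
         s = 1.0 - real(osa_distance(a, b))/real(longest)
      end if
   end function similarity

end module osa_distance_m

program demo_osa
   use osa_distance_m, only: osa_distance, similarity
   implicit none
   real, parameter :: threshold = 0.7   ! hypothetical cut-off
   print *, 'distance   = ', osa_distance('abcd', 'abdc')
   print *, 'similarity = ', similarity('abcd', 'abdc')
   print *, 'similar?     ', similarity('abcd', 'abdc') >= threshold
end program demo_osa
```

The threshold is application specific; the value here is only a placeholder.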
My version gets slow very quickly as the length of the strings increases. It worked for what I needed, but I ran it in parallel because I had some longer phrases to look at; at the time it was a one-time need for me. Is there sufficient need to warrant an fpm package for these types of algorithms?
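On the slowdown for long strings: the work grows with the product of the two lengths whichever way the table is stored, but the full matrix also needs that much memory. One common variation keeps only two rows of the table at a time. The sketch below illustrates that variation under my own naming (edit_distance_rows_m and edit_distance_rows are hypothetical); it has not been benchmarked against the M_strings version, so any speed benefit is an assumption to be measured.

```fortran
module edit_distance_rows_m
   implicit none
   private
   public :: edit_distance_rows

contains

   ! Levenshtein distance using two rows instead of the full matrix.
   pure integer function edit_distance_rows(a, b) result(dist)
      character(len=*), intent(in) :: a, b
      integer, allocatable :: prev(:), curr(:)
      integer :: la, lb, i, j, cost
      la = len_trim(a)
      lb = len_trim(b)
      allocate (prev(0:lb), curr(0:lb))
      ! row 0: distance from the empty prefix of a to each prefix of b
      do j = 0, lb
         prev(j) = j
      end do
      do i = 1, la
         curr(0) = i
         do j = 1, lb
            cost = merge(0, 1, a(i:i) == b(j:j))
            curr(j) = min(prev(j) + 1, curr(j - 1) + 1, prev(j - 1) + cost)
         end do
         prev(:) = curr(:)
      end do
      dist = prev(lb)
   end function edit_distance_rows

end module edit_distance_rows_m

program demo_rows
   use edit_distance_rows_m, only: edit_distance_rows
   implicit none
   print *, edit_distance_rows('kitten', 'sitting')
end program demo_rows
```

Storage drops from (len_a+1)*(len_b+1) integers to 2*(len_b+1); the number of cell updates is unchanged.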
When not coding in Fortran, I usually develop in C#, where I have used the String.Similarity library with good results. @urbanjost, in case you need something faster, the C# library Fastenshtein claims to be the fastest Levenshtein distance computation. If you want to start such a project, I would gladly contribute.
I have used various forms of Beebe’s ndiff for many years. The file ndiff-2.00.zip has both C and awk code. I still have a version of ndiff.awk, dating from c. 2000, in production in a couple of test suites. It isn’t broken so …
I am intrigued by the algorithms, partly because they are far afield from what I have typically needed, but I really only required something like that once. If there were strong demand it would be interesting, particularly to see if we could create the fastest versions, but I would have to pass on generating a collection of edit distance procedures at the moment.
Comparing floats comes up a lot, though. There is the GNU-licensed module by Paul van Delst in GPF, and some of the routines in numdiff/nd mentioned above. I am sure others have similar functions. It would really be nice to see the fpm repository come online with some packages like that consolidated into repository packages.