Comparing files with floating point values with a tolerance

Beliavsky · August 25, 2024, 9:49pm

Are there “smart” file comparison tools that will say that two corresponding lines of output files are within a tolerance that the user specifies? For example, I was testing two versions of a program, with output written to separate files. Running fc on them gave:

***** temp_fta.txt
At the return from NEWUOA Number of function values = 56
Least value of F = 6.304346362673683D+03 The corresponding X is:
1.106950D-01 9.162819D-01

***** TEMP_FTU.TXT
At the return from NEWUOA Number of function values = 56
Least value of F = 6.304346362673669D+03 The corresponding X is:
1.106950D-01 9.162819D-01

The values of F in the two files are close enough. I guess one could write a program in Fortran or another language that extracts the floating point values from each line of a text file (they could appear anywhere in the line) and, in order to compare the files, checks that corresponding values differ by less than a relative or absolute tolerance.

RonShepard · August 25, 2024, 11:55pm

A google search found this tool: https://www.math.utah.edu/~beebe/software/ndiff/

septc · August 26, 2024, 12:00am

I’ve never used this, but numdiff might be useful. Installing on my Linux PC (Ubuntu) with sudo apt install numdiff, it gives

$ numdiff -e D -r 1.0e-15 temp_fta.txt temp_ftu.txt   # or "-e dD" etc
----------------
##2       #:6   <== 6.304346362673683D+03
##2       #:6   ==> 6.304346362673669D+03
@ Absolute error = 1.4000000000e-11, Relative error = 2.2206901707e-15

+++  File "temp_fta.txt" differs from file "temp_ftu.txt"

$ numdiff -e D -r 1.0e-12 temp_fta.txt temp_ftu.txt 

+++  Files "temp_fta.txt" and "temp_ftu.txt" are equal

Here -e, -r, and -a specify the exponent letter(s), the relative tolerance, and the absolute tolerance, respectively (please see man numdiff or numdiff -h for the options).
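
The two runs above can be checked by hand. Here is a small Python sketch, assuming the relative error is the absolute error divided by the smaller of the two magnitudes (that formula is an assumption; for these values the choice of denominator does not change the leading digits). The decimal module is used so that the subtraction is not affected by binary rounding:

```python
from decimal import Decimal

# Fortran "D" exponents rewritten as "E" so that Decimal can parse them.
a = Decimal("6.304346362673683E+03")
b = Decimal("6.304346362673669E+03")

abs_err = abs(a - b)
# Assumed formula: error relative to the smaller magnitude.
rel_err = abs_err / min(abs(a), abs(b))

print(f"absolute error = {abs_err:.10e}")
print(f"relative error = {rel_err:.10e}")

# rel_err lies between 1.0e-15 and 1.0e-12, which is why the first run
# reports a difference and the second reports the files as equal.
print(rel_err <= Decimal("1.0e-15"), rel_err <= Decimal("1.0e-12"))
```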

In my case, I often use meld to see the differences between numbers visually. It highlights the last few digits that differ between corresponding numbers in the two files being compared.

https://meldmerge.org/

certik · August 26, 2024, 3:36am

Yes, I had to handle this a few times and I usually wrote a Python script to do the comparison.
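
For readers who want a starting point, here is a minimal sketch of what such a script could look like. It is not the script referred to above; the function names (split_line, lines_match, compare_files) and the default tolerances are hypothetical. It assumes the two files have the same layout, replaces every number with a placeholder to compare the surrounding text, and compares the numbers with math.isclose, which measures the relative tolerance against the larger of the two magnitudes:

```python
import math
import re

# Matches reals with optional Fortran/C exponents: 6.3D+03, 1.1e-1, -2.5, 42
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eEdD][-+]?\d+)?")


def split_line(line):
    """Return (text with numbers replaced by '#', list of floats)."""
    values = [float(re.sub("[dD]", "e", tok)) for tok in NUMBER.findall(line)]
    skeleton = " ".join(NUMBER.sub("#", line).split())
    return skeleton, values


def lines_match(line_a, line_b, rel_tol=1e-12, abs_tol=0.0):
    """True if the text agrees and all numbers agree within tolerance."""
    text_a, vals_a = split_line(line_a)
    text_b, vals_b = split_line(line_b)
    if text_a != text_b or len(vals_a) != len(vals_b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(vals_a, vals_b))


def compare_files(path_a, path_b, rel_tol=1e-12, abs_tol=0.0):
    """Return the 1-based numbers of lines that differ beyond tolerance."""
    with open(path_a) as fa, open(path_b) as fb:
        lines_a, lines_b = fa.read().splitlines(), fb.read().splitlines()
    bad = [n for n, (la, lb) in enumerate(zip(lines_a, lines_b), start=1)
           if not lines_match(la, lb, rel_tol, abs_tol)]
    # Extra trailing lines in the longer file count as differences.
    shorter, longer = sorted((len(lines_a), len(lines_b)))
    bad.extend(range(shorter + 1, longer + 1))
    return bad
```

Calling compare_files("temp_fta.txt", "temp_ftu.txt", rel_tol=1e-12) would return an empty list when every line agrees within the tolerance. Note that any digit sequence is treated as a number, so dates or version strings in the output are compared numerically as well.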

urbanjost · August 26, 2024, 6:27am

There is a single-file Fortran program called numdiff.f90, unrelated to the numdiff above (or the numdiff library), that assumes two files are basically identical except for numeric values. It is also available as an fpm application and as part of GitHub - urbanjost/general-purpose-fortran: General Purpose Fortran Cooperative. The documentation is available via numdiff --help. It is sometimes called “nd” instead of “numdiff”.

The man-page is also on-line as nd

davidpfister · August 26, 2024, 10:41am

That might be off topic, but in case you want to do a raw string comparison, you could also compute one of the various string metrics. I have used the Damerau-Levenshtein distance to compare molecule SMILES strings in the past. Since it gives a score, you can set a similarity threshold.
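
As an illustration of the thresholding idea, here is a minimal Python sketch. It uses the optimal string alignment variant (a restricted form of Damerau-Levenshtein in which no substring is edited more than once), and the normalization by the longer string's length and the 0.8 default threshold are assumptions chosen for the example, not values from this thread:

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # transposition of two adjacent characters
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


def is_similar(a, b, threshold=0.8):
    """Hypothetical helper: True if the score reaches the threshold."""
    return similarity(a, b) >= threshold
```

With this scoring, two strings of length 10 that differ by one swapped pair of adjacent characters score 0.9 and pass the default threshold.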

urbanjost · August 26, 2024, 12:13pm

The only one I have in Fortran is this one, from M_strings:

edit_distance.f90

pure elemental integer function edit_distance (a,b)
character(len=*), intent(in) :: a, b
integer                      :: len_a, len_b, i, j, cost
! matrix for calculating Levenshtein distance
!integer                      :: matrix(0:len_trim(a), 0:len_trim(b)) ! not supported by all compilers yet
integer,allocatable          :: matrix(:,:)
   len_a = len_trim(a)
   len_b = len_trim(b)
   !-------------------------------------- ! required by older compilers instead of above declaration
   if(allocated(matrix))deallocate(matrix)
   allocate(matrix(0:len_a,0:len_b))
   !--------------------------------------
   matrix(:,0) = [(i,i=0,len_a)]
   matrix(0,:) = [(j,j=0,len_b)]
   do i = 1, len_a
      do j = 1, len_b
         cost=merge(0,1,a(i:i)==b(j:j))
         matrix(i,j) = min(matrix(i-1,j)+1, matrix(i,j-1)+1, matrix(i-1,j-1)+cost)
      enddo
   enddo
   edit_distance = matrix(len_a,len_b)
end function edit_distance

My version gets slow very quickly as the length of the strings increases. It worked for what I needed, but I ran it in parallel because I had some longer phrases to look at; at the time it was a one-time need for me. Is there sufficient need to warrant an fpm package for these types of algorithms?

davidpfister · August 26, 2024, 1:25pm

When not coding in Fortran, I usually develop in C#, and I have used this library with good results: String.Similarity.
@urbanjost, in case you need something faster, this C# library claims to be the fastest Levenshtein distance implementation: Fastenshtein. If you want to start such a project, I would gladly contribute.

DavidB · August 27, 2024, 12:53am

I have used various forms of Beebe’s ndiff for many years. The file ndiff-2.00.zip has both C and awk code. I still have a version of ndiff.awk, dating from c. 2000, in production in a couple of test suites. It isn’t broken so …

DavidB · August 27, 2024, 1:11am

There is also a Perl script, nrdiff.pl, originally distributed by Numerical Recipes.

urbanjost · August 27, 2024, 5:54am

I am intrigued by the algorithms, partly because they are far afield from what I have typically needed, but I really only required something like that once. If there were strong demand it would be interesting, particularly to see if we could create the fastest versions, but I would have to pass on generating a collection of edit distance procedures at the moment.

Comparing floats comes up a lot though. There is the GNU-licensed module by Paul van Delst in GPF:

general-purpose-fortran/src/M_Compare_Float_Numbers.f90 at master · urbanjost/general-purpose-fortran · GitHub

and some tests in the unit testing modules in

https://github.com/urbanjost/M_frameworknd

and some of the routines in numdiff/nd mentioned above. I am sure others have similar functions. It would really be nice to see the fpm repository come online with some packages like that consolidated into repository packages.
