Suggestion: FINDLOC tolerance

adenchfi · February 4, 2023, 4:03am

A simple suggestion. Include an optional tolerance parameter for FINDLOC, such that you can apply it to real arrays without worry of floating point behavior messing things up.

Expected behavior when multiple values fall within the specified tolerance can mimic the current behavior, and return the first such value found. The default tolerance can be either a set value, or depend on the kind of array being considered.

Beliavsky · February 4, 2023, 4:28am

You can get this behavior by applying FINDLOC to a logical array, as in the code

program main
implicit none
integer, parameter :: n = 5, niter = 5
integer :: iter
real :: x(n)
call random_seed()
do iter=1,niter
   call random_number(x)
   print "(i6,*(f6.3))",findloc(abs(x-0.5) < 0.1, .true.),x
end do
end program main

which gives sample output with gfortran of

     1 0.558 0.438 0.960 0.352 0.062
     0 0.988 0.075 0.225 0.815 0.110
     1 0.598 0.893 0.539 0.838 0.565
     3 0.220 0.134 0.447 0.212 0.813
     0 0.795 0.838 0.684 0.362 0.871

although I wonder if a compiler could optimize your proposed syntax better.

adenchfi · February 4, 2023, 4:37am

Indeed, the workaround is fine, but seems unnecessary. I make the suggestion since this behavior is needed frequently in my work, and I think (precision) tolerance parameters would be a wonderful intrinsic behavior for a scientific computing language such as Fortran; not necessarily just in FINDLOC, either. Of course, one can roll their own easily, but it serves to obfuscate code sometimes, and seems like a universal-enough idea.

FedericoPerini · February 4, 2023, 2:03pm

Finding the location of an item in an array and finding the item that’s closest in distance to a given real number look like pretty different operations to me.

That’s #1 rule working with real numbers: “always be wary equalling real numbers”

rwmsu · February 4, 2023, 3:14pm

I’ve always felt that Fortran needs an intrinsic “almost” equal function along the lines of the following.

  Elemental Function almostEqualR64(a, b, tol) Result(aeqb)
  
! Function to test if two REAL64 floating point numbers 
! almost equal to close to machine precision 

! original code taken from post on Intel Fortran Forum 11/15/2016

    Implicit NONE

! Argument variables

    Real(REAL64),           Intent(IN) :: a, b
    Real(REAL64), Optional, Intent(IN) :: tol
    Logical                            :: aeqb

    If (PRESENT(tol)) Then
      aeqb = ABS(a-b) < tol
    Else
      aeqb = ABS(a-b) < SPACING_FACTOR*SPACING(MAX(ABS(a), ABS(b)))
    End If

  End Function almostEqualR64

I normally set SPACING_FACTOR to 5._REAL64. I also have one for REAL32 and create two generic interfaces/operators (in a module)

 Interface OPERATOR(.AEQ.)
   Module Procedure almostEqualR32
   Module Procedure almostEqualR64
  End Interface

  Interface almostEqual
   Module Procedure almostEqualR32
   Module Procedure almostEqualR64
 End Interface

For the case where using the SPACING based tolerance is what you want, you can do a

if (a .AEQ. b) then ! do something

In a perfect world, I would like to be able to do something like
if (a ~= b) with a predefined tolerance set by the system

Edit. OOPS. Sorry but I got my code mixed up. almostEqual with the optional tol parameter will not work for the .AEQ. operator.

RonShepard · February 4, 2023, 5:41pm

If you write out the loop by hand, you will see that they are almost identical operations. You loop over the elements in array order, comparing the current value to a saved value. In one case, that saved value is an input, in the other case it is updated as the loop progresses. In both cases you can exit early for an exact match. Otherwise, the entire array must be searched.

One problem I have with the expression is that the code is written “as if” a temporary array of the absolute values is allocated and created, and then findloc() or minloc() operates on that temporary array. If a compiler did it that way, it would be very inefficient. Instead, one expects the compiler to optimize the whole expression together, eliminate the intermediate array allocation, and just loop over the array elements. So the problem is that the expression that the programmer writes is much different from the machine code that he expects to be generated. For this kind of problem, I sometimes write out the do loop explicitly, even though it is much more verbose that way than with the combination of intrinsic functions. That is a programming choice that I would prefer to avoid. I would prefer some kind of syntax that is concise and clear, yet is closer to the desired machine code than the current combination of intrinsic functions.

RonShepard · February 4, 2023, 5:49pm

This expression fails when both a and b are zero, right? That case probably needs to be tested separately, or the test changed to <=.

rwmsu · February 4, 2023, 6:33pm

@RonShepard, I think you might be right. For me comparing 0.0 = 0.0 would be a corner case. Still adding <= is probably a safer way to go. On my Linux systems with ifort SPACING(0.0_REAL64) is 2.225073858507201E-308. I’m not sure what the result of ABS(0.0-0.0) is (and if comparing literal constants versus variables makes a difference). I probably need to try to go back to the oriiginal Intel Forum post I got this from and see if they had a <=. My original implementation used EPSILON which is fixed by the precision. If I understand how SPACING works (and I’ll be the first to admit I probably don’t) it gives you the smallest number you can add to a given number without changing its original value (is that true?). Here are some values

  double precision spacing(0.0)    =   2.225073858507201E-308
  double precision spacing(1.0)    =   2.220446049250313E-016
  double precision spacing(10.0)   =   1.776356839400250E-015
  double precision spacing(100.0)  =   1.421085471520200E-014
  double precision spacing(1000.0) =   1.136868377216160E-013
  single precision spacing(0.0)    =   1.1754944E-38
  single precision spacing(1.0)    =   1.1920929E-07
  single precision spacing(10.0)   =   9.5367432E-07
  single precision spacing(100.0)  =   7.6293945E-06
  single precision spacing(1000.0) =   6.1035156E-05

rwmsu · February 4, 2023, 6:51pm

@RonShepard, A quick test shows that my original code returns a value of true when both a and b are 0 for both single and double precision (at least with ifort)

RonShepard · February 4, 2023, 8:03pm

Ah, yes, you are right. I was thinking wrongly about the result of spacing(0.0).

I have always used epsilon(), or its explicit value, for these kind of tests, and now I’m using spacing() more often. There are a few gotchas. The epsilon expression can sometimes be almost a factor of 2 larger than the spacing value. I’ve found a few cases where that matters, and I had to modify the test expression accordingly

As to what spacing() returns, try this code:

program xxx
   use, intrinsic :: iso_fortran_env, only: wp=>real64
   implicit none
   real(wp) :: a
   a = 1.0_wp
   write(*,*) spacing(nearest(a,-1.0_wp))
   write(*,*) spacing(a)
   write(*,*) spacing(nearest(a,+1.0_wp))
   a = 1.5_wp
   write(*,*) spacing(nearest(a,-1.0_wp))
   write(*,*) spacing(a)
   write(*,*) spacing(nearest(a,+1.0_wp))
end program xxx

On my computer, it gives:

$ gfortran xxx.F90 && a.out
   1.1102230246251565E-016
   2.2204460492503131E-016
   2.2204460492503131E-016
   2.2204460492503131E-016
   2.2204460492503131E-016
   2.2204460492503131E-016

Note that the first three numbers are right at the boundary were the fp exponent changes, while the last three numbers are in the middle of the range where the exponent is all the same. That is why there is exactly a factor of 2 difference in the first two printed values. If I had used multiplication with epsilon() instead of spacing(), the output would have been different. Basically spacing() always returns a value with all zeros in the mantissa except for the leading hidden bit, while multiplication by epsilon() results in a value that has the same mantissa bits as the multiplicand, but with a shifted exponent.

FortranFan · February 4, 2023, 8:12pm

@rwmsu,

You may want to note a test making use of EPSILON or some variant thereof is indeed what can be seen as “good enough” equal or your “almost equal” (a ~= b).

What you are doing with SPACING when the optional argument tol is absent is more like your own interpretation of what is truly equal when it comes to floating-point values.

In that sense, your function is inconsistent: when the optional tol is supplied, the function behaves like “almost equal”, otherwise it is as exactly equal as numerically possible given the a, b values. The latter is more like what a compiler implementation might do with the == on floating-point types.

Have you tested your function relative to what your processor does with standard == and come up with any cases when your interpretation using SPACING_FACTOR*SPACING(..) has served you better?

rwmsu · February 4, 2023, 9:17pm

@FortranFan,

My interpretation of SPACING is as follows. For a given floating point number,

A, SPACING(A) defines the “radius” of a circle centered at A. Anything inside that radius is taken to be A. This defines a “machine zero” (being any number added to a number that doesn’t change the original number) that is relative to the magnitude of the number itself and not an absolute smallest number (EPSILON) you can have for a given precision. At the end of the day, which one you use (SPACING or EPSILON) depends on just how accurate you need the resulting numbers to be. There are probably cases where one would be preferred over the other. In reality, most use cases where you need a tolerance you are better off selecting the largest one you can get by with that gives you the desired accuracy. An example would be doing a n
Newton interation where you only need something accurate to 1E-6. Setting a stopping criterion of 1.E-16 would just lead to lot of useless iterations in pursuit of an accuracy that is meaningless for real world engineering calculations.

Edit

Also regarding ==, it must be a generational thing but I was taught (in my first Fortran class 50+ years ago) you NEVER do A.EQ.B for any floating point numbers. Maybe as you suggest, modern processors can handle this, but old habits die hard

FortranFan · February 4, 2023, 9:43pm

You can then consider sqrt(EPSILON(..)) which can be a named constant in a Fortran program/module.

Using SPACING places you far closer to exactly equal than “almost equal” and appears akin to what a modern processor might do with == since the standard is silent on what “equal to” means with different types and especially floating point ones.

Nonetheless, I don’t suggest you switch to == with floating-point types, it’s indeed preferable to employ something explicit but which has a better suitable default when an optional tol is absent.

Note the analogy is better as a number line rather than a circle, you’re dealing in 1-D with A.

RonShepard · February 5, 2023, 3:08am

This is a useful way to look at floating point values in many situations. In other situations, it is best to think of them as a set of rational numbers that can be represented in the computer. Specifically, for a fixed exponent, there are 2**N equally spaced values where N is the number of mantissa bits. For any one of those numbers spacing(x) will return the numerical difference between two of those values. There are no representable numbers in between those values.

If you look at the little program I posted previously, you will see what happens when spacing() is applied to numbers at the boundary were the exponent changes. On one side of the boundary (the one toward zero), you get a small spacing, while on the other side of the boundary you get a spacing that is twice as large. When the number is right at the boundary, the distance to the smaller number is half the distance to the next larger number. The spacing() function returns the larger value in that case. There are 2**52 numbers with the same exponent, and the spacing() function returns the same value for each of them. In contrast, epsilon(x)*x is a different value for each value of x; the mantissa bits are the same as x, but the exponent is shifted.

PierU · February 7, 2023, 2:46pm

To me, a default tolerance would be very difficult to decide, because there are too many different usage cases. The default tolerance should just be 0.0 in my opinion (and anyway, any non-zero default tolerance would break existing code).

Or it could be the closest value among the ones that fall within the specified tolerance. Both approaches make sense depending on the cases.

adenchfi · February 7, 2023, 7:56pm

Yes, I agree with both of your points.

Topic		Replies	Views
Finding out default kind values Help	5	794	April 24, 2022
New GitHub repo Open Catalog of Best Practices for Modernization and Performance Announcements	5	461	April 1, 2024
Problems with FINDLOC command Help	5	1054	January 24, 2022
FINDLOC alternative Help	10	1780	October 1, 2021
Implied-do array constructor, type specs, and differences between GFortran, Intel, and LFortran Help	49	1296	May 26, 2024

Suggestion: FINDLOC tolerance

Related topics