When I run the below program on a 32 bit computer, the number 1.4901161193847656E-008 is output.
When I run the below program on a 64 bit computer, the number 0.0000000000000000 is output.
Questions: As a professional, how do I get this program to output the 32-bit number on a 64-bit computer? What can I change in this program to make that happen?
How close can you get me?
PROGRAM TEST
IMPLICIT NONE
INTEGER, PARAMETER :: rk = selected_real_kind(15)
REAL*8 ANS,IN1(3),IN2(3)
...
WRITE(*,*)'ACOS(ANS)= ',ACOS(ANS)
END PROGRAM TEST
gfortran -g test.f -lgfortran
./a.out
Output on 64 Bit Computer:
ACOS(ANS)= 0.0000000000000000
Output on 32 Bit Computer:
ACOS(ANS)= 1.4901161193847656E-008
What do you mean by “a 32 bit computer”? Is it a CPU that has only 32-bit floating point registers?
And why do you absolutely want/need to get the same results on both machines? Because of roundoff, the results of floating point operations are not guaranteed to be perfectly identical on different hardware (they are not even guaranteed to be identical between two different compilers on the same hardware, or with the same compiler at different optimization levels).
It’s not that simple. x86 Intel CPUs were 32-bit but had 80-bit floating point registers.
OK, but it’s a tautology. As I wrote, floating point calculations are by nature approximate and can differ depending on the hardware/compiler/compilation options… I mean, this is something that is expected, and having slightly different results does NOT mean that there is a problem with the code.
If you do the computation in higher precision, you will see that the zero result is correct. The numerator and denominator are both 1.0 to the accuracy of rk, so ANS is 1.0, and ACOS(ANS) should be zero.
So your problem is to try to mimic the lower precision results in order to get the wrong answer, so that it matches your previous regression tests. Trying to purposely get a wrong answer can be a frustrating exercise.
I have modified your code slightly to show the above in more detail.
Program TEST
   implicit none
   integer, parameter :: rk = selected_real_kind(33)
   REAL(rk) :: ANS, IN1(3), IN2(3)
   real(rk) :: num, denom
   IN1(1:3) = [ 0.99999999999999944_rk, &
              &  0.0000000000000015520904252930757_rk, &
              & -0.000000035507690012306962_rk ]
   IN2(1:3) = [ 0.99999999999999911_rk, &
              &  0.0000000000000019106854659539682_rk, &
              & -0.000000043711388286737889_rk ]
   num = (IN1(1)*IN2(1) + (IN1(2)*IN2(2) + IN1(3)*IN2(3)))
   denom = SQRT(IN1(1)**2 + (IN1(2)**2 + IN1(3)**2))
   ANS = num / denom
   write(*,*) 'num=', num, ' denom=', denom, ' ANS=', ANS
   ANS = max( -1.0_rk, min( ANS, 1.0_rk ) )   ! clip to legal range
   write(*,*) 'ACOS(ANS)= ', ACOS(ANS)
end Program TEST
You will notice that I put parentheses inside the computations of num and denom. When you add a series of positive numbers together, it is most accurate to add them from smallest to largest. That did not make a difference in these results, but I left the change in the code just to show how it works.
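To see the effect of summation order in isolation, here is a little demo of my own (the array contents are invented for illustration, not taken from the code above). In single precision, adding the large term first wipes out every tiny addend, while adding it last keeps their accumulated sum:

program sum_order
   implicit none
   integer, parameter :: n = 100000
   real :: x(n), small_first, large_first
   integer :: i
   x(1) = 1.0                         ! one large term
   x(2:n) = 1.0e-8                    ! many tiny terms, each below epsilon(1.0)/2
   small_first = 0.0
   do i = n, 2, -1                    ! accumulate the tiny terms first...
      small_first = small_first + x(i)
   end do
   small_first = small_first + x(1)   ! ...then add the large one: ~1.001
   large_first = 0.0
   do i = 1, n                        ! large term first: each tiny addend rounds away
      large_first = large_first + x(i)
   end do
   print *, 'smallest first:', small_first
   print *, 'largest first: ', large_first   ! exactly 1.0
end program sum_order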
As for how exactly to get the lower precision results in this case, you might try changing all of the real numbers to 32-bit rather than 64-bit and doing the arithmetic entirely in 32-bit floating point. That is simple enough to try, and if it doesn’t work, then you would need to track down exactly where the errors in the 32-bit calculation occur and try to force the computation to reproduce those errors. As I said, chasing your tail like this can be a frustrating exercise. The compiler will likely fight you along the way, because it will instead be trying to produce correct results.
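In case it helps, here is roughly what that all-32-bit experiment could look like (a sketch only; I have recast the constants from my quad precision version above as default reals, which the compiler will round long before the arithmetic starts, and selected_real_kind(6) is assumed to map to IEEE single):

program test32
   implicit none
   integer, parameter :: sp = selected_real_kind(6)   ! assumed: 32-bit IEEE single
   real(sp) :: ans, in1(3), in2(3)
   in1 = [ 0.99999999999999944_sp, &
         &  0.0000000000000015520904252930757_sp, &
         & -0.000000035507690012306962_sp ]
   in2 = [ 0.99999999999999911_sp, &
         &  0.0000000000000019106854659539682_sp, &
         & -0.000000043711388286737889_sp ]
   ans = dot_product(in1, in2) / sqrt(dot_product(in1, in1))
   ans = max(-1.0_sp, min(ans, 1.0_sp))   ! clip to the legal ACOS argument range
   print *, 'acos(ans) = ', acos(ans)
end program test32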
@RonShepard I assumed (maybe wrongly) that he was using exactly the same code on the 32-bit machine (i.e. working with double precision variables). If that’s not the case, there are even more reasons to have different results!
In that case, then the old result is probably just in error because of some math library error. It will be difficult to track down, even if you have the source code to the math library.
Changing rk in the above code to the default real kind still produces 1.0 for ANS and 0.0 for the ACOS() on my computer, so that isn’t much help.
Did the old 32-bit results change with compiler optimization level or with other compiler options (e.g. fast-math)?
The problem you are running into here is that the Fortran and C standards don’t require the languages to do the math you wrote. They allow the compiler to use higher precision or different orders of arithmetic than in your code. This means that results can change every time you recompile your program. If you want a language that does the math you wrote, C and Fortran aren’t for you.
Not necessarily. But since we don’t know what the old 32-bit hardware was, it’s difficult to comment…
My reasoning is that you HAVE to accept some small differences in floating point operations. Your regression tests have to take them into account: abs(old_result - new_result) <= tolerance (and you have to set a tolerance). Adding a constant is not a proper solution.
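A minimal sketch of what such a check could look like (the tolerance and the hard-coded values here are purely illustrative):

program regression_check
   implicit none
   integer, parameter :: dp = selected_real_kind(15)
   real(dp), parameter :: tol = 1.0e-6_dp   ! illustrative tolerance, application dependent
   real(dp) :: old_result, new_result
   old_result = 1.4901161193847656e-8_dp    ! archived value from the 32-bit run
   new_result = 0.0_dp                      ! value from the 64-bit run
   if (abs(old_result - new_result) <= tol) then
      print *, 'PASS'
   else
      print *, 'FAIL: |old - new| =', abs(old_result - new_result)
   end if
end program regression_check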
In this case, we know that the correct result is 0.0. We know that because that is what you get when the calculation is done with 33 decimal digits of precision. So the question is how to satisfy the regression tests comparing the new correct results with the old incorrect results. One way is comparing floating point results with a tolerance. The other way is to modify the base results to be correct, and then to compare the computed results to those correct results.
The errors in the old code are many orders of magnitude larger than the machine epsilon, so I don’t think the differences are just a matter of roundoff errors accumulating. That is why I was previously saying that the error must be in a math library operation or something of that magnitude. One would need the source code to the old math library to try to track it down.
(Assuming the above statement is actually true) My guess is that the 32-bit machine is (must) be doing software emulation of the 64-bit floating point math. That can lead to accumulation of various round-off “errors” that aren’t encountered by the 64-bit version, as it can perform the operations directly on the hardware. Thus, you get “different” answers. I put those terms in quotes because, as others have already pointed out, floating point math is always an approximation.
As the 32-bit answer is the “wrong” one, I would highly recommend not changing the code and accepting the small “differences” after migrating to the 64-bit system. Otherwise you are intentionally trying to get a specific wrong answer just because it happens to be the “original”. That’s going to be hard and there really isn’t a “right” way to do it.
giraffe’s code won’t compile as given. It ought to contain a statement evaluating ANS where its ... is now. And it carefully defines rk and then doesn’t use it. I have only a 64-bit computer (x86_64), on which I got giraffe’s 32-bit result by replacing the ... with
ANS = 1 - epsilon(ANS)/2
It is not reasonable for OP to ask for advice on code that is represented cryptically as “…”.
If that “…” stands for a statement that assigns a constant value to ANS, such as Harper’s expression, the value to be printed may well be computed at compile time. If so, the code that is generated will contain no call to, or inline code for, the ACOS function. Instead, the compiler may simply place the value of the result in the I/O list of the WRITE statement, and it will not matter whether the EXE/a.out is 32-bit or 64-bit, uses X87 or SSE2 instructions, which OS is being used, etc. Different compilers may produce slightly different results, depending on what their compile time library functions do.
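One way to rule out compile-time folding when experimenting, as a sketch (the READ is my addition, not part of giraffe’s program): obtain ANS at run time so the compiler cannot precompute the ACOS.

program test_runtime
   implicit none
   real(8) :: ans
   read (*,*) ans                       ! value unknown at compile time, so ACOS must run
   write (*,*) 'ACOS(ANS)= ', acos(ans)
end program test_runtime

For example, echo 0.99999999999999989 | ./a.out feeds in Harper’s 1 - epsilon/2 as a decimal.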
Another point worth noting is that, when it comes to floating point instructions, be they x87 or SSEn or AVXn, the “bitness” of the X86/X64 CPU (32 or 64 bit) has no effect on the results of floating point calculations. Even on the original IBM PC with its 8/16 bit Intel 8088 CPU and 16-bit MSDOS, the 8087 coprocessor used floating point numbers that were 32/64/80 bits in size.
Now I see the issue. ACOS(1-x) for small x behaves like ~sqrt(2x). With the above substitution, the error is proportional to sqrt(epsilon), not to epsilon itself. That is why the error is so big. I was mistakenly thinking it had to be a math library problem, like sneaking in a 32-bit value somewhere rather than a 64-bit value. But instead, the error is big just because of the way the mathematical function behaves near 1.
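To make that concrete (a quick check of my own, not from the earlier posts): with x = epsilon/2 = 2**(-53), sqrt(2x) = 2**(-26) = 1.4901161193847656E-8, exactly the number reported from the 32-bit machine.

program acos_near_one
   implicit none
   real(8) :: x
   x = epsilon(1.0_8)/2                       ! 2**(-53)
   print *, 'acos(1-x) =', acos(1.0_8 - x)    ! ~1.4901161193847656E-08
   print *, 'sqrt(2x)  =', sqrt(2.0_8*x)      ! Taylor estimate, 2**(-26)
end program acos_near_one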
When I was checking giraffe’s problem I was at first puzzled about why I needed epsilon/2, not epsilon. In case anyone else is similarly puzzled, it’s an inescapable feature of floating-point arithmetic, which the following little program may reveal. Its output was the same on x86_64 Linux systems with all three of gfortran, g95 and ifort.
! Nearest numbers to 1d0 up and down
print "(A,SP,ES24.16)",'nearest(1d0,+1d0)-1 =',nearest(1d0,+1d0)-1
print "(A,SP,ES24.16)",'       epsilon(1d0) =',epsilon(1d0)
print "(A,SP,ES24.16)",'nearest(1d0,-1d0)-1 =',nearest(1d0,-1d0)-1
end
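For reference, assuming IEEE double precision, those three lines should print approximately +2.2204460492503131E-16, +2.2204460492503131E-16, and -1.1102230246251565E-16: the spacing just above 1.0 equals epsilon, while the spacing just below it is only epsilon/2.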
The number is right at the boundary where the exponent changes, so the spacing of numbers on the high side is twice the spacing of numbers on the low side.
This is for a base 2 exponent. If it were a base 16 exponent, like the IBM floating point format, then the spacing is 16 times larger on the high side than on the low side.