When I run the below program on a 32 bit computer, the number 1.4901161193847656E-008 is output.
When I run the below program on a 64 bit computer, the number 0.0000000000000000 is output.
Questions: As a professional, how do I get this program to output the 32-bit number on a 64-bit computer? What can I change in this program to make that happen?
How close can you get me?
PROGRAM TEST
IMPLICIT NONE
INTEGER, PARAMETER :: rk = selected_real_kind(15)
REAL*8 ANS,IN1(3),IN2(3)
...
WRITE(*,*)'ACOS(ANS)= ',ACOS(ANS)
END PROGRAM TEST
gfortran -g test.f -lgfortran
./a.out
Output on 64 Bit Computer:
ACOS(ANS)= 0.0000000000000000
Output on 32 Bit Computer:
ACOS(ANS)= 1.4901161193847656E-008
What do you mean by “a 32 bit computer”? Is it a CPU that has only 32-bit floating point registers?
And why do you absolutely want/need to get the same results on both machines? Because of roundoff, the results of floating point operations are not guaranteed to be perfectly identical on different hardware (they are not even guaranteed to be identical between two different compilers on the same hardware, or with the same compiler at different optimization levels).
It’s not that simple. x86 Intel CPUs were 32-bit but had 80-bit floating point registers.
OK, but it’s a tautology. As I wrote, floating point calculations are by nature approximate and can differ depending on the hardware/compiler/compilation options… I mean, this is something that is expected, and having slightly different results does NOT mean that there is a problem with the code.
If you do the computation in higher precision, you will see that the zero result is correct. The numerator and denominator are both 1.0 to the accuracy of rk, so ANS is 1.0, and ACOS(ANS) should be zero.
So your problem is to try to mimic the lower precision results in order to get the wrong answer, so that it matches your previous regression tests. Trying to purposely get a wrong answer can be a frustrating exercise.
I have modified your code slightly to show the above in more detail.
Program TEST
   implicit none
   integer, parameter :: rk = selected_real_kind(33)
   REAL(rk) :: ANS, IN1(3), IN2(3)
   real(rk) :: num, denom
   IN1(1:3) = [ 0.99999999999999944_rk, &
              &  0.0000000000000015520904252930757_rk, &
              & -0.000000035507690012306962_rk ]
   IN2(1:3) = [ 0.99999999999999911_rk, &
              &  0.0000000000000019106854659539682_rk, &
              & -0.000000043711388286737889_rk ]
   num = (IN1(1)*IN2(1) + (IN1(2)*IN2(2) + IN1(3)*IN2(3)))
   denom = SQRT(IN1(1)**2 + (IN1(2)**2 + IN1(3)**2))
   ANS = num / denom
   write(*,*) 'num=', num, ' denom=', denom, ' ANS=', ANS
   ANS = max( -1.0_rk, min( ANS, 1.0_rk ) )   ! clip to legal range
   write(*,*) 'ACOS(ANS)= ', ACOS(ANS)
end Program TEST
You will notice that I put parentheses inside the computations of num and denom. When you add a series of positive numbers together, it is most accurate to add them from smallest to largest. That did not make a difference in these results, but I left the change in the code just to show how it works.
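To see the effect of summation order in isolation, here is a little demo of my own (the array contents are invented for illustration, not taken from the code above). In single precision, adding the large term first wipes out every tiny addend, while adding it last keeps their accumulated sum:

program sum_order
   implicit none
   integer, parameter :: n = 100000
   real :: x(n), small_first, large_first
   integer :: i
   x(1) = 1.0                         ! one large term
   x(2:n) = 1.0e-8                    ! many tiny terms, each below epsilon(1.0)/2
   small_first = 0.0
   do i = n, 2, -1                    ! accumulate the tiny terms first...
      small_first = small_first + x(i)
   end do
   small_first = small_first + x(1)   ! ...then add the large one: ~1.001
   large_first = 0.0
   do i = 1, n                        ! large term first: each tiny addend rounds away
      large_first = large_first + x(i)
   end do
   print *, 'smallest first:', small_first
   print *, 'largest first: ', large_first   ! exactly 1.0
end program sum_order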
As for how exactly to get the lower precision results in this case, you might try changing all of the real numbers to 32-bit rather than 64-bit and doing the arithmetic entirely in 32-bit floating point. That is simple enough to try, and if it doesn’t work, then you would need to track down exactly where the errors in the 32-bit calculation occur and try to force the computation to reproduce those errors. As I said, chasing your tail like this can be a frustrating exercise. The compiler will likely fight you along the way, because it will instead be trying to produce correct results.
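In case it helps, here is roughly what that all-32-bit experiment could look like (a sketch only; I have recast the constants from my quad precision version above as default reals, which the compiler will round long before the arithmetic starts, and selected_real_kind(6) is assumed to map to IEEE single):

program test32
   implicit none
   integer, parameter :: sp = selected_real_kind(6)   ! assumed: 32-bit IEEE single
   real(sp) :: ans, in1(3), in2(3)
   in1 = [ 0.99999999999999944_sp, &
         &  0.0000000000000015520904252930757_sp, &
         & -0.000000035507690012306962_sp ]
   in2 = [ 0.99999999999999911_sp, &
         &  0.0000000000000019106854659539682_sp, &
         & -0.000000043711388286737889_sp ]
   ans = dot_product(in1, in2) / sqrt(dot_product(in1, in1))
   ans = max(-1.0_sp, min(ans, 1.0_sp))   ! clip to the legal ACOS argument range
   print *, 'acos(ans) = ', acos(ans)
end program test32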
@RonShepard I assumed (maybe wrongly) that he was using exactly the same code on the 32-bit machine (i.e. working with double precision variables). If that’s not the case, there are even more reasons to have different results!
In that case, then the old result is probably just in error because of some math library error. It will be difficult to track down, even if you have the source code to the math library.
Changing rk in the above code to the default real kind still produces 1.0 for ANS and 0.0 for the ACOS() on my computer, so that isn’t much help.
Did the old 32-bit results change with compiler optimization level or with other compiler options (e.g. fast-math)?
The problem you are running into here is that the Fortran and C standards don’t require the languages to do the math you wrote. They allow the compiler to use higher precision or different orders of arithmetic than in your code. This means that results can change every time you recompile your program. If you want a language that does the math you wrote, C and Fortran aren’t for you.
Not necessarily. But since we don’t know what the old 32-bit hardware was, it’s difficult to comment…
My reasoning is that you HAVE to accept some small differences in floating point operations. Your regression tests have to take them into account: abs(old_result - new_result) <= tolerance (and you have to set a tolerance). Adding a constant is not a proper solution.
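A minimal sketch of what such a check could look like (the tolerance and the hard-coded values here are purely illustrative):

program regression_check
   implicit none
   integer, parameter :: dp = selected_real_kind(15)
   real(dp), parameter :: tol = 1.0e-6_dp   ! illustrative tolerance, application dependent
   real(dp) :: old_result, new_result
   old_result = 1.4901161193847656e-8_dp    ! archived value from the 32-bit run
   new_result = 0.0_dp                      ! value from the 64-bit run
   if (abs(old_result - new_result) <= tol) then
      print *, 'PASS'
   else
      print *, 'FAIL: |old - new| =', abs(old_result - new_result)
   end if
end program regression_check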
In this case, we know that the correct result is 0.0. We know that because that is what you get when the calculation is done with 33 decimal digits of precision. So the question is how to satisfy the regression tests comparing the new correct results with the old incorrect results. One way is comparing floating point results with a tolerance. The other way is to modify the base results to be correct, and then to compare the computed results to those correct results.
The errors in the old code are many orders of magnitude larger than the machine epsilon, so I don’t think the differences are just a matter of roundoff errors accumulating. That is why I was previously saying that the error must be in a math library operation or something of that magnitude. One would need the source code to the old math library to try to track it down.
(Assuming the above statement is actually true) My guess is that the 32-bit machine is (must) be doing software emulation of the 64-bit floating point math. That can lead to accumulation of various round-off “errors” that aren’t encountered by the 64-bit version, as it can perform the operations directly on the hardware. Thus, you get “different” answers. I put those terms in quotes because, as others have already pointed out, floating point math is always an approximation.
As the 32-bit answer is the “wrong” one, I would highly recommend not changing the code and accepting the small “differences” after migrating to the 64-bit system. Otherwise you are intentionally trying to get a specific wrong answer just because it happens to be the “original”. That’s going to be hard and there really isn’t a “right” way to do it.
giraffe’s code won’t compile as given. It ought to contain a statement evaluating ANS where its ... is now. And it carefully defines rk and then doesn’t use it. I have only a 64-bit computer (x86_64), on which I got giraffe’s 32-bit result by replacing the ... with
ANS = 1 - epsilon(ANS)/2
It is not reasonable for OP to ask for advice on code that is represented cryptically as “…”.
If that “…” stands for a statement that assigns a constant value to ANS, such as Harper’s expression, the value to be printed may well be computed at compile time. If so, the code that is generated will contain no call to, or inline code for, the ACOS function. Instead, the compiler may simply place the value of the result in the I/O list of the WRITE statement, and it will not matter whether the EXE/a.out is 32-bit or 64-bit, uses X87 or SSE2 instructions, which OS is being used, etc. Different compilers may produce slightly different results, depending on what their compile time library functions do.
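One way to rule out compile-time folding when experimenting, as a sketch (the READ is my addition, not part of giraffe’s program): obtain ANS at run time so the compiler cannot precompute the ACOS.

program test_runtime
   implicit none
   real(8) :: ans
   read (*,*) ans                       ! value unknown at compile time, so ACOS must run
   write (*,*) 'ACOS(ANS)= ', acos(ans)
end program test_runtime

For example, echo 0.99999999999999989 | ./a.out feeds in Harper’s 1 - epsilon/2 as a decimal.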
Another point worth noting is that, when it comes to floating point instructions, be they x87 or SSEn or AVXn, the “bitness” of the X86/X64 CPU (32 or 64 bit) has no effect on the results of floating point calculations. Even on the original IBM PC with its 8/16 bit Intel 8088 CPU and 16-bit MSDOS, the 8087 coprocessor used floating point numbers that were 32/64/80 bits in size.
Now I see the issue. ACOS(1-x) for small x behaves like ~sqrt(2x). With the above substitution, the error is proportional to sqrt(epsilon), not to epsilon itself. That is why the error is so big. I was mistakenly thinking it had to be a math library problem, like sneaking in a 32-bit value somewhere rather than a 64-bit value. But instead, the error is big just because of the way the mathematical function behaves near 1.
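To make that concrete (a quick check of my own, not from the earlier posts): with x = epsilon/2 = 2**(-53), sqrt(2x) = 2**(-26) = 1.4901161193847656E-8, exactly the number reported from the 32-bit machine.

program acos_near_one
   implicit none
   real(8) :: x
   x = epsilon(1.0_8)/2                       ! 2**(-53)
   print *, 'acos(1-x) =', acos(1.0_8 - x)    ! ~1.4901161193847656E-08
   print *, 'sqrt(2x)  =', sqrt(2.0_8*x)      ! Taylor estimate, 2**(-26)
end program acos_near_one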
When I was checking giraffe’s problem I was at first puzzled about why I needed epsilon/2, not epsilon. In case anyone else is similarly puzzled, it’s an inescapable feature of floating-point arithmetic, which the following little program may reveal. Its output was the same on x86_64 Linux systems with all three of gfortran, g95 and ifort.
! Nearest numbers to 1d0 up and down
print "(A,SP,ES24.16)",'nearest(1d0,+1d0)-1 =',nearest(1d0,+1d0)-1
print "(A,SP,ES24.16)",'       epsilon(1d0) =',epsilon(1d0)
print "(A,SP,ES24.16)",'nearest(1d0,-1d0)-1 =',nearest(1d0,-1d0)-1
end
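For reference, assuming IEEE double precision, those three lines should print approximately +2.2204460492503131E-16, +2.2204460492503131E-16, and -1.1102230246251565E-16: the spacing just above 1.0 equals epsilon, while the spacing just below it is only epsilon/2.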
The number is right at the boundary where the exponent changes, so the spacing of numbers on the high side is twice the spacing of numbers on the low side.
This is for a base 2 exponent. If it were a base 16 exponent, like the IBM floating point format, then the spacing is 16 times larger on the high side than on the low side.