I think this is a different issue from the previous thread, and exploring it a little might be instructive. The previous issue was how reciprocals were approximated when vector hardware was used, compared to how divisions were done in the cpu hardware itself. In this case a separate issue arises in that same process.
I don’t know exactly how reciprocals are computed in the Intel vector hardware, but it is almost certainly some variation of a Newton iteration. A Newton iteration for a reciprocal is the same as locating a root of the function
f(r) = d - 1/r
where d is the denominator and r is the desired reciprocal. The Newton update is
r_new = r_old - f(r_old) / f'(r_old)
where f'(r_old) is the derivative evaluated at r_old. Since f'(r) = 1/r**2, substituting and simplifying gives
r_new = r_old - (d - 1/r_old) * r_old**2 = r_old - d*r_old**2 + r_old
which collects to
r_new = r_old * (2 - d * r_old)
Notice that this expression involves only one subtraction and two multiplications per step, and those operations are easier to implement than the original division, whether by hand with pencil and paper, in mechanical computers, or in electronic computers. That is why this algorithm is popular.
You can now see how this works. If the current r_old is too small, then d*r_old is less than one, (2 - d*r_old) is greater than one, and r_new will be larger than r_old. Conversely, if r_old is too large, then the product d*r_old is greater than one, (2 - d*r_old) is less than one, and r_new will be smaller than r_old.
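For example, take d = 5 (so the true reciprocal is 0.2) and r_old = 0.1: then d*r_old = 0.5, (2 - d*r_old) = 1.5, and r_new = 0.15, which has moved up toward 0.2. Starting instead from r_old = 0.3, the product d*r_old = 1.5, (2 - d*r_old) = 0.5, and r_new = 0.15 again, this time moving down toward 0.2.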
Here is a program that uses this algorithm to approximate a reciprocal, followed by the multiplication to produce the final division result.
implicit none
integer :: i
real :: num, denom, r
write(*,*) 'tiny=', tiny(r), ' huge=', huge(r), ' 1/huge=', 1.0/huge(r)
write(*,*) 'input: num, denom, r'
read(*,*) num, denom, r
do i = 1, 5
   r = r * (2.0 - r * denom)   ! Newton update for the reciprocal of denom
   write(*,*) 'r=', r
enddo
write(*,*) 'division:', num/denom, ' multiplication:', num * r
end
Here is what happens with a “normal” set of inputs:
$ gfortran fp.f90 && a.out
tiny= 1.17549435E-38 huge= 3.40282347E+38 1/huge= 2.93873588E-39
input: num, denom, r
3 5 .25
r= 0.187500000
r= 0.199218750
r= 0.199996948
r= 0.200000003
r= 0.200000003
division: 0.600000024 multiplication: 0.600000024
So you can see how the initial guess for r was too small, and the Newton steps increase that guess at each iteration. If you play around with the inputs a little, you can see that if the initial guess is too large, then the sequence converges from above rather than from below. The other feature is that the number of correct digits in the reciprocal doubles each iteration. So not only are the Newton steps self-correcting, they self-correct really, really well. With a good initial guess, you might only need one or two steps. However, with a poor initial guess for r, the sequence might not converge in five steps, or it might immediately diverge. A good implementation of the algorithm must test for these cases and adapt appropriately.
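Here is one illustrative way to add such a guard to the toy program above. This is only a sketch, assuming positive inputs; real implementations instead construct a guaranteed-good first guess, as described next.

program guarded
implicit none
integer :: i
real :: num, denom, r, rnew
write(*,*) 'input: num, denom, r (all positive)'
read(*,*) num, denom, r
! guard: if denom*r >= 2, the residual 1 - denom*r has magnitude >= 1
! and the iteration diverges, so halve r until the product is in range
do while (denom * r >= 2.0)
   r = 0.5 * r
enddo
do i = 1, 30
   rnew = r * (2.0 - denom * r)
   if (rnew == r) exit          ! converged to working precision
   r = rnew
enddo
write(*,*) 'division:', num/denom, ' multiplication:', num * r
end program guarded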
Note also that the extra info that is printed shows that 1/huge is smaller than tiny. That will be an important point below.
When a computer uses this algorithm to compute reciprocals, it looks at the floating point exponent and essentially negates it to get an initial guess that is within a factor of two of the correct result. That operation involves only extracting the (biased) exponent field and subtracting it from a constant, which requires just integer arithmetic, so it is fast and cheap. Some computers might also look at the leading bits in the mantissa to get a closer initial guess. If they use the leading 4 bits, then a 16-element table lookup would give 4 correct mantissa bits in the initial reciprocal approximation. The first iteration would give 8 correct bits, the next would give 16 bits, and the next would give 32 bits. There are only 23 stored mantissa bits (24 counting the implicit leading bit) in a 32-bit real value, so at most three iterations would ever be required. Some cpus have that lookup table stored in read-only registers, so it is always available without any runtime initialization. A larger table might be used in order to reduce the number of iterations.
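Here is a minimal sketch of that exponent trick in standard Fortran, using the exponent() and set_exponent() intrinsics to stand in for the hardware's integer manipulation (the fixed five-step loop and the test value are just for illustration):

program exp_guess
implicit none
integer :: i
real :: d, r
d = 5.0
! the model is d = fraction(d) * 2**exponent(d) with 0.5 <= fraction(d) < 1,
! so 2**(-exponent(d)) is within a factor of two of 1/d
r = set_exponent(1.0, 1 - exponent(d))
write(*,*) 'guess r=', r, ' exact 1/d=', 1.0/d
do i = 1, 5
   r = r * (2.0 - d * r)   ! each step roughly doubles the number of correct bits
   write(*,*) 'r=', r
enddo
end program exp_guess

With d = 5.0 this prints a guess of 0.125, within the promised factor of two of the exact 0.2, and the loop then converges just as in the first run above.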
Ok, so what does this algorithm do with really large numbers? Here is one example where the initial guess for r should be pretty good, and where the value is still a normal fp number.
$ ifort fp.f90 && a.out
tiny= 1.1754944E-38 huge= 3.4028235E+38 1/huge= 0.0000000E+00
input: num, denom, r
1.e37 1.e38 9e-37
r= -7.9199988E-35
r= -6.2742218E-31
r= -3.9365857E-23
r= -1.5496707E-07
r= -2.4014791E+24
division: 0.1000000 multiplication: -Infinity
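Working the first step by hand shows what happened. The residual e = 1 - d*r_old gets squared every iteration, so the iteration converges only when 0 < d*r_old < 2. Here d*r_old = 1e38 * 9e-37 = 90, and the first step is

r_new = 9e-37 * (2 - 90) = 9e-37 * (-88) = -7.92e-35

which matches the first printed value; from there the squared residual grows without bound.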
The “pretty good” initial guess isn’t good enough, and it diverges. Here is another case where the initial guess is a denormal number.
$ ifort fp.f90 && a.out
tiny= 1.1754944E-38 huge= 3.4028235E+38 1/huge= 0.0000000E+00
input: num, denom, r
1e37 1e38 1e-38
r= 0.0000000E+00
r= 0.0000000E+00
r= 0.0000000E+00
r= 0.0000000E+00
r= 0.0000000E+00
division: 0.1000000 multiplication: 0.0000000E+00
Notice here that the input r is actually the correct value, but since it is a denormal number, it gets flushed to zero by default by ifort. In this case, the denominator is a normal number, but 1/denom is a denormal number. If I do the same thing with gfortran,
$ gfortran fp.f90 && a.out
tiny= 1.17549435E-38 huge= 3.40282347E+38 1/huge= 2.93873588E-39
input: num, denom, r
1e37 1e38 1e-38
r= 1.00000008E-38
r= 1.00000008E-38
r= 1.00000008E-38
r= 1.00000008E-38
r= 1.00000008E-38
division: 0.100000001 multiplication: 0.100000009
So the new thing here is what happens with denormal intermediate results: are they flushed to zero, or are they carried through the (relatively expensive) floating point operations on denormal values? As seen above, the computed result can range from zero to Infinity, and it can even be the correct result.
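If you need to detect these cases in portable Fortran, tiny() and the ieee_arithmetic intrinsic module are enough. Here is a small sketch along those lines; the comparison against tiny() is the key point, and the specific denominator is just the example from above:

program classify
use, intrinsic :: ieee_arithmetic
implicit none
real :: denom, r
denom = 1.0e38
r = 1.0 / denom                 ! exact value is subnormal; may be flushed to zero
if (r == 0.0) then
   write(*,*) 'reciprocal underflowed (or was flushed) to zero'
else if (abs(r) < tiny(r)) then
   write(*,*) 'reciprocal is subnormal:', r
else if (.not. ieee_is_finite(r)) then
   write(*,*) 'reciprocal overflowed to Infinity'
else
   write(*,*) 'reciprocal is a normal number:', r
end if
end program classify

Compiled with gfortran this typically takes the subnormal branch; with ifort's default flush-to-zero behavior it takes the zero branch, matching the runs above.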