Is comparison with `NaN` considered an erroneous arithmetic operation? (MWE with ifort and gfortran)

Quick question: Should comparison with NaN be considered an erroneous arithmetic operation or even lead to a SIGSEGV?

Note that the yes/no question above is all I am asking, and I hope to hear your opinions on it. What follows only explains the question; the example below merely illustrates that it is not purely hypothetical.

Background:

With some (but not all) compilers, it happens in my tests that comparing a number with NaN is sometimes (but not always) caught by -ffpe-trap=invalid and a floating point exception is raised, with “erroneous arithmetic operation” reported.

(Sorry that I do not have a minimal working example, as this behavior appears randomly — comparison with NaN does not always trigger the exception, even with the same compiler and the same options. If this makes the question above meaningless to you, just ignore it.)

Here is a minimal working example:

program test_fpe
use, intrinsic :: ieee_arithmetic, only : ieee_value, ieee_quiet_nan, ieee_is_nan

implicit none

real :: a

a = ieee_value(a, ieee_quiet_nan)

print *, a, ieee_is_nan(a)
print *, a <= 0

end program test_fpe

For the testing results on my side, see my post below. In brief, ifort raises an FPE or SIGSEGV (see Compiler Explorer) on this code, and gfortran raises an FPE (see Compiler Explorer). I just want to confirm whether the results are expected.

Note that it is not my intention to compare numbers with NaN. However, when solving strongly ill-conditioned nonlinear problems, encountering NaN from time to time is not strange. How to avoid this NaN is not my question here.

Thank you.

NaNs, infinities and the accompanying arithmetic operations were invented to make arithmetic a closed system. I can imagine that the flag -ffpe-trap=invalid is contradictory to that goal. After all, they are specific values that fall outside the normal range.

I do not think that a NaN is a specific value that falls outside the normal range. It is Not-A-Number.
Any comparison (<, >, ==,…) of any normal number with a NaN results in .FALSE. The only obvious exception is /=, which always yields .TRUE. Well-defined behavior is IMHO guaranteed only when using procedures from the intrinsic ieee_arithmetic module, if available. E.g.

17.11.28 IEEE_QUIET_LT (A, B)
1 Description. Quiet compares less than.
2 Class. Elemental function.
3 Arguments.
A shall be of type real.
B shall have the same type and kind type parameter as A.
4 Restriction. IEEE_QUIET_LT (A, B) shall not be invoked if IEEE_SUPPORT_DATATYPE (A) has the value false.
5 Result Characteristics. Default logical.
6 Result Value. The result has the value specified for the compareQuietLess operation in ISO/IEC/IEEE 60559:2011; that is, it is true if and only if A compares less than B. If A or B is a NaN, the result will be false. If A or B is a signaling NaN, IEEE_INVALID signals; otherwise, no exception is signaled.
7 Example. IEEE_QUIET_LT (1.0, IEEE_VALUE (IEEE_QUIET_NAN)) has the value false and no exception is signaled.

BTW, it seems that the Examples in that section of Standard (18-007r1.pdf) contain erroneous use of ieee_value() function with only one argument instead of two.
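
The comparison rules described in this post can be illustrated with a minimal sketch. It assumes an IEEE-conforming processor, no trapping options (for example, no -ffpe-trap=invalid) and no value-unsafe optimizations such as -ffast-math; the program name nan_compare_demo is made up for illustration.

```fortran
program nan_compare_demo
use, intrinsic :: ieee_arithmetic, only : ieee_value, ieee_quiet_nan

implicit none

real :: x

! Build a quiet NaN at run time (two-argument form of ieee_value).
x = ieee_value(x, ieee_quiet_nan)

! Ordered comparisons and == involving a NaN are false;
! /= is the one relational operator that is true.
print *, x < 0.0, x > 0.0, x == x, x /= x

end program nan_compare_demo
```

Under those assumptions the expected output is F F F T. Whether a comparison also signals IEEE_INVALID, and whether that signal halts the program, is a separate matter that depends on the halting mode and the compiler options in effect.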

This is fixed for Fortran 2023.

Here is a minimal working example.

program test_fpe
use, intrinsic :: ieee_arithmetic, only : ieee_value, ieee_quiet_nan, ieee_is_nan

implicit none

real :: a

a = ieee_value(a, ieee_quiet_nan)

print *, a, ieee_is_nan(a)
print *, a <= 0

end program test_fpe

test: test0 test1 test2 test3

test0:
	ifort -standard-semantics -fp-stack-check -fpe-all=0 -traceback test_fpe.f90 && ./a.out

test1:
	ifort -fp-stack-check -fpe-all=0 -traceback test_fpe.f90 && ./a.out

test2:
	ifort -standard-semantics -fpe-all=0 -traceback test_fpe.f90 && ./a.out

test3:
	gfortran -g -ffpe-trap=invalid test_fpe.f90 && ./a.out   

On Ubuntu 22.04, with ifort (IFORT) 2021.7.1 20221019, I got SIGSEGV for test0:

 NaN T
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
libc.so.6          00007F124DAAC520  Unknown               Unknown  Unknown
a.out              0000000000406F54  Unknown               Unknown  Unknown
a.out              0000000000403A03  Unknown               Unknown  Unknown
libc.so.6          00007F124DAAC520  Unknown               Unknown  Unknown
a.out              000000000045CA33  Unknown               Unknown  Unknown
a.out              00000000004044EC  Unknown               Unknown  Unknown
a.out              0000000000404BE8  MAIN__                     11  test_fpe.f90
a.out              00000000004049DD  Unknown               Unknown  Unknown
libc.so.6          00007F124DA93D90  Unknown               Unknown  Unknown
libc.so.6          00007F124DA93E40  __libc_start_main     Unknown  Unknown
a.out              00000000004048F5  Unknown               Unknown  Unknown
make: *** [Makefile:4: test0] Error 174

For test1/2, I got an FPE:

            NaN T
forrtl: error (65): floating invalid
Image              PC                Routine            Line        Source             
libc.so.6          00007F3C88FD2520  Unknown               Unknown  Unknown
a.out              0000000000404C0B  MAIN__                     11  test_fpe.f90
a.out              00000000004049DD  Unknown               Unknown  Unknown
libc.so.6          00007F3C88FB9D90  Unknown               Unknown  Unknown
libc.so.6          00007F3C88FB9E40  __libc_start_main     Unknown  Unknown
a.out              00000000004048F5  Unknown               Unknown  Unknown
Aborted (core dumped)
make: *** [Makefile:7: test1] Error 134

For test3 with GNU Fortran (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, I got an FPE:

gfortran -g -ffpe-trap=invalid test_fpe.f90 && ./a.out
              NaN T

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fa72bc90ad0 in ???
#1  0x7fa72bc8fc35 in ???
#2  0x7fa72ba8751f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x5633590352d7 in test_fpe
	at /home/zaikunzhang/tmp/test_fpe.f90:11
#4  0x56335903535d in main
	at /home/zaikunzhang/tmp/test_fpe.f90:2
Floating point exception (core dumped)
make: *** [Makefile:22: test6] Error 136

Are these results expected?

I can only repeat myself:

After changing

print *, a <= 0

to

print *, ieee_quiet_le(a, 0.0)

I am getting F as expected. Just the following remarks:
My ifort (newest ver. 2021.8.0, from OneAPI 2023) reports -fp-trap as unknown option.
gfortran (ver. 12.1) lacks ieee_quiet_XX functions in ieee_arithmetic module.

Thank you @msz59 for trying.

This is my fault. That option is not needed. I updated the code to remove it.

This is great. However, does this mean that we have to replace <= with ieee_quiet_le almost everywhere to avoid the FPE? This does not sound very practical. As I said, it is not strange to encounter NaN from time to time (almost everywhere) if you are dealing with ill-conditioned problems.
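
One possible middle ground, sketched here as an assumption rather than a recommendation from anyone in this thread, is to wrap the comparison in a small helper that tests for NaN first, using only ieee_is_nan (which the gfortran test above already uses). The names nan_safe_compare and quiet_le are hypothetical. The sketch assumes that ieee_is_nan itself does not signal on a quiet NaN and that the compiler does not move the comparison ahead of the NaN checks; neither was verified with the compilers discussed here.

```fortran
module nan_safe_compare
use, intrinsic :: ieee_arithmetic, only : ieee_is_nan
implicit none
private
public :: quiet_le

contains

! Hypothetical helper: true only if neither argument is a NaN and a <= b.
elemental function quiet_le(a, b) result(le)
real, intent(in) :: a, b
logical :: le

le = .false.
! Separate IF statements are used on purpose: Fortran does not guarantee
! short-circuit evaluation of .and., so "(.not. nan) .and. a <= b" could
! still execute the comparison on a NaN.
if (ieee_is_nan(a)) return
if (ieee_is_nan(b)) return
le = a <= b
end function quiet_le

end module nan_safe_compare

program test_quiet_le
use, intrinsic :: ieee_arithmetic, only : ieee_value, ieee_quiet_nan
use nan_safe_compare, only : quiet_le

implicit none

real :: a

a = ieee_value(a, ieee_quiet_nan)

! First value: NaN operand, so false. Second value: -1 <= 0, so true.
print *, quiet_le(a, 0.0), quiet_le(-1.0, 0.0)

end program test_quiet_le
```

This still means touching every comparison that may see a NaN, so it only reduces the dependence on ieee_quiet_le; it does not answer the practicality concern.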

For ifort, you must use -fp-model=strict if you’re going to play with NaNs and other IEEE FP things. I generally recommend this any time you use the IEEE intrinsic modules. I would certainly NOT use -fpe0 in such cases.
For test0 on Windows, with the Windows version of -fp-model=strict (/fp:strict), I get:

            NaN T
 F

and no segfault.
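
Independently of compiler flags, the standard ieee_exceptions module lets a program switch halting on IEEE_INVALID off and on at run time. The following is a minimal sketch of that idea, assuming the processor supports control of halting; the program name halting_demo and the variable names are made up, and the interaction with options such as -ffpe-trap=invalid or -fpe-all=0 was not verified with the compilers discussed here.

```fortran
program halting_demo
use, intrinsic :: ieee_arithmetic, only : ieee_value, ieee_quiet_nan
use, intrinsic :: ieee_exceptions, only : ieee_invalid, ieee_support_halting, &
    ieee_get_halting_mode, ieee_set_halting_mode, ieee_get_flag, ieee_set_flag

implicit none

real :: a
logical :: can_halt, saved_halting, le_zero, raised

a = ieee_value(a, ieee_quiet_nan)

! Save the current halting mode and switch halting off for IEEE_INVALID.
can_halt = ieee_support_halting(ieee_invalid)
if (can_halt) then
    call ieee_get_halting_mode(ieee_invalid, saved_halting)
    call ieee_set_halting_mode(ieee_invalid, .false.)
end if

! The comparison may still signal IEEE_INVALID, but it no longer halts.
le_zero = a <= 0.0

! Record whether the flag was raised, then clear it before restoring halting.
call ieee_get_flag(ieee_invalid, raised)
call ieee_set_flag(ieee_invalid, .false.)

if (can_halt) call ieee_set_halting_mode(ieee_invalid, saved_halting)

print *, le_zero, raised

end program halting_demo
```

The first printed value is the result of the comparison (false for a NaN operand); the second shows whether this particular compiler implemented <= as a signaling comparison.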

Thank you @sblionel for the response and for directing me to -fp-model=strict.

So, this SEGFAULT is expected rather than a false positive (due to improper options).

According to Wikipedia, SEGFAULT means “the software has attempted to access a restricted area of memory”. I am wondering, what is that restricted area of memory in this case?

Another general question is: Should comparison with NaN be considered an erroneous arithmetic operation?

No, a segfault is not expected. I’m having trouble reproducing that, but I have Windows, not Linux.

Oh, and drop -fp-stack-check - that does nothing useful for you (it is for pre-SSE x87 32-bit code.)

I would say that NaNs are an unusual situation. Do you know how the NaNs are generated in your code? Are you computing sqrt(x) for negative x and other things like that? Or are you generating NaNs on purpose to identify missing data?

Very often due to overflow, and then things like inf - inf, inf / inf, etc. I am working with single precision (on purpose). This is really problem-dependent, and I do not think a discussion on (problem-specific) numerical analysis is relevant here. I will not do that but focus on Fortran in this discourse.

If you are working on strongly ill-conditioned problems, NaN is not a strange thing. It is normal. You should avoid it as much as possible by improving your algorithm and code, but there is no way to eliminate it completely if your code runs under tough conditions for a sufficiently long time.

I detest NaN, but things will not go away just because we detest them.

Thank you for pointing this out. I have removed it from all my Makefiles involving ifort / ifx.

However, I would like to mention that the SEGFAULT in test0 will disappear if -fp-stack-check is removed (see test2). So, it is true that -fp-stack-check does nothing useful here, but it does do something significant — significant enough to cause a SEGFAULT.