Odd gfortran result with atan

I noticed that atan() gives a slightly different result when using -O0 if the input is a constant vs a variable. I’m using GFortran 11.1.0 on an M1 Mac. Is it a GFortran bug, a Mac bug, expected behavior? I understand that code can give slightly different answers for different optimization levels, but this seems weird to me, since the same value is being passed into the function.

Example:

program test

use iso_fortran_env, only: wp => real64 

real(wp),parameter :: x = 2.0_wp

real(wp) :: y 

y = x

write(*,*) atan(x)                 
write(*,*) atan(2.0_wp)   
write(*,*) atan(y)   
write(*,*) atan(2.0_wp) == atan(y)  

end program test

Compile with: gfortran -O0 atan2_test.f90 -o test produces:

   1.1071487177940904     
   1.1071487177940904     
   1.1071487177940906     
 F

Compiled with gfortran -O3 atan2_test.f90 -o test produces (actually, same for -O1, -O2 and -O3):

   1.1071487177940904     
   1.1071487177940904     
   1.1071487177940904     
 T

I do not see this difference when using ifort.

I am not able to reproduce this on Windows 10/Mingw64-gfortran v12.1 or WSL/gfortran v9.4. CPU is intel 11th gen. In all cases I get

   1.1071487177940904
   1.1071487177940904
   1.1071487177940904
 T
1 Like

Oh interesting. I wonder if it’s just a bug for the Apple/M1 Gfortran build? Maybe that is still experimental? The one I’m using is from conda-forge.

No difference with Ubuntu 22.04:

$ gfortran -O0 atan2_test.f90 -o test
$ ./test
   1.1071487177940904     
   1.1071487177940904     
   1.1071487177940904     
 T
$ gfortran -O3 atan2_test.f90 -o test
$ ./test
   1.1071487177940904     
   1.1071487177940904     
   1.1071487177940904     
 T
$ uname -op
x86_64 GNU/Linux
$ gfortran --version
GNU Fortran (Ubuntu 11.2.0-19ubuntu1) 11.2.0

No problem also with GFortran 12.0.1.

Interesting. I don’t know how to intrepret these files but this is what I get:

For gfortran -O0 atan2_test.f90 -o test -fdump-tree-original:

__attribute__((fn spec (". ")))
void test ()
{
  real(kind=8) y;

  y = 2.0e+0;
  {
    struct __st_parameter_dt dt_parm.0;

    dt_parm.0.common.filename = &"atan2_test.f90"[1]{lb: 1 sz: 1};
    dt_parm.0.common.line = 11;
    dt_parm.0.common.flags = 128;
    dt_parm.0.common.unit = 6;
    _gfortran_st_write (&dt_parm.0);
    {
      static real(kind=8) C.3192 = 1.10714871779409040897235172451473772525787353515625e+0;

      _gfortran_transfer_real_write (&dt_parm.0, &C.3192, 8);
    }
    _gfortran_st_write_done (&dt_parm.0);
  }
  {
    struct __st_parameter_dt dt_parm.1;

    dt_parm.1.common.filename = &"atan2_test.f90"[1]{lb: 1 sz: 1};
    dt_parm.1.common.line = 12;
    dt_parm.1.common.flags = 128;
    dt_parm.1.common.unit = 6;
    _gfortran_st_write (&dt_parm.1);
    {
      static real(kind=8) C.3194 = 1.10714871779409040897235172451473772525787353515625e+0;

      _gfortran_transfer_real_write (&dt_parm.1, &C.3194, 8);
    }
    _gfortran_st_write_done (&dt_parm.1);
  }
  {
    struct __st_parameter_dt dt_parm.2;

    dt_parm.2.common.filename = &"atan2_test.f90"[1]{lb: 1 sz: 1};
    dt_parm.2.common.line = 13;
    dt_parm.2.common.flags = 128;
    dt_parm.2.common.unit = 6;
    _gfortran_st_write (&dt_parm.2);
    {
      real(kind=8) D.3196;

      D.3196 = __builtin_atan (y);
      _gfortran_transfer_real_write (&dt_parm.2, &D.3196, 8);
    }
    _gfortran_st_write_done (&dt_parm.2);
  }
  {
    struct __st_parameter_dt dt_parm.3;

    dt_parm.3.common.filename = &"atan2_test.f90"[1]{lb: 1 sz: 1};
    dt_parm.3.common.line = 14;
    dt_parm.3.common.flags = 128;
    dt_parm.3.common.unit = 6;
    _gfortran_st_write (&dt_parm.3);
    {
      logical(kind=4) D.3198;

      D.3198 = __builtin_atan (y) == 1.10714871779409040897235172451473772525787353515625e+0;
      _gfortran_transfer_logical_write (&dt_parm.3, &D.3198, 4);
    }
    _gfortran_st_write_done (&dt_parm.3);
  }
}


__attribute__((externally_visible))
integer(kind=4) main (integer(kind=4) argc, character(kind=1) * * argv)
{
  static integer(kind=4) options.4[7] = {2116, 4095, 0, 1, 1, 0, 31};

  _gfortran_set_args (argc, argv);
  _gfortran_set_options (7, &options.4[0]);
  test ();
  return 0;
}

and for gfortran -O0 atan2_test.f90 -o test -fdump-tree-optimize:


;; Function test (MAIN__, funcdef_no=0, decl_uid=3188, cgraph_uid=1, symbol_order=0)

__attribute__((fn spec (". ")))
void test ()
{
  logical(kind=4) D.3198;
  struct __st_parameter_dt dt_parm.3;
  real(kind=8) D.3196;
  struct __st_parameter_dt dt_parm.2;
  struct __st_parameter_dt dt_parm.1;
  struct __st_parameter_dt dt_parm.0;
  real(kind=8) y;
  real(kind=8) _1;
  real(kind=8) _2;
  logical(kind=4) _3;

  <bb 2> :
  y_4 = 2.0e+0;
  dt_parm.0.common.filename = &"atan2_test.f90"[1]{lb: 1 sz: 1};
  dt_parm.0.common.line = 11;
  dt_parm.0.common.flags = 128;
  dt_parm.0.common.unit = 6;
  _gfortran_st_write (&dt_parm.0);
  _gfortran_transfer_real_write (&dt_parm.0, &C.3192, 8);
  _gfortran_st_write_done (&dt_parm.0);
  dt_parm.0 ={v} {CLOBBER};
  dt_parm.1.common.filename = &"atan2_test.f90"[1]{lb: 1 sz: 1};
  dt_parm.1.common.line = 12;
  dt_parm.1.common.flags = 128;
  dt_parm.1.common.unit = 6;
  _gfortran_st_write (&dt_parm.1);
  _gfortran_transfer_real_write (&dt_parm.1, &C.3194, 8);
  _gfortran_st_write_done (&dt_parm.1);
  dt_parm.1 ={v} {CLOBBER};
  dt_parm.2.common.filename = &"atan2_test.f90"[1]{lb: 1 sz: 1};
  dt_parm.2.common.line = 13;
  dt_parm.2.common.flags = 128;
  dt_parm.2.common.unit = 6;
  _gfortran_st_write (&dt_parm.2);
  _1 = __builtin_atan (y_4);
  D.3196 = _1;
  _gfortran_transfer_real_write (&dt_parm.2, &D.3196, 8);
  D.3196 ={v} {CLOBBER};
  _gfortran_st_write_done (&dt_parm.2);
  dt_parm.2 ={v} {CLOBBER};
  dt_parm.3.common.filename = &"atan2_test.f90"[1]{lb: 1 sz: 1};
  dt_parm.3.common.line = 14;
  dt_parm.3.common.flags = 128;
  dt_parm.3.common.unit = 6;
  _gfortran_st_write (&dt_parm.3);
  _2 = __builtin_atan (y_4);
  _3 = _2 == 1.10714871779409040897235172451473772525787353515625e+0;
  D.3198 = _3;
  _gfortran_transfer_logical_write (&dt_parm.3, &D.3198, 4);
  D.3198 ={v} {CLOBBER};
  _gfortran_st_write_done (&dt_parm.3);
  dt_parm.3 ={v} {CLOBBER};
  return;

}



;; Function main (main, funcdef_no=1, decl_uid=3200, cgraph_uid=2, symbol_order=1)

__attribute__((externally_visible))
integer(kind=4) main (integer(kind=4) argc, character(kind=1) * * argv)
{
  static integer(kind=4) options.4[7] = {2116, 4095, 0, 1, 1, 0, 31};
  integer(kind=4) D.3205;
  integer(kind=4) _7;

  <bb 2> :
  _gfortran_set_args (argc_2(D), argv_3(D));
  _gfortran_set_options (7, &options.4[0]);
  test ();
  _7 = 0;

  <bb 3> :
<L0>:
  return _7;

}

Thanks! Very interesting! If I understand correctly… you think it’s the math lib on the Mac that’s the problem? Is that coming from Apple I guess?

In effect, the problem that is reported is related to the involvement of two “Fortran processors”.

A similar report was made to the Absoft Fortran forum over a decade ago. The expression dcos(0d0) was evaluated at compile time by a built-in routine in the compiler, which was a 32-bit EXE that used x87 arithmetic to evaluate the expression, and compared to the run time value of dcos(angle) with angle=0d0 at runtime using the 64-bit library routine, which utilised SSE2 arithmetic. The expressions differed by about 1d-16, and as a result the compile time and runtime values did not match.

Here is a similar program, for which two current compilers for Windows, with certain compiler options, yield an unexpected result.

program cmpbug
   double precision :: bnd, sigma = 1.0d70
   character(len=6) :: instr = '1.0d70'
!
! compare whether a DP constant in the program text compares as equal
! to the same constant read from a string. In other words, do formatted
! reads performed at compile time give the same internal numbers as
! formatted reads at run time?
!
   read (instr,'(f6.0)') bnd
   write(*,10)bnd,sigma,(bnd == sigma)
10 format(1x,1p,D24.17,' == ',D24.17,' ?  ',l5)
end program

The unexpected output:

  1.00000000000000000D+70 ==  1.00000000000000000D+70 ?      F
1 Like

There are some _atan's in there:

	.arch armv8-a
	.text
	.cstring
	.align	3
lC0:
	.ascii "atan2_test.f90\0"
	.const
	.align	3
lC1:
	.word	-1830044092
	.word	1072805601
	.text
	.align	2
_MAIN__:
LFB0:
	sub	sp, sp, #576
LCFI0:
	stp	x29, x30, [sp]
LCFI1:
	mov	x29, sp
	fmov	d0, 2.0e+0
	str	d0, [sp, 568]
	adrp	x0, lC0@GOTPAGE
	ldr	x0, [x0, lC0@GOTPAGEOFF];momd
	str	x0, [sp, 32]
	mov	w0, 11
	str	w0, [sp, 40]
	mov	w0, 128
	str	w0, [sp, 24]
	mov	w0, 6
	str	w0, [sp, 28]
	add	x0, sp, 24
	bl	__gfortran_st_write
	add	x3, sp, 24
	mov	w2, 8
	adrp	x0, lC1@GOTPAGE
	ldr	x1, [x0, lC1@GOTPAGEOFF];momd
	mov	x0, x3
	bl	__gfortran_transfer_real_write
	add	x0, sp, 24
	bl	__gfortran_st_write_done
	adrp	x0, lC0@GOTPAGE
	ldr	x0, [x0, lC0@GOTPAGEOFF];momd
	str	x0, [sp, 32]
	mov	w0, 12
	str	w0, [sp, 40]
	mov	w0, 128
	str	w0, [sp, 24]
	mov	w0, 6
	str	w0, [sp, 28]
	add	x0, sp, 24
	bl	__gfortran_st_write
	add	x3, sp, 24
	mov	w2, 8
	adrp	x0, lC1@GOTPAGE
	ldr	x1, [x0, lC1@GOTPAGEOFF];momd
	mov	x0, x3
	bl	__gfortran_transfer_real_write
	add	x0, sp, 24
	bl	__gfortran_st_write_done
	adrp	x0, lC0@GOTPAGE
	ldr	x0, [x0, lC0@GOTPAGEOFF];momd
	str	x0, [sp, 32]
	mov	w0, 13
	str	w0, [sp, 40]
	mov	w0, 128
	str	w0, [sp, 24]
	mov	w0, 6
	str	w0, [sp, 28]
	add	x0, sp, 24
	bl	__gfortran_st_write
	ldr	d0, [sp, 568]
	bl	_atan
	str	d0, [sp, 560]
	add	x1, sp, 560
	add	x0, sp, 24
	mov	w2, 8
	bl	__gfortran_transfer_real_write
	add	x0, sp, 24
	bl	__gfortran_st_write_done
	adrp	x0, lC0@GOTPAGE
	ldr	x0, [x0, lC0@GOTPAGEOFF];momd
	str	x0, [sp, 32]
	mov	w0, 14
	str	w0, [sp, 40]
	mov	w0, 128
	str	w0, [sp, 24]
	mov	w0, 6
	str	w0, [sp, 28]
	add	x0, sp, 24
	bl	__gfortran_st_write
	ldr	d0, [sp, 568]
	bl	_atan
	adrp	x0, lC2@GOTPAGE
	add	x0, x0, lC2@PAGEOFF;momd
	ldr	d1, [x0]
	fcmp	d0, d1
	cset	w0, eq
	str	w0, [sp, 556]
	add	x1, sp, 556
	add	x0, sp, 24
	mov	w2, 4
	bl	__gfortran_transfer_logical_write
	add	x0, sp, 24
	bl	__gfortran_st_write_done
	nop
	ldp	x29, x30, [sp]
	add	sp, sp, 576
LCFI2:
	ret
LFE0:
	.align	2
	.globl _main
_main:
LFB1:
	stp	x29, x30, [sp, -32]!
LCFI3:
	mov	x29, sp
	str	w0, [sp, 28]
	str	x1, [sp, 16]
	ldr	x1, [sp, 16]
	ldr	w0, [sp, 28]
	bl	__gfortran_set_args
	adrp	x0, _options.4.0@PAGE
	add	x1, x0, _options.4.0@PAGEOFF;momd
	mov	w0, 7
	bl	__gfortran_set_options
	bl	_MAIN__
	mov	w0, 0
	ldp	x29, x30, [sp], 32
LCFI4:
	ret
LFE1:
	.const
	.align	3
_options.4.0:
	.word	2116
	.word	4095
	.word	0
	.word	1
	.word	1
	.word	0
	.word	31
	.align	3
lC2:
	.word	-1830044092
	.word	1072805601
	.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
EH_frame1:
	.set L$set$0,LECIE1-LSCIE1
	.long L$set$0
LSCIE1:
	.long	0
	.byte	0x1
	.ascii "zR\0"
	.uleb128 0x1
	.sleb128 -8
	.byte	0x1e
	.uleb128 0x1
	.byte	0x10
	.byte	0xc
	.uleb128 0x1f
	.uleb128 0
	.align	3
LECIE1:
LSFDE1:
	.set L$set$1,LEFDE1-LASFDE1
	.long L$set$1
LASFDE1:
	.long	LASFDE1-EH_frame1
	.quad	LFB0-.
	.set L$set$2,LFE0-LFB0
	.quad L$set$2
	.uleb128 0
	.byte	0x4
	.set L$set$3,LCFI0-LFB0
	.long L$set$3
	.byte	0xe
	.uleb128 0x240
	.byte	0x4
	.set L$set$4,LCFI1-LCFI0
	.long L$set$4
	.byte	0x9d
	.uleb128 0x48
	.byte	0x9e
	.uleb128 0x47
	.byte	0x4
	.set L$set$5,LCFI2-LCFI1
	.long L$set$5
	.byte	0xdd
	.byte	0xde
	.byte	0xe
	.uleb128 0
	.align	3
LEFDE1:
LSFDE3:
	.set L$set$6,LEFDE3-LASFDE3
	.long L$set$6
LASFDE3:
	.long	LASFDE3-EH_frame1
	.quad	LFB1-.
	.set L$set$7,LFE1-LFB1
	.quad L$set$7
	.uleb128 0
	.byte	0x4
	.set L$set$8,LCFI3-LFB1
	.long L$set$8
	.byte	0xe
	.uleb128 0x20
	.byte	0x9d
	.uleb128 0x4
	.byte	0x9e
	.uleb128 0x3
	.byte	0x4
	.set L$set$9,LCFI4-LCFI3
	.long L$set$9
	.byte	0xde
	.byte	0xdd
	.byte	0xe
	.uleb128 0
	.align	3
LEFDE3:
	.ident	"GCC: (GNU) 11.1.0"
	.subsections_via_symbols

I’ve tried on a mac intel with gfortran 11.2.0 and I’ve obtained the same behaviour:

  • gfortran -O0 x_atan.f90
    1.1071487177940904
    1.1071487177940904
    1.1071487177940906
    F
  • gfortran -O3 x_atan.f90
    1.1071487177940904
    1.1071487177940904
    1.1071487177940904
    T
1 Like

Ah interesting, so it isn’t M1 specific.

This is the difference in compile-time and run-time evaluation of expressions. This can arise from different rounding modes or from different algorithms to evaluate the function. It happens all the time, and is why comparisons of real numbers should only be done in specific situations such as comparing exact zeros, small integer values, and similar situations. Since you are using gfortran, you can print out the values in hex format, and you will probably see that the difference is in the least significant bit, or maybe two bits.

1 Like

Very true, and we had a recent demonstration of this in a thread here. The compiler’s built-in intrinsic function index had a bug, but the Gfortran RTL intrinsic index worked correctly. Whether the bug was visible or not depended on the optimization level specified to the compiler. At low optimization levels, the compiler may choose the RTL routine, but at a higher optimization level it may do the evaluation of a constant expression at compile time using the built-in library routine.

1 Like

no, this is just a compiler big they’re a total pain to fix (I should know, I’ve had to fix a few of these for Julia’s compiler), but being able to get reproducible results is important.

the compiler chooses which math library to use. you can pass the blame all you want, but if the compiler chooses to use broken math libraries, the user code is still broken and it’s still the compilers fault. (especially since it just just swap in openlibm to fix it)

For comparison, I’ve calculated atan(2.0) with Python3.9.1/Numpy1.19.4 and Julia 1.6.1 on MacMini (2012, Core-i7), which gives

>>> import numpy as np
>>> np.arctan(2.0)
1.1071487177940906
>>> import math
>>> math.atan(2.0)
1.1071487177940906
julia> atan(2.0)
1.1071487177940904

So, maybe python3 is also using Apple’s math library (by default)? It may be interesting to try it with other platforms or distributions…
(I imagined that this level of difference is just “acceptable”; is it not the case for library implementation?)

Searching the net with “apple libm” gave me these pages:


Btw, looking at the output of gfortran -fdump-tree-optimized, it seems that a literal value of “1” is passed to the internal output routine as the result of atan(2.0_wp) == atan(y), so the comparison seems really evaluated at compile-time…

2 Likes

With Maple17, Digits:=30; evalf(arctan(2.0)); gives 1.10714871779409050301706546018 . You can verify this using, for instance, the Casio MathKeisan online calculator.

As you can see, the true answer lies about halfway between the two values that @septc reported. On the other hand, the relative differences are a fraction of machine epsilon in double precision, so we are splitting hairs. In fact, I may speculate whether the differences are attributable to the formatted output routines which, given the same internal 64-bit floating point representation, may output a different 17th significant digit without meriting criticism.

Since @oscardssmith thinks that this is the compiler’s responsibility, I wonder what value OpenLibm gives for this expression.

1 Like

Hi @mecej4, I think the Julia version installed on my Mac (used above) is using openlibm, according to the output below:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, ivybridge)

I also tried to find what math library is used by Python or gfortran I used above (e.g. location of the libm.so), but not sure how to get the info…

for atan (and pretty much all functions) Julia uses pure Julia implementations. We link against openlib, but that’s mostly just because LLVM requires you to link against something. To be clear, I don’t think that it’s critical to always round to nearest, but I think it is important that unless the user specifies a fastmath flag, it should definitely give the same result on the same computer, and ideally it would give the same result on all supported operating systems. (for the record, Julia gives atan(2.0) == 1.1071487177940904 which in this case is the correctly rounded result).

1 Like

I see… thanks for the info!
FWIW, I’ve just recalled the macro @less to print the corresponding code in the base library, so tried it for atan, which shows this code (in base/special/trig.jl).

julia> @less atan(2.0)
function atan(x::T) where T<:Union{Float32, Float64}
    # Method
    #   1. Reduce x to positive by atan(x) = -atan(-x).
    #   2. According to the integer k=4t+0.25 chopped, t=x, the argument
    #      is further reduced to one of the following intervals and the
    #      arctangent of t is evaluated by the corresponding formula:
    #
    #      [0,7/16]      atan(x) = t-t^3*(a1+t^2*(a2+...(a10+t^2*a11)...)
    #      [7/16,11/16]  atan(x) = atan(1/2) + atan( (t-0.5)/(1+t/2) )
    #      [11/16.19/16] atan(x) = atan( 1 ) + atan( (t-1)/(1+t) )
    #      [19/16,39/16] atan(x) = atan(3/2) + atan( (t-1.5)/(1+1.5t) )
    #      [39/16,INF]   atan(x) = atan(INF) + atan( -1/t )
    #
    #  If isnan(x) is true, then the nan value will eventually be passed to
    #  atan_pq(x) and return the appropriate nan value.
(... implementation follows ...)

Note that Julia’s current implementation isn’t very good. It’s been on my list of functions to improve for a while (see WIP: faster 1 and 2 arg atan methods by oscardssmith · Pull Request #41371 · JuliaLang/julia · GitHub), but I haven’t managed to get it fully working yet.