Loops don't behave like they should

I found a way to “break” do-loops.
Consider this example, which approximates a loop from 1 to infinity:

integer :: i
do i=1, huge(i)
! ...

The do-loop should calculate the iteration count, decrease it until it reaches zero and then exit the loop (11.1.7.4.5 j3).
This shouldn’t be a problem, because the iteration count starts at huge(i) which is the largest possible integer of the given type. It is decreased until it reaches zero, so there cannot be any over- or underflow.
However with gfortran, ifort and nagfor such a loop never exits. Only with pgfortran it terminates.
To round things up, I tested another loop, which goes from -1 till huge(i)-1. Here gfortran, ifort and nagfor had no problem, although an iteration count with the same kind like i would overflow (11.1.7.4.1(3) m2 - m1). In contrast pgfortran immediately stopped the loop.
Therefore I assume that pgfortran is the only compiler (of the four I tested) which behaves exactly like the specification defines a do-loop.
Interestingly the assembly of the gfortran reveals, that there is an unconditional jump (jmp). So it isn’t even an overflow, but the compiler translates do i=1,huge(i) to an endless loop:

assembly
# a.f90:7: do i=1,huge(i)
        movl    $1, -4(%rbp)    #, i
.L4:
# a.f90:7: do i=1,huge(i)
        movl    -4(%rbp), %eax  # i, i.5_4
        addl    $1, %eax        #, _5
        movl    %eax, -4(%rbp)  # _5, i
        jmp     .L4     #
        .cfi_endproc
test program
integer :: i

do i=-1,huge(i)-1
end do
print "(a,i0)", "'do i=-1,huge(i)-1' exits with i=", i

do i=1,huge(i)
end do
print"(a,i0)", "'do i=1,huge(i)' exits with i=", i

end
Output

Output (gfortran, ifort, nagfor):
'do i=-1,huge(i)-1' exits with i=2147483647
second loop does not exit

Output(pgfortran):
'do i=-1,huge(i)-1' exits with i=-1
'do i=1,huge(i)' exits with i=-2147483648

Compiler Versions

gcc version 7.4.0 & 11.2.0
ifort version 19.0.2.187
nag 7.0
pgfortran 20.1-0

2 Likes

In fortran i would be increased one more after the end of loop. Maybe the compilers detected possible overflow?

I have already encountered the problem and I noted in my code:

    ! INTEGER(2) ranges from -32768 to 32767
    !***********************************
    ! We must be careful because the following loop is undefined
    ! in the Fortran Standard:
    ! do a = 0, 32767
    !                 1
    ! Warning: DO loop at (1) is undefined as it overflows [-Wundefined-do-loop]
    ! With some compilers, it will run OK, with others a will 
    ! become <0 and the loop will never end...
    !***********************************
    errors = 0
    do a = 0, 32766

Source: https://github.com/vmagnin/gtk-fortran/blob/gtk4/examples/tests.f90

Your description of the way the loop count is established is not in agreement with what the standard says.

The iteration count is established and is the value of the of the expression (m2 −m1 +m3)/m3, unless that value is negative, in which case the iteration count is 0.

In your case, you set m1 = -1, m2 = HUGE-1, and m3 = 1, so calculation of (m2 - m1 + m3) results in overflow.

4 Likes

Yes, in the Fortran 2018 standard, the interesting pages are 182-184:

You can exit when (i == huge(i)) to avoid an infinite loop. For the code

program loop
implicit none
integer :: i

do i=-1,huge(i)-1
end do
print "(a,i0)", "'do i=-1,huge(i)-1' exits with i=", i

do i=1,huge(i)
   if (i == huge(i)) exit
end do
print"(a,i0)", "'do i=1,huge(i)' exits with i=", i

end program loop

gfortran, Intel Fortran, and g95 give output

'do i=-1,huge(i)-1' exits with i=2147483647
'do i=1,huge(i)' exits with i=2147483647

Flang (clang version 7.0.1) gives

'do i=-1,huge(i)-1' exits with i=-1
'do i=1,huge(i)' exits with i=2147483647

which does not look right.

Apparently the 3 compilers treat loops the “traditional” way: add 1 to i then see if it exceeds the upper limit. So the first loop exits with i=huge(i), and the second loop never exits because it adds 1 to i when i=huge(i) then checks if the upper limit was reached. All the above is what I expected the compilers to do for decades. At least in gfortran 11.2, the second loop is actually interrupted with a quite informative runtime error.

However, gfortran 5.5.0 (lucky me I still have a computer with it) actually exits the second loop as in pgfortran:
'do i=-1,huge(i)-1' exits with i=2147483647
'do i=1,huge(i)' exits with i=-2147483648
So first loop behaves as in gfortran 11.2 but the second behaves as in pgfortan. Weird.

There is an ambiguity here as huge is not declared as an array variable, but is used as an iterated function call. There is alternative behavior if using huge instead of huge(i).

The second example does not conform to the standard since it requires huge()+1 iterations, which is not representable by the kind of i, so anything is permissible. But the first example would seem to depend on how the iteration count is computed.

The final result is huge(), but we still have to compute it, and I don’t think the standard says that it must be evaluated left-to-right. The intermediate value in (huge()-1)+1 is representable by the kind of i, but the one in (huge()+1)-1 is not.

I know that GFortran has many tests to handle huge() in other cases, like this:

do i = HUGE(i) - 10, HUGE(i), 2
! …
end do

which would fall in the same category, but it still begs the question.

Whenever this comes up in comp.lang.fortran or the GFortran mailing list, someone will usually state that the standard requires m2+1 be representable, but I cannot find anything.

I think these boundary cases are always interesting discussions because they show how seemingly abstract language issues can have practical consequences.

In your code, the do i=1,huge(i) loop cannot execute correctly because the loop index must be set to huge(i)+1 upon loop exit. That results in overflow. Furthermore, as far as I know, there is no language requirement that the trip count expression must be evaluated left to right. So the trip count expression (huge(i)-1+1) itself could overflow if it were evaluated as ((huge(i)+1)-1). Something like that might happen if the compiler writer wanted to use an increment instruction rather than an integer addition instruction to save a few clock cycles.

The do i=-1,huge(i)-1 loop avoids the loop index overflow problem because upon exit the loop index values should be huge(i), which is no problem. However, the trip count expression overflows, ((huge(i)-1)-(-1)+1), no matter how it is evaluated. So this loop cannot be executed correctly either.

A do loop expression such as do i=0,huge(i)-1, seems like it should execute correctly. The trip count is ((huge(i)-1)-0+1), which seems like it should evaluate correctly no matter the order of the operations within, and the loop index upon exit is huge(i), which is no problem. I think this is all that can be expected of fortran do loops, given the constraints of the maximum trip count value and the maximum final loop index value.

I have always wondered why the language in the standard is not more specific on how the trip count expression (m2-m1+m3)/m3 should be evaluated. It seems like (m2+(m3-m1))/m3 would avoid most of these overflow issues, at least when they can be avoided, but even that expression can have problems, so maybe there is simply no easy answer to this problem.

Finally, one must be careful using do loops with no body as benchmarks or regression tests. Compilers are free to optimize away the entire loop, regardless of the m1, m2, and m3 values.

1 Like
ian@ian-HP-Stream-Laptop-11-y0XX:~$ cat functr1.f
        real  sin(2)
        sin(1) = 90.000
        sin(2) = 6.000
        print *, sin(1)
        print *, sin(2)
        end program
ian@ian-HP-Stream-Laptop-11-y0XX:~$ gfortran functr1.f -o functr1
ian@ian-HP-Stream-Laptop-11-y0XX:~$ ./functr1
   90.0000000    
   6.00000000

Please note that sin(90) can be an intrinsic function call or an array variable.

Yes, it depends on the compiler and on the version of the compiler. In this Issue #129 I had noted that the behaviour of GFortran changed somewhere between versions 6.5.0 and 7.3.0. And a warning message was added.

This has been a well discussed feature of a Fortran DO loop.
I would wonder why you need a loop of this number of itterations.
“DO” or “DO WHILE …” are better approaches if the number of itterations are unknown.

What do you expect the value of “i” to be on exit from your loop ?

“do i = 1,huge(i)-1” would overcome the problem, but this would not cycle for the number of itterations you are requesting, assuming you need that many itterations.

It was an example from a Fortran lecturer. If you have an endless, converging series, it is very natural to approximate it up to an “huge” number. Since we have such a function, huge(i) is one of the first things I would come up with. huge(i) - 1 will not make any difference, because depending of the series the elements are close to zero. But if you want to compute up to an huge number, why would you decrement it?
Moreover, my lecturer had the do definition with the trip count on his mind, and therefore didn’t see that coming. We only realised this could be problematic because I actually tested the example code.
I see why the compiler will not calculate the trip count, to save some cycles. But then it would maybe make sense to change the specification.

Consider the Gregory series for atan(1) = pi/4. The n-th term (sign ignored) is 1/(2n-1). When n is large enough that this term is less than (pi/4) X machine_epsilon, adding more terms ceases to change the accumulated sum. You might as well stop at that juncture.

One solution would be to count backwards, when small numbers are added to a cumulative sum which is also small to start with. This solves the problem that I mentioned above, but it also has a problem. Where do you start? At n= huge_int? Is that really required?

Read about Kahan summation.

3 Likes

I am really confused. I’ve always thought that, in the absence of parentheses, adjacent operations of the same priority are evaluated left-to-right with the exception of exponentiation. This seems to be confirmed in the Note 10.9 (section 10.1.3). And then in the aforementioned 10.1.5.2.4 there is the Note 10.18 which shows allowable alternative forms that can be used by the processor in the evaluation of those expressions. The list contains things like

expression: X + Y + Z allowable alternative: X + (Y + Z)

which clearly contradicts the above rule, unless I am misinterpreting it very seriously.

NOTE 10.9
In interpreting a level-2-expr containing two or more binary operators + or –, each operand (add-operand) is combined from left to right. Similarly, the same left-to-right interpretation for a mult-operand in addoperand, as well as for other kinds of expressions, is a consequence of the general form. However, for interpreting a mult-operand expression when two or more exponentiation operators ** combine level-1-expr operands, each level-1-expr is combined from right to left.
For example, the expressions
2.1 + 3.4 + 4.9
2.1 * 3.4 * 4.9
2.1 / 3.4 / 4.9
2 ** 3 ** 4
’AB’ // ’CD’ // ’EF’
have the same interpretations as the expressions
(2.1 + 3.4) + 4.9
(2.1 * 3.4) * 4.9
(2.1 / 3.4) / 4.9
2 ** (3 ** 4)
(’AB’ // ’CD’) // ’EF’

What you’ve found is an “operator error” scenario where the processor(s) are not required to diagnose and report the error but where the situation can be highly uncomfortable because it screams “RTFM”!

This is a bit worrying, but we all make mistakes.

Your citation describes the interpretation of the expression. There is a followup stage where alternative expressions are permitted (10.1.5.2.4):

Once the interpretation of a numeric intrinsic operation is established, the processor may evaluate any mathematically equivalent expression, provided that the integrity of parentheses is not violated.

Two expressions of a numeric type are mathematically equivalent if, for all possible values of their primaries, their mathematical values are equal. However, mathematically equivalent expressions of numeric type can produce different computational results.

The subsequent note provides explicit examples.

1 Like