# Speed of ALL(x==y) for large arrays

Based on the times for the code below, it appears that gfortran -O3 evaluates `all(x==y)` very fast for large arrays and is smart enough to not evaluate the full expressions `x==y` or `sin(x)==sin(y)`. My loop equivalent to `all(x==y)` is slower. Generating random `x(:)` and setting `y(:)` with `merge` takes the most time. I am a little surprised at the speed of `all()`, and I may need to amend a tip I posted.

``````module m
implicit none
private
public :: wp, all_equal
integer, parameter :: wp = kind(1.0d0)
contains
pure function all_equal(x,y) result(x_eq_y)
real(kind=wp), intent(in) :: x(:),y(:)
logical                   :: x_eq_y
integer                   :: i,n
x_eq_y = .false.
n = size(x)
if (size(y) /= n) return
do i=1,n
if (x(i) /= y(i)) return
end do
x_eq_y = .true.
end function all_equal
end module m
!
program xall_equal
use m, only: wp, all_equal
implicit none
integer, parameter :: n = 10**8, niter = 10, nt = 6
real(kind=wp)      :: x(n),y(n),t(nt),dt(nt-1)
integer            :: iter
logical            :: tf_1,tf_2,tf_3
real(kind=wp), parameter :: frac_diff = 1.0d-5
dt = 0.0_wp
do iter=1,niter
call cpu_time(t(1))
call random_number(x)           ; call cpu_time(t(2))
! set y equal to x except for the fraction frac_diff of elements
y = merge(x,2.0_wp,x>frac_diff) ; call cpu_time(t(3))
tf_1 = all_equal(x,y)           ; call cpu_time(t(4))
tf_2 = all(x == y)              ; call cpu_time(t(5))
! see if the compiler fully evaluates sin(x) and sin(y), which would be slow
tf_3 = all(sin(x) == sin(y))    ; call cpu_time(t(6))
if (tf_1 .neqv. tf_2 .or. tf_2 .neqv. tf_3) error stop "tf_1, tf_2, tf_3 should be equal"
dt = dt + t(2:nt) - t(1:nt-1)
end do
print "(/,'times',/,*(a12,1x))","RNG","merge","all_equal","all(x==y)","all_sin"
print "(*(f12.8,1x))",dt
end program xall_equal
``````

output:

``````times
RNG        merge    all_equal    all(x==y)      all_sin
12.75000000   7.51562500   0.00000000   0.01562500   0.10937500
``````
3 Likes

If `all` were really short circuiting, wouldnâ€™t it take the same amount of time as your `all_equal` function?

Modifying the program to run the simulation 10 times
``````module m
implicit none
private
public :: wp, all_equal
integer, parameter :: wp = kind(1.0d0)
contains
pure function all_equal(x,y) result(x_eq_y)
real(kind=wp), intent(in) :: x(:),y(:)
logical                   :: x_eq_y
integer                   :: i,n
x_eq_y = .false.
n = size(x)
if (size(y) /= n) return
do i=1,n
if (x(i) /= y(i)) return
end do
x_eq_y = .true.
end function all_equal
end module m
!
program xall_equal
use m, only: wp, all_equal
implicit none
integer, parameter :: n = 10**8, niter = 10, nt = 6, nruns = 10
real(kind=wp)      :: x(n),y(n),t(nt),dt(nt-1)
integer            :: irun,iter
logical            :: tf_1,tf_2,tf_3
real(kind=wp), parameter :: frac_diff = 1.0d-5
print "(/,'times',/,*(a12,1x))","RNG","merge","all_equal","all(x==y)","all_sin"
do irun=1,nruns
dt = 0.0_wp
do iter=1,niter
call cpu_time(t(1))
call random_number(x)           ; call cpu_time(t(2))
! set y equal to x except for the fraction frac_diff of elements
y = merge(x,2.0_wp,x>frac_diff) ; call cpu_time(t(3))
tf_1 = all_equal(x,y)           ; call cpu_time(t(4))
tf_2 = all(x == y)              ; call cpu_time(t(5))
! see if the compiler fully evaluates sin(x) and sin(y), which would be slow
tf_3 = all(sin(x) == sin(y))    ; call cpu_time(t(6))
if (tf_1 .neqv. tf_2 .or. tf_2 .neqv. tf_3) error stop "tf_1, tf_2, tf_3 should be equal"
dt = dt + t(2:nt) - t(1:nt-1)
end do
print "(*(f12.8,1x))",dt
end do
end program xall_equal
``````

both `all` and `all_equal` run too fast to reliably measure their execution times, as shown below. Since `all_equal` is short-circuiting, I think `all` must be too.

``````times
RNG        merge    all_equal    all(x==y)      all_sin
10.45312500   2.81250000   0.00000000   0.00000000   0.09375000
9.67187500   2.31250000   0.00000000   0.00000000   0.10937500
9.67187500   2.39062500   0.00000000   0.00000000   0.12500000
9.64062500   2.29687500   0.00000000   0.01562500   0.14062500
9.76562500   2.39062500   0.00000000   0.00000000   0.07812500
9.64062500   2.32812500   0.01562500   0.00000000   0.03125000
9.65625000   2.34375000   0.00000000   0.00000000   0.14062500
9.84375000   2.37500000   0.00000000   0.00000000   0.10937500
9.70312500   2.34375000   0.00000000   0.00000000   0.07812500
9.67187500   2.32812500   0.00000000   0.00000000   0.10937500
``````
2 Likes

Instead of using `merge`, which can cause the short-circuiting to occur after only a few elements, it may be better to change only a single value.

``````      ! set y equal to x except for one value
y = x
y(max(int(n * x(1)),n/2)) = 2

times
RNG        merge    all_equal    all(x==y)      all_sin
1.05091300   0.63109800   0.19972300   0.19834900   4.15434400
``````

So, likely a negligible difference between your function and `all`. You can prove this with the -fdump-tree-optimized option.

1 Like

Shouldnâ€™t we expect the timing to be quite dependent on the distribution of the data values and the specifics of the compiler and its runtime?

I put in @karglâ€™s suggested change, and added

``````   tf_4 = .not.any(x /= y)         ; call cpu_time(t(7))
``````

and obtained the following timings on an NUC with an i7-10710:

``````           RNG          merge      all_equal      all(x==y)        all_sin .not.any(x/=y)
3.75100000     1.03100000     0.46800000     0.44000000     5.37400000     0.37300000
``````

[side comment: The forum formatter seems to regard program text following a semicolon in code as a comment â€“ as it is in some assembly languages]

``````(base) urbanjs@venus:/tmp\$ ifort -O3 tmp2.f90
(base) urbanjs@venus:/tmp\$ ./a.out

times
RNG        merge    all_equal    all(x==y)      all_sin
36.04356200   1.48177000   0.00139300   1.09523400  16.40145700

(base) urbanjs@venus:/tmp\$ nvfortran -O3 tmp2.f90
(base) urbanjs@venus:/tmp\$ ./a.out
gfortran -O3 tmp2.f90
./a.out

times
RNG        merge    all_equal    all(x==y)      all_sin
2.64983916   1.87623382   0.00148511   1.44972777  20.16094804

(base) urbanjs@venus:/tmp\$ gfortran -O3 tmp2.f90
(base) urbanjs@venus:/tmp\$ ./a.out

times
RNG        merge    all_equal    all(x==y)      all_sin
7.74313400   1.88132500   0.00248500   0.00233300   0.04209800
``````

Why Fortranners are notorious for rolling their own. Nothing in the standard says one way or another when and where a short-circuit will occur, but ANY and ALL take any type and order (so I use them a lot when intelligibility and convenience make it justified).

Optimization is still very platform-specific although it gets better all the time. And the differences in the PRNG might be a quality issue as well (trading speed for quality) so once you are into optimizing taking anything for granted is a risk, and profiling the code (which is also part art) becomes more important, as what is no problem in one PE (programming environment) might be in another, or flipping a compiler switch can change everything.

PS: short-circuiting is almost certainly the biggest cause in the time differences, and there are times when memory use would be more important, but notice that when they are the same the fastest time above for the sin(x) == sin(y) comparison becomes the slowest; so if that was the common case you would see totally different issues

``````         RNG        merge    all_equal    all(x==y)      all_sin
7.81182000   1.48567100   1.50350300   1.61926500  29.71101300 gfortran
36.24548100   1.42185800   1.04758700   1.09871900  16.39151000 ifort
2.65723276   1.48142457   1.57551217   1.46997046  21.24921322 nvfortran
``````