Size of long array

@fortran4r , it may take ages, if at all, for something to get worked out here in terms of the language standard. In the meantime, consider again whether the KIND optional argument is something you can use:

function example(x) result(res)
  integer(I8) :: i                        ! I8: an integer kind wide enough for the array size
  real(RK) :: x(:), res
  res = 0.0_rk
  do i = 1, size(x, kind=kind(i))         ! SIZE returns an integer of the same kind as i
     res = res + x(i)
  end do
end function example
2 Likes

I am OK with using the optional kind argument. Since I regularly work with very long arrays, does that mean all intrinsic functions that return integer(4) by default should have a kind = 8 argument?

Could you please explain in more detail why the sum function needs an optional kind argument? I don’t get it.

I haven’t used one myself, but I recently read about the Cerebras architecture for extreme AI workloads, including neural networks with up to 120 trillion parameters (1.2 × 10^14), which is the same order of magnitude as 2^48 ≈ 2.81 × 10^14.

Fortran is nowhere to be found within the Cerebras SDK Technical Overview white paper. Users can choose between Python frameworks such as PyTorch and TensorFlow, or use the Cerebras Software Language, which is close to C/C++ but also includes features from Zig, Go, Swift, Scala, and Rust.


Personally, I second @FedericoPerini’s reply. If you know you are dealing with an address space that exceeds the de facto default integer on your platform, use a kind parameter.

Since @fortran4r has referred to R in several of his threads, I would also bring up the relevant section in R Internals, Section 12.1 Long Vectors:

  • Because a fair amount of code in R uses a signed type for the length, the ‘long length’ is recorded using the signed C99 type ptrdiff_t, which is typedef-ed to R_xlen_t.
  • These can in theory have 63-bit lengths, but note that current 64-bit OSes do not even theoretically offer 64-bit address spaces and there is currently a 52-bit limit (which exceeds the theoretical limit of current OSes and ensures that such lengths can be stored exactly in doubles).
  • The serialization format has been changed to accommodate longer lengths, but vectors of lengths up to 2^31-1 are stored in the same way as before. Longer vectors have their length field set to -1 and followed by two 32-bit fields giving the upper and lower 32-bits of the actual length. There is currently a sanity check which limits lengths to 2^48 on unserialization.

If I were to write an R package, I would personally use something like:

module rtypes
use, intrinsic :: iso_c_binding, only: c_ptrdiff_t, c_int, c_double, &
   rsexp => c_ptr  ! SEXP
implicit none
private

public :: rxlen, rint, rreal, rsexp

integer, parameter :: rxlen = c_ptrdiff_t   ! R_xlen_t, long length type
integer, parameter :: rint  = c_int         ! INTSXP data type
integer, parameter :: rreal = c_double      ! REALSXP data type

end module

Using these kind parameters would make the storage types unambiguous and resilient to changes in Fortran processors, compiler flags, and even competing R implementations (if they were ever to appear).
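For illustration, here is a hypothetical C-interoperable kernel using these kinds; the routine name and arguments are made up, the point is only that the length travels as integer(rxlen) rather than as a default integer:

! Hypothetical kernel; name and arguments are illustrative only
subroutine scale_inplace(n, x, factor) bind(c, name="scale_inplace")
   use rtypes, only: rxlen, rreal
   implicit none
   integer(rxlen), value :: n             ! XLENGTH(x), an R_xlen_t
   real(rreal), intent(inout) :: x(n)     ! the REALSXP payload
   real(rreal), value :: factor
   x = factor * x
end subroutine scale_inplace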

In light of the fact that the original poster has only indicated interest in a particular combination, CRAN R + gfortran, he can of course neglect such advice and stick to magic numbers (4, 8).

@fortran4r ,

You are to be commended greatly for paying such close attention to good coding practices.

Toward the same, please first consider a “kinds” module where, in one place, you define all the KINDs of intrinsic types you think you will have to employ in your codes. In this module, you would do well to have integer kinds of the various ranges you need, in conjunction with what is supported by your compiler. This can be done either via SELECTED_INT_KIND or via named constants in ISO_FORTRAN_ENV such as int32, int64, etc. The same goes for the kinds of floating-point types, though here my suggestion would be IEEE_SELECTED_REAL_KIND rather than anything else, assuming your situation is the same as nearly 100% of computing now, which uses IEEE floating-point arithmetic.

That is, try not to use hard-wired kinds such as 4 and 8; use named constants for the kinds you define.
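A minimal sketch of such a module (the names i32, i64, sp, and dp are of course up to you):

module kinds
   use, intrinsic :: ieee_arithmetic, only: ieee_selected_real_kind
   implicit none
   private
   public :: i32, i64, sp, dp

   ! Integer kinds with at least 9 and 18 decimal digits of range
   integer, parameter :: i32 = selected_int_kind(9)
   integer, parameter :: i64 = selected_int_kind(18)

   ! IEEE floating-point kinds (binary32 and binary64 on virtually all current hardware)
   integer, parameter :: sp = ieee_selected_real_kind(6, 37)
   integer, parameter :: dp = ieee_selected_real_kind(15, 307)
end module kinds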

Then, since you write that you “regularly work with very long arrays,” you would need to pay close attention to your choice of integer kinds and to the use of the optional KIND argument wherever you handle the size and shape of your arrays and any indexing toward the same, e.g., your do i = 1, size(x, ...) statement. Fortran places the onus greatly on the practitioner - you - in such matters, as you can see from this thread and the others like it.

2 Likes

I also noticed that the shape function does not have an optional kind argument. This seems weird since the size function has it.

Starting with Fortran 2003, the SHAPE intrinsic supports the optional KIND argument.
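For example, a quick sketch using int64 from iso_fortran_env:

program shape_kind
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   real :: a(2,3)
   integer(int64) :: dims(2)
   dims = shape(a, kind=int64)   ! extents returned as 64-bit integers
   print *, dims
end program shape_kind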

P.S. This comment was edited to correct my earlier misstatement, where I mentioned the 202X revision instead of 2003.

Consider the following example:

program sumkind
  integer :: max=100000, i, j
  real, allocatable :: t(:)
  real(kind(1d0)) :: mysum
  do i=1,5
    allocate(t(max))
    call random_number(t)
    mysum = 0d0
    do j=1, max
      mysum = mysum+t(j)
    end do
    print *, max, sum(t), mysum, 0.5*max
    deallocate(t)
    max = max*10
  end do
end program sumkind
     size(t)    sum(t)         manually summed in DP    expected value
      100000   49907.0898       49907.149867594242        50000.0000
     1000000   499942.594       499954.04933792353        500000.000
    10000000   4999431.00       4999227.4765974879        5000000.00
   100000000   16777216.0 bad!  49999481.277478814        50000000.0
  1000000000   16777216.0 bad!  500001528.13787073        500000000.
1 Like

My bad. The shape function does have an optional kind argument. I was reading an outdated doc.

It seems F2018 already has it (page 337).

GFortran also has it (page 260).

1 Like

Thanks for pointing this out. Do functions like maxval and minval also have this issue?

Why would they? They don’t involve any (floating point) arithmetic operations.

So sum is the only intrinsic array function affected by this issue?

I modified your example, and the issue is gone. It seems sum does not have the issue you mentioned before.

program sumkind
  integer(8) :: max = 100000, i, j
  real(8), allocatable :: t(:)
  real(8) :: mysum
  do i = 1, 5
    allocate(t(max))
    call random_number(t)
    mysum = 0d0
    do j = 1, max
      mysum = mysum + t(j)
    end do
    print *, max, sum(t), mysum, 5d-1*max
    deallocate(t)
    max = max*10
  end do
end program

It would affect any array intrinsic which needs to perform floating-point arithmetic, including norm2, product, dot_product, and matmul. See Roundoff error caused by floating-point arithmetic | Wikipedia.
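A minimal sketch showing that dot_product suffers in the same way (the exact error depends on how the compiler chooses to accumulate, so the behaviour is indicative only; lower n if memory is tight):

program other_intrinsics
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 50000000
   real, allocatable :: t(:)

   allocate(t(n))
   call random_number(t)

   ! Single-precision accumulation degrades for large n, just like SUM;
   ! converting the operands to double precision first avoids it.
   print *, dot_product(t, t)
   print *, dot_product(real(t, dp), real(t, dp))
end program other_intrinsics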

Is it gone? How about you set mysum to quad-precision?

If you’re aware that round-off errors due to finite precision are a problem, you can still use sum, but you need to convert the array beforehand, i.e. sum(real(t,sk)), where sk is a kind with higher precision than kind(t).

Take the following code as an example:

module prec
   integer, parameter :: sp = kind(1.0)
   integer, parameter :: dp = kind(1.0d0)
end module

subroutine do_sum(n,a,sa,da)
   use prec
   integer, intent(in) :: n
   real(sp), intent(in) :: a(n)
   real(sp), intent(out) :: sa
   real(dp), intent(out) :: da
   sa = sum(a)
   da = sum(real(a,kind(da)))
end subroutine

Here’s the resulting assembly listing, when compiled with x86-64 gfortran 12.1 and the -Os flag (I’m using -Os to keep the listing readable):

do_sum_:
        movsx   rdi, DWORD PTR [rdi]
        xor     eax, eax
        xorps   xmm0, xmm0
.L3:
        inc     rax
        cmp     rdi, rax
        jl      .L2
        addss   xmm0, DWORD PTR [rsi-4+rax*4]
        jmp     .L3
.L2:
        movss   DWORD PTR [rdx], xmm0
        xor     eax, eax
        xorps   xmm0, xmm0
.L5:
        inc     rax
        cmp     rdi, rax
        jl      .L4
        cvtss2sd        xmm1, DWORD PTR [rsi-4+rax*4]
        addsd   xmm0, xmm1
        jmp     .L5
.L4:
        movsd   QWORD PTR [rcx], xmm0
        ret

The loop between .L3 and .L2 is the single-precision sum. The loop between .L5 and .L4 is the double-precision sum. Note the additional cvtss2sd instruction, which converts a single-precision operand to double precision.

If you replace the sum(real(a,kind(da))) with an explicit loop,

    da = 0.0_dp
    do i = 1, n
        da = da + a(i)  ! automatic conversion
    end do

you get the exact same assembly instructions. No array temporary is created.

2 Likes

We are discussing the integer(8) overflow issue instead of rounding errors.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52053

There is no integer overflow in this example. The largest array size (1000000000 elements) is still within the range of a default integer (2147483647).

Edit: As Dominique d’Humieres indicated in the GCC Bugzilla, it’s simply due to summing values of very different magnitudes. You can see it in NumPy too:

>>> import numpy as np
>>> a = np.array([1.6777216e7,1.],np.float32)
>>> np.sum(a)
16777216.0

NumPy, however, provides a dtype argument:

dtype : dtype, optional
The type of the returned array and of the accumulator in which the
elements are summed. The dtype of a is used by default unless a
has an integer dtype of less precision than the default platform
integer. In that case, if a is signed then the platform integer
is used while if a is unsigned then an unsigned integer of the
same precision as the platform integer is used.

This gives the expected output:

>>> np.sum(a,dtype=np.float64)
16777217.0

I imagine what @msz59 was suggesting was to offer a “higher”-precision accumulator, similar to NumPy. You could consider it syntactic sugar for the existing approach with an explicit conversion using the real intrinsic.


Edit 2: before anyone begins to question the quality of gfortran over something which is really an issue of finite-precision arithmetic, ifx and ifort suffer from the same problem:

$ ifx -warn all -O3 sumkind.f90 
$ ./a.out
      100000   50095.35       50095.5339293266        50000.0000000000     
     1000000   499784.1       499801.279849011        500000.000000000     
    10000000   5000416.       5000626.31909301        5000000.00000000     
   100000000  1.6777216E+07   49997782.4513367        50000000.0000000     
  1000000000  1.6777216E+07   499995740.004367        500000000.000000     
$ ifort -warn all -O3 sumkind.f90 
$ ./a.out
      100000   50095.56       50095.5339293266        50000.0000000000     
     1000000   499800.8       499801.279849011        500000.000000000     
    10000000   5000632.       5000626.31909298        5000000.00000000     
   100000000  4.9997556E+07   49997782.4513326        50000000.0000000     
  1000000000  1.3421773E+08   499995740.003895        500000000.000000  

Curiously, ifort appears to use a different summation algorithm, while the random number generators appear to be the same.

Exactly so :slight_smile: My example actually shows two problems with summing a huge number of small values (the random numbers are between 0 and 1). The first, obvious one is roundoff error; the second is the finite density of real values in the computer representation. In 4-byte reals, above the magic value of 16777216 it is impossible to add a number less than one and get a different result. Using a higher-precision accumulator helps with that. But, I must say, @ivanpribec’s trick with sum(real(t,kind(1d0))) is very smart. I would not have guessed that there would be no temporary array built (which would be a killer for a GB-sized array).
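To make the density point concrete, here is a two-line illustration (assuming the default REAL is IEEE binary32):

program density
   implicit none
   real :: x = 16777216.0   ! 2**24
   print *, spacing(x)      ! 2.0: the next representable value above x is x + 2
   print *, x + 1.0 == x    ! T: adding half the spacing rounds back to x (ties-to-even)
end program density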

Edit: still, smart as it is, the trick depends fully on the smartness of compiler creators. It is by no means guaranteed. An optional kind parameter of sum could guarantee that.

1 Like

Indeed, there is no guarantee. But in practice it seems to work fine.

Apart from higher precision, one can also use a different summation algorithm. E.g. Python offers both the built-in sum (naive) and math.fsum, which tracks intermediate partial sums. In Julia one can use KahanSummation.jl.

In Fortran, using Kahan summation requires some care with respect to the compiler flags. I’ve tried to roll my own implementation (thanks to an anonymous reader for the pseudo-code):

!----------------
! kahan_sum.F90
!----------------
module kahan_sum

  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private

  public :: sum

  integer, parameter :: sp = kind(1.0e0)
  integer, parameter :: ip = int64

contains

  !> Kahan sum algorithm
  !> https://en.wikipedia.org/wiki/Kahan_summation_algorithm#The_algorithm
#ifdef __GFORTRAN__
  pure function sum(a) result(accum)
     real(sp), intent(in) :: a(:)
#else
  impure function sum(a) result(y)
     real(sp), intent(in), target :: a(..)
     real(sp):: y

     real(sp), pointer :: aflat(:) => null()
     integer(ip) :: nelem, dims(rank(a))

     dims = shape(a,kind=ip)
     nelem = product(dims)

     select rank(a)
        rank(0)
           y = a
        rank(1)
           aflat(1:nelem) => a(1:nelem)
           y = sum_kernel(aflat)
        rank(2)
           aflat(1:nelem) => a(1:dims(1),1:dims(2))
           y = sum_kernel(aflat)
        rank(3)
           aflat(1:nelem) => a(1:dims(1),1:dims(2),1:dims(3))
           y = sum_kernel(aflat)
        rank default
           error stop '[sum] Ranks > 3 are not supported.'
     end select

  contains
#endif

#ifndef __GFORTRAN__
     pure function sum_kernel(a) result(accum)
        real(sp), intent(in) :: a(:)
#endif
        real(sp) :: corr, accum, t, y
        integer(ip) :: i

        corr  = 0.0_sp
        accum = 0.0_sp
        do i = 1, size(a,kind=ip)
           y = a(i) - corr
           t = accum + y
           corr = (t - accum) - y
           accum = t
        end do

#ifndef __GFORTRAN__
     end function
#endif

  end function

end module
!----------------
! kahan_main.f90
!----------------
program kahan_main

   use kahan_sum, only: ksum => sum
   implicit none

   integer :: nmax = 1000000000
   real, allocatable :: t(:)

   allocate(t(nmax))
   call random_number(t)

   print *, sum(t), sum(real(t,kind(1.0d0))), ksum(t)

   deallocate(t)

end program

I was only able to test the assumed-rank version with Intel compilers, but it’s not difficult to call the sum kernel directly in gfortran. Here’s the output for different compiler flags:

$ ifort -warn all -O0 kahan_sum.F90 kahan_main.f90 
$ ./a.out
  1.6777216E+07   499994194.168867       4.9999421E+08
$ ifort -warn all -O3 kahan_sum.F90 kahan_main.f90
$ ./a.out
  1.3421773E+08   499994194.168732       4.9999421E+08
$ ifx -warn all -O0 kahan_sum.F90 kahan_main.f90
$ ./a.out
  1.6777216E+07   499994194.168867       4.9999421E+08
$ ifx -warn all -O3 kahan_sum.F90 kahan_main.f90
$ ./a.out
  1.6777216E+07   499994194.169164       1.6777216E+07
$ gfortran -Wall -Wno-intrinsic-shadow -Os kahan_sum.F90 kahan_main.f90
$ ./a.out
   16777216.0       500007079.09539169        500007072.   
$ gfortran -Wall -Wno-intrinsic-shadow -O2 kahan_sum.F90 kahan_main.f90
$ ./a.out
  16777216.0       500004235.12119830        500004224.   
$ gfortran -Wall -Wno-intrinsic-shadow -Ofast kahan_sum.F90 kahan_main.f90
$ ./a.out
   67108864.0       500003566.39887440        67108864.0  

Here you can see how an aggressive option such as -Ofast “ruins” the algorithm: -Ofast implies -ffast-math, which lets the compiler re-associate floating-point operations and effectively optimize the correction term (t - accum) - y away.

In my opinion, what this thread is showing is that the selection of 32-bit integer and real as the default kinds for 64-bit Fortran was a poor choice. The behaviour of SIZE and SUM demonstrates this.
64-bit Fortran with 64-bit default kinds will come eventually, hopefully with hardware support for the then 128-bit double precision.
There will be some pain with this transition. I remember the lingering transition to 32-bit, with “huge” arrays larger than 64 KBytes.

I was surprised to read (on my search through Wikipedia) that when HPE (Hewlett Packard Enterprise) took over Cray in 2019, they changed 64-bit Fortran from a 64-bit default integer to 32-bit. I am not sure if that is correctly reported, but it certainly surprises me.