Size of long array

I work with very long arrays. I noticed that when an array’s size is beyond the limit of integer(4), size(x) will stop working on GFortran.

program size_test
  integer(8) :: n
  real(8), allocatable :: x(:)
  n = 3d9
  allocate(x(n))
  print *, size(x)
end program

Size has an optional kind argument. You could try that.

i.e.

program size_test
  integer(8) :: n
  real(8), allocatable :: x(:)
  n = 3d9
  allocate(x(n))
  print *, size(x, kind=kind(n))
end program
2 Likes

According to the Fortran best practice docs, the preferred way to pass arrays to procedures is as assumed-shape arrays. Since the size function cannot figure out whether the resulted integer may overflow, does that mean using assumed-shaped arrays with the size function is dangerous?

I found a similar discussion about this issue.
https://groups.google.com/g/comp.lang.fortran/c/szEW0W6Ekjg

function example(x) result(res)
  integer(8) :: i  
  real(8) :: x(:), res
  res = 0d0
  do i = 1, size(x)
    res = res + x(i)  
  end do
end function

If I pass a long array to this function, the result will be zero because size(x) will be negative due to integer overflow. If the size function can automatically return an integer(8), then we don’t have to worry about that.

It is just ridiculous that on a 64-bit OS and using a 64-bit Fotran compiler that the intrinsic SIZE (X) does not return the correct answer.
To be told you are wrong, because you did not read some obscure clause in the latest Fortran Standard should not be acceptable.
This is a total failure in the functionality of the SIZE intrinsic; a behaviour that is not fit for purpose.

Who is modern Fortran being developed for; Fortran users or standard writers ?
What should infrequent users of Fortran, such as Julia or Python users make of this result ?

{Please do not edit this post, as these are valid questions that should be discussed.}

5 Likes

I think the real problem is that all the 2**32 bit patterns are valid default integers so that there is no integer NaN or Inf, because if an array bigger than huge(1) had been declared and its size asked for as a default integer then Inf would have been a good result.

I would be happy to lose two of the integers of each kind near +huge and -huge to be +Inf, -Inf and quiet and signalling NaN, but some Fortran users might not be.

1 Like

Actually, there is one intrinsic lacking the kind optional parameter, which would IMHO benefit having it. namely, SUM (ARRAY, DIM [, MASK]) or SUM (ARRAY [, MASK]). For big arrays, it could be valuable to compute the sum in bigger precision and/or range type. Surely there are obstacles, kind values not being uniquely assigned to types (e.g. integer/real) but that could probably be solved in some smart way.

1 Like

You seem to be suggesting that in a 64-bit program (meaning addresses are 64 bits; at the hardware level, physical addresses may actually be 48 bits) the SIZE intrinsic should return a 4-byte integer for arrays whose size is 2^31-1 or less, and should return an 8-byte integer for arrays whose size is larger. Such behavior may be “correct” from your perspective, but may create lots of problems. If you wrote

sx = size(x)

you would need to make sx an 8-byte integer if there is any chance that you will make x a huge array. Your Fortran source, run on a machine with, say, 8 GB of RAM, or with a 32-bit OS, would suffer a big overhead because of all those 8-byte integers.

Some intrinsic functions have fixed (default) kinds for their return values when a KIND= optional argument is not present. Other intrinsic functions have a return variable whose KIND is adopted from the argument(s). Here is another example of the first kind (for a compiler with kind numbers same as size in bytes):

program main
    real(8) :: r
    complex(8) :: c
    r = 0.1d0
    c = cmplx(r,2*r)      ! both arguments are REAL*8
    print '(2F20.16)',c
    c = cmplx(r,2*r,kind=8)
    print '(2F20.16)',c
end

It has been a long time since generic intrinsics were identified as a useful compiler capability to cover different KIND types.

For a 64-bit Fortran compiler, it is just “head in the sand” stuf to pretend that arrays can only be smaller than 2^31 entries or give an answer that SIZE (X) could be negative. ( Do the people who make this stuff up actually use Fortran ?)

When can it ever be correct that SIZE (X) is -ve ?

No, I am suggesting that for a 64-bit Fortran compiler SIZE should behave as LOC ( and LOC should by now be an accepted generic function )

The 64-bit compiler should be smart enough to :
a) provide functionality that is consistent with the memory address it supports and
b) provide warnings if SIZE appears as an argument to a function/subroutine and the INTERFACE can not be validated.

Do users recall the mess that was LOC with 32-bit Fortran compilers that could utilise a 3-Gbyte memory space. Totally unnecessary.

To avoid having size, minloc, findloc etc. return bad results for very large arrays you can supply the kind argument, as discussed, or use a compiler option that set the default integer to 64 bits. An example is below. Is it considered poor practice to use a compiler options such as
gfortran -fdefault-integer-8 ?

program size_test
use iso_fortran_env, only: int64
implicit none
integer :: stat
integer(kind=int64) :: n
real, allocatable :: x(:)
n = 3d9
allocate(x(n))
print "(*(1x,i0))",n,size(x,kind=int64),size(x),huge(1)
end program
! Output on Windows with gfortran and Intel Fortran default options:
! 3000000000 3000000000 -1294967296 2147483647
! Output with gfortran -fdefault-integer-8 or ifort /integer-size:64:
! 3000000000 3000000000 3000000000 9223372036854775807
1 Like

There are several things to unwind in this post.

First, size() does not return the hardware address, c_loc() does that. So even if the hardware address is 64-bits, size() simply returns the number of elements in the array, which is unrelated. In particular, the size of an array might fit in a default integer even on a 64-bit address machine, and also the size of an array might exceed a default integer even when it is indexed by default integers. That last situation can occur for several reasons. First, the intrinsic entities can occupy several hardware addresses. On a byte-addressable machine, a default real might occupy four bytes each, and an extended precision real kind might occupy 16 bytes. Each element of a complex array might occupy up to 32 bytes. So when size returns, say 1000000000 (that’s supposed to be 1e9), which easily fits in a four-byte default integer kind, the actual hardware addresses would be 32 times larger (at least 5 extra bits), exceeding the range of a default integer. Then if you consider arrays of user-defined types, the hardware addresses can be much larger, while size() might still fit comfortably within a default integer. Then going further, if you consider multidimensional array, now the array might be indexed with default integers, even small ones, while size() might require an extended integer kind just to return the correct value.

Then there is also the general principle in fortran that the kinds of entities in expressions are all known at compile time. Even the generic intrinsic functions that are user written are like this. The compiler knows at compile time which specific function to invoke based on the type/kind/rank of the arguments. Your suggestion that size() return different kinds, not based on the TKR of its arguments but rather on the value, is inconsistent with that. Consider also a situation where size() is used in an expression. How could a compiler know how to evaluate the mixed-kind expression, with its mixed-kind promotion rules, without knowing the kinds of its primitives?

As for whether size() should throw an exception when its value exceeds the range of its kind, that might be a good idea.

3 Likes

The standard is silent about implementation details for data representation. 7.4.3.1: Integer type reads […] The processor shall provide one or more representation methods that define sets of values for data of type integer. […] The type specifier for the integer type uses the keyword INTEGER.

Making 32bit integers the default (and even supporting them) is a decision of the compiler vendors.

So the question to ask is: Why is the default integer in all compilers that I’m aware of 32bit even on 64bit operating systems?

The default integer is typically 32-bits for several reasons. One is practical reasons, a 32-bit integer occupies half the space and requires half of the memory access of a 64-bit integer. Another reason is backward compatibility with existing code; people usually want their code to do the same thing whether they are using a 32-bit or a 64-bit compiler. And also, the fortran standard requires the default integer, default, real, and default logical to occupy the same storage size. Default complex is twice that. If any one of those were to change, then all of them would need to change together.

3 Likes

@RonShepard,

I am NOT suggesting that for 64-bit Fortran, if the SIZE(ARRAY) result is < 2^31 then return a 4-byte integer.
I AM suggesting that since SIZE (ARRAY) can exceed HUGE(Int32), it should always providing an Int64 result.

The comparison I was using with LOC is that 32-bit Fortran provides a KIND(int32) result, while 64-bit Fortran provides a KIND(int64) result; all very known, provide a result kind that can report the known range.
I can’t recall what 16-bit Fortran did, but I don’t recall any problem with SIZE back then.
( Perhaps I should not have referred to 32-bit Fortran : 3 GB memory usage, where a LOC KIND(int64) result would have been more convenient. )

Fortran usability is now being compared to other languages, such as Python and Julia. I would be surprised if they would provide intrinsics like Fortran SIZE that can report a -ve size value. Why is Fortran so petty as to provide an answer from an intrinsic function that is wrong.

I AM suggesting that since SIZE (ARRAY) can exceed HUGE(Int64), it should always providing an Int128 result. :wink:

1 Like

Python/NumPy does not serve as a good role model: Default integers are 32 bit on Windows and 64 bit on Linux, see python - numpy array dtype is coming as int32 by default in a windows 10 64 bit machine - Stack Overflow. This is related to the size of C’s long.

1 Like

The choice of default integer kind for 64-bit Fortran is an interesting one.
My impression of the reason they chose int32 is that changing to int64 would have made any initial conversion much more difficult, while for the majority of conversions, a default int32 requires very little change, except for identifying array subscripts that can exceed 2^31.

As for use of int128, I think the largest addressible space on Win-64 is 2^48 bytes ?
I do not know the limit on Lunux andI am unlikely to have the experience of using any recent super computers to know what they support.

At the moment, all my 64-bit direct solvers are in-memory solvers, so what size memory I use is what I can afford to buy. Has anyone used a memory address approaching 2^48 ?

I have used a few Fortran compilers that broke that rule. CDC and most early “mini” had different size default integer (16-bit) and default real (32-bit), although with the mini I rarely used default real.
Probably another reason why int64 has not been chosen for most 64-bit compilers. Has Cray HPC adopted int64 default ?

My two cents is that if the problem is complex enough that such large arrays are to be handled on the same shared memory, then, it’s HIGHLY recommended practice that the program size is standardized by the developer(s) to a well-defined maximum extent:

module environment
   use iso_fortran_env
   implicit none

   integer, parameter :: IDX_T = int64 ! or whatever else
   integer(IDX_T), parameter :: IDX_MAX = huge(1_IDX_T)
end module environment

program size_test
  use environment
  integer(IDX_T) :: n
  real(8), allocatable :: x(:)
  n = 3d9
  allocate(x(n))
  print *, size(x,kind=IDX_T)
end program
1 Like

I used PDP-11 minicomputers in the 1970s that had 16-bit default integers and 32-bit default reals. These machines had a max 65 kB of memory (and due to expense, many were configured with just 1/4 or 1/2 that), so the integers were sufficiently large to address everything. A compiler option allowed the default integer to be 32-bits to match the real size, and that was useful when converting mainframe programs to the minicomputer. The later VAX minicomputers initially used the PDP fortran compilers in compatibility mode, but when the first VAX compilers were developed, they used 32-bit default reals (I think with a compiler option to revert back to 16-bit integers). As far as I know, IBM 360 and 370 mainframes dating back to the 1960s used 32-bit default integers and reals. Before that, the IBM computers had 36-bit default integer and default reals. The storage requirements were important then because common blocks were often reused with different mixtures of integer and real quantities. Memory was scarce and expensive, and in a timeshare environment, whatever was there was divided up among several users. As far as I know, the integer/real/logical equal-storage requirement was always in fortran for that reason.

As for current CRAY machines, which use AMD cpus, the default is 32-bit integers and reals. This of course was a source of errors for those converting from the previous CRAY cpu machines that used 64-bit storage for default integers and reals. Only 48 of the bits were used in the integer arithmetic, at least for some integer operations. I don’t remember the details, but you could use other instructions, or change modes somehow, if full 64-bit integer arithmetic was required, but it was slower.