Integer 4 or integer 8?

@gardhor, just noticed this thread -

FYI we’ve just completed templating the integer kind for stdlib’s internal BLAS and LAPACK implementation, so it now offers both 32- and 64-bit integer sizes for all procedures.

All interfaces are now templated, which means you can, e.g., call gemm(...) agnostically with either integer kind, and it will just work.
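For example, a minimal sketch of what that looks like (the module name stdlib_linalg_blas and the generic gemm interface are assumed here; check the branch for the exact spelling):

program demo_gemm_kinds
   ! Hedged sketch: module and interface names are assumed, not confirmed.
   use stdlib_kinds, only: dp, int32, int64
   use stdlib_linalg_blas, only: gemm
   implicit none
   real(dp) :: a(4,4), b(4,4), c(4,4)
   integer(int32), parameter :: n32 = 4_int32
   integer(int64), parameter :: n64 = 4_int64

   call random_number(a)
   call random_number(b)

   ! Same generic name with 32-bit integer arguments ...
   c = 0.0_dp
   call gemm('N', 'N', n32, n32, n32, 1.0_dp, a, n32, b, n32, 0.0_dp, c, n32)

   ! ... or with 64-bit integer arguments; the templated interface resolves both.
   c = 0.0_dp
   call gemm('N', 'N', n64, n64, n64, 1.0_dp, a, n64, b, n64, 0.0_dp, c, n64)
end program demo_gemm_kinds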

Then, if you have an external library, you can link against it, and either the 32- or the 64-bit implementation will be replaced with calls to the external library.

Because there’s a lot more code, we’ve made a special branch that you can quickly check out with fpm as

[dependencies]
stdlib = { git="https://github.com/fortran-lang/stdlib", branch="stdlib-fpm-ilp64"}

or just use CMake (with 64-bit integer support turned on); see the README.

6 Likes

This is 100% pure gold.

1 Like

I don’t think large 64-bit arrays will be a corner case for much longer!
This is a short-sighted approach which needs to change.

1 Like

Thanks for the “short-sighted”.

But please don’t quote me out of context while forgetting all the points I have developed.

On modern computers, if you are running on a 64-bit architecture, the CPU can only access 64 bits, no less. Meaning that by using 32-bit integers, the CPU has to trim off the rest of those 64 bits it accesses.

Has to trim off the last 32 bits, that is.

If the bus width is 64 bits, loading a single 32-bit integer from RAM to a CPU register is not faster than loading a 64-bit integer. However, when operating on an array of integers, one can load twice as many 32-bit integers (in the vector registers) in the same number of instructions, compared to 64-bit integers.
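A rough sketch of the effect (timings will vary with compiler flags and hardware; the point is only that the 32-bit loop processes twice as many elements per vector load):

program vector_width_demo
   ! Illustration only: the speedup depends on optimization flags
   ! (e.g. -O3 with vectorization enabled) and on the hardware.
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
   integer, parameter :: n = 50000000
   integer(int32), allocatable :: a32(:)
   integer(int64), allocatable :: a64(:)
   integer(int32) :: s32
   integer(int64) :: s64
   real :: t0, t1, t2

   allocate(a32(n), a64(n))
   a32 = 1_int32
   a64 = 1_int64

   call cpu_time(t0)
   s32 = sum(a32)     ! e.g. 8 int32 elements fit in a 256-bit vector register
   call cpu_time(t1)
   s64 = sum(a64)     ! only 4 int64 elements fit in the same register
   call cpu_time(t2)

   print '(a,i0,a,f6.3,a)', 'int32 sum = ', s32, '  (', t1 - t0, ' s)'
   print '(a,i0,a,f6.3,a)', 'int64 sum = ', s64, '  (', t2 - t1, ' s)'
end program vector_width_demo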

3 Likes

Ahh, thank you, I misunderstood; I didn’t take vectorization into account.

On modern systems one must also consider caching of data. Data is loaded into the processor chip, and stored back to memory from the chip, in cache-line quantities. For example, the ancient i5 microprocessor I’m typing this on uses 64-byte cache lines. The main latency is in off-chip references, bringing the appropriate cache line on chip to be read and possibly updated. Once on chip, as data moves between the three levels of on-chip cache, there is very little, if any, difference in timing between individual 32-bit and 64-bit accesses between the CPU registers and the L1 data cache.
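A quick way to see the point in cache-line terms (the 64-byte line size is just the example above; it is hardware dependent):

program cache_line_elems
   ! A 64-byte cache line holds twice as many 32-bit integers as 64-bit ones.
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
   integer, parameter :: line_bytes = 64
   print '(a,i0)', 'int32 elements per cache line: ', line_bytes*8/storage_size(1_int32)  ! 16
   print '(a,i0)', 'int64 elements per cache line: ', line_bytes*8/storage_size(1_int64)  ! 8
end program cache_line_elems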

Does the same happen with selected_int_kind(2) or selected_int_kind(4), which would be 8 and 16 bits? From what I’ve seen that’s not the case, so it might not be a universal rule.
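A small check of what those requests actually give on a particular compiler (the standard only guarantees the decimal range, not 8- or 16-bit storage):

program small_kinds
   use, intrinsic :: iso_fortran_env, only: int8, int16
   implicit none
   integer, parameter :: k2 = selected_int_kind(2)   ! at least 2 decimal digits
   integer, parameter :: k4 = selected_int_kind(4)   ! at least 4 decimal digits
   print '(a,i0,a,i0,a)', 'selected_int_kind(2) -> kind ', k2, ' (', storage_size(1_k2), ' bits)'
   print '(a,i0,a,i0,a)', 'selected_int_kind(4) -> kind ', k4, ' (', storage_size(1_k4), ' bits)'
   print '(a,l1)', 'same as int8/int16? ', k2 == int8 .and. k4 == int16
end program small_kinds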

But pointer addresses are only needed for C bindings, and those are provided by type(c_ptr). Fortran pointers are handled intrinsically, and one should never need to obtain a pointer address. Am I correct?
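For example, something along these lines never exposes an address until c_loc is called explicitly for interop:

program ptr_vs_address
   ! Fortran pointers never expose an address; a raw address only appears
   ! when you ask for one explicitly via c_loc, for passing to C.
   use, intrinsic :: iso_c_binding, only: c_ptr, c_loc, c_associated, c_float
   implicit none
   real(c_float), target  :: x(10)
   real(c_float), pointer :: p(:)
   type(c_ptr)            :: addr

   x = 0.0_c_float
   p => x                  ! association handled entirely by the language
   addr = c_loc(x)         ! opaque address, meaningful only for C interop
   print *, associated(p, x), c_associated(addr)
end program ptr_vs_address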

2 Likes

I’ve seen C code that used size_t variables for things other than pointers. Also, why have a separate size_t for Fortran when you can just do

use iso_c_binding, only: size_t=>c_size_t

if you want to use c_size_t in some context other than C interop.
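For instance, a minimal sketch of using that rename outside any interop code:

program size_t_rename
   ! After the rename, size_t is just an integer kind usable anywhere.
   use, intrinsic :: iso_c_binding, only: size_t => c_size_t
   implicit none
   real, allocatable :: x(:)
   integer(size_t)   :: n

   allocate(x(1000))
   n = size(x, kind=size_t)      ! element count held in a size_t-kind integer
   print '(a,i0,a,i0,a)', 'size = ', n, ' (kind is ', storage_size(n), ' bits)'
end program size_t_rename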

I wonder how conforming this statement is?
Does the C integer “kind” c_size_t provide a consistent Fortran kind value for a 64-bit integer memory address?

Perhaps it should be:
use iso_fortran_env, only: size_t=>int64

I think on a 32-bit address machine, c_size_t will be a 32-bit integer, not a 64-bit integer.

c_size_t and int64 are probably the same on most modern machines. However, both are signed integers, while size_t in C is an unsigned integer type. The interoperability between C integers of type size_t and Fortran integers of kind c_size_t just means that the latter has enough bits to hold the former, not that the values of such integers are the same.
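A quick check of this on a given platform (the c_size_t kind is signed, so its huge() is roughly half of C’s SIZE_MAX):

program signed_size_t
   ! c_size_t names a signed Fortran kind wide enough for C's unsigned size_t.
   use, intrinsic :: iso_c_binding, only: c_size_t
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   print '(a,i0)', 'bits in the c_size_t kind: ', storage_size(1_c_size_t)
   print '(a,i0)', 'huge of the c_size_t kind: ', huge(1_c_size_t)
   print '(a,i0)', 'huge(1_int64)            : ', huge(1_int64)
end program signed_size_t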

For Fortran compilers that do not support 64-bit integers, the C interoperability still works for things like c_loc() because the return result is a derived type, type(c_ptr). That derived type can be a single 64-bit integer, or two 32-bit integers, or any other combination that totals enough bits to store a C address. That level of abstraction was probably more important back in 2003 than now. Now Fortran itself is required to support a signed integer kind with at least 18 decimal digits; modern compilers satisfy that with int64, so much of this abstraction seems unnecessary. There are still numerous corner cases related to the signed vs. unsigned interpretation of those bits. I still hope unsigned integers will be supported directly in Fortran; it would simplify so many things.
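As a hedged illustration, the opaque handle can be reinterpreted as an integer of kind c_intptr_t via transfer, assuming type(c_ptr) is stored as a single address-sized integer (the common case today):

program cptr_abstraction
   ! type(c_ptr) hides how an address is stored; where the raw bits are
   ! wanted, c_intptr_t is the integer kind sized to hold them.
   use, intrinsic :: iso_c_binding, only: c_ptr, c_loc, c_intptr_t, c_int
   implicit none
   integer(c_int), target :: x
   type(c_ptr)            :: p
   integer(c_intptr_t)    :: addr

   x = 42
   p = c_loc(x)
   addr = transfer(p, addr)      ! reinterpret the opaque handle as an integer
   print '(a,i0,a,z0)', 'address as integer: ', addr, ',  hex: ', addr
end program cptr_abstraction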

Besides size_t there’s also ssize_t, which is signed (it represents either a size_t value or -1). So, in practice, the maximum value size_t needs to hold is probably the same as that of int64.

For pointers, I think intptr_t is the type that holds a value covering the address range, according to the data model used (ILP32, LP64, etc.).

I cannot recall the last 32-bit address machine I used. Certainly not in the last 20 years!
The last 32-bit OS I used also provided 3 GB of addressing. This would have benefited from a 32-bit unsigned integer type, which is presently being added to Gfortran.

64-bit Fortran should always have provided an integer kind that supports c_size_t, for functions like LOC and SIZE.
I have done analyses of stack and heap locations, and I do think obtaining their memory addresses is a valid use.

Yes, this is a good point about how C works. In C, intptr_t (or any other raw pointer) is most likely a true unsigned integer that matches whatever the underlying hardware does for its addressing. On a 32-bit machine, the minimum value would be 0 and the max value would be 2**32-1.

The size_t and ssize_t types can be different, but in practice I think they are always the same number of bits, the former unsigned and the latter signed. Both are required to be able to return the size of the largest thing that can be declared within the language. On a 32-bit machine, I think that limit corresponds to a byte array of size 2**31-1. A size_t variable might have values that are up to twice that big, but if an object cannot be declared that large, then those values would never be returned as a valid size of anything. An ssize_t value is signed, but its max value must also be large enough to represent the size of the largest possible object, which is 2**31-1, and also to return at least a few negative values that are used as error codes within the language.

So on a 32-bit machine with these conventions, the pointer values would be in the range [0,2**32-1], the size_t range would be [0,2**31-1], and the ssize_t range would be something like [-1,2**31-1] (or maybe the lower bound should be -2 or -3 instead of -1; I’m not sure of all the standard C error code values in play). In practice, of course, an ssize_t variable would be able to hold a value as small as -2**31 with the typical two's-complement 32-bit integer representation.

But suppose a given compiler also supports a larger signed integer type, say 40 bits, but it still uses the 32-bit intptr_t and size_t types to be consistent with the underlying hardware. Then I think the size_t range could be extended to [0,2**32-1], and ssize_t values could use that 40-bit type with the range [-2**39,2**39-1], but it would be required to only represent values in the range [-1,2**32-1].

For those who are more fluent than me in C, is this all correct?