Not just default integer and real: default logical must also occupy one numeric storage unit, and default complex must occupy two. These rules apply only to the default kinds; of course nothing prevents a programmer from using kind values other than the defaults, so I do not see this as any significant limitation on the language or its future development.
No way. If you change that, tons of legacy codes will be broken, because they use storage association between different types (and quite often even argument association between different types, even if not legal).
And I don't see how it constrains future uses: if you need a larger integer, just use integer(selected_int_kind(18)) or integer(int64). This is more verbose, but it does exactly what you want. And F202Y will make it simpler, by allowing you to define the default integer/real/logical/character kinds in each program unit.
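To make that concrete, here is a minimal sketch (the kind parameter name i8 is my own) showing both spellings of a wider integer kind:

```fortran
program wide_int_demo
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   ! A kind holding at least 18 decimal digits (64-bit on common compilers)
   integer, parameter :: i8 = selected_int_kind(18)
   integer(i8)    :: n
   integer(int64) :: m

   n = 10_i8**12        ! far beyond the default 32-bit range
   m = int(n, int64)
   print *, n == m      ! the two kinds coincide on mainstream platforms
end program wide_int_demo
```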
In practice, I almost never need more than the default 32-bit integers, although I regularly manipulate arrays with 10**10 elements or more; they are multi-D arrays, and each dimension is much smaller than what a 32-bit integer can handle. Note also that the int type in C/C++/etc. is almost always 32 bits, and everybody is OK with that.
When I posted yesterday, a similar thought crossed my mind: using N-d arrays can help raise the ceiling of what you can effectively address with 32-bit signed integers. Array expressions can also help avoid loops, but not always. However, size(a) would still overflow, unless you include the kind argument - size(a, kind=long_int).
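As an illustration of the kind argument (the array here is kept small enough to actually run; the point is the result kind, not the magnitude):

```fortran
program size_kind_demo
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   real :: a(1000, 1000)
   integer(int64) :: n

   ! For an array with more than huge(1) = 2**31 - 1 elements, plain
   ! size(a) would overflow its default-integer result; the optional
   ! kind argument requests a wider result kind instead.
   n = size(a, kind=int64)
   print *, n
end program size_kind_demo
```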
Coincidentally, I noticed that one of the works citing the ACM article I linked above proposes using a two-dimensional virtual address space:
In Section 6.1 the author discusses multi-dimensional arrays and how a 2-d address space can solve the array-of-struct vs. struct-of-array dilemma. It's interesting to note w.r.t. this point that, by using the assumed-shape array abstraction in Fortran, you can write algorithms which support both contiguous and strided layouts with just 'half' the programming effort.
And with parameterized derived types, you can switch between layouts (it was @Reinhold_Bader that showed me this),
```fortran
type :: soa_or_aos(n)
   integer, len :: n
   real :: x(n), y(n), z(n)
end type

type(soa_or_aos(:)), allocatable :: points(:)

! Struct of Arrays
allocate(soa_or_aos(n) :: points(1))

! Array of Structs
allocate(soa_or_aos(1) :: points(n))

! Array of Struct of Arrays
allocate(soa_or_aos(8) :: points(n/8))
```
Admittedly, the nested arrays kill the elegance of array referencing in the AoS and AoSoA cases. Also, compilers still suffer from an abstraction penalty.
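For concreteness, a sketch of how element i of component x would be referenced under each allocation (nx, my notation for the number of points, is hypothetical):

```fortran
! SoA   : allocate(soa_or_aos(nx) :: points(1))    ->  points(1)%x(i)
! AoS   : allocate(soa_or_aos(1)  :: points(nx))   ->  points(i)%x(1)
! AoSoA : allocate(soa_or_aos(8)  :: points(nx/8)) ->  points((i-1)/8 + 1)%x(mod(i-1, 8) + 1)
```

The AoSoA index arithmetic is where most of the elegance is lost.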
While I understand the convenience of good defaults and the frustration when they get in the way, I think a proposal to change this would be hard to motivate without compelling examples beyond size(). As @PierU already explained, the standard provides the mechanism via kind specifiers. Vendors provide flags like -fdefault-integer-8, but this is a heavy hammer.
In the linked ACM article there is an interesting anecdote,
SGI (Silicon Graphics). Starting in early 1992, all new SGI products used only 64/32-bit chips, but at first they still ran a 32-bit operating system. In late 1994, a 64/32-bit operating system and compilers were introduced for large servers, able to support both 32-bit and 64-bit user programs. This software worked its way down the product line. A few customers quickly bought more than 4 GB of memory and within a day had recompiled programs to use it, in some cases merely by changing one Fortran parameter. [emphasis added]
Whether this was a change from integer*4 to integer*8, the addition of a compiler flag, or a change to an integer(kind=...) specifier, I can't say.
The other article I linked, written by DEC, HP, IBM, Intel, and others, found the opposite to be true:
A significant number of applications use C and FORTRAN together, either calling each other or sharing files. Such applications have been amongst the quickest to find reason to move to 64-bit environments. Experience has shown that it is usually easier to modify the data sizes/types on the C side than the FORTRAN side [emphasis added] of such applications, and we expect that such applications would require a 32-bit integer data type in C regardless of the size of the int datatype.
In C, I imagine you would use a tool to replace all int with myint and place a typedef at the top of your main:

```c
#include <stdint.h>   /* for intptr_t */

typedef intptr_t myint;
```
The problem is that this is a cascading change that needs to be propagated through all (third-party) libraries and build scripts. In some places you may need to rewrite code that silently relied on the assumption that int was 32-bit.
This reminds me of a chapter from the novel The Goal by E. Goldratt. The protagonist is on a hike with his son's group of boy scouts. Due to his weight, one of the boys, Herbie, is a slow walker. The troop has to finish the hike, but they can't finish unless Herbie finishes; to finish the hike, they need to help Herbie finish. This is the central analogy the book uses to introduce the concept of bottlenecks.
As long as there is software around that relies on 32-bit int (and long and pointers), it will be hard to move on. For instance, Intel only deprecated the IA-32 compiler last year: Intel Fortran Compiler (version 2025.0.0) - #2 by johnalx, and there is a long tail of users. At the end of the day, there is a reason we call it software, because it is meant to be reprogrammed if needed.
The PDT design and features have been very attractive to programmers since they were introduced some 20 years ago in F2003, but, as we all know, they are still not fully supported and robust in some of the popular Fortran compilers. For programmers wanting to write portable code, that means they still cannot be used freely.
I'm not a fluent C programmer, but this claim in the article doesn't really make sense to me. With its various combinations and repetitions of the modifiers 'short' and 'long', it seems to me like C took the worst possible approach to declaring the various integer and real data types. Of course C has a standard preprocessor, along with typedef, and I do agree those are really nice features for this purpose. But Fortran's underlying type+kind system seems much more flexible and open-ended.
Storage association has been codified since Fortran 66. For better or worse, many older codes using COMMON and EQUIVALENCE have depended on it.
Personally, I think intrinsics dealing with array sizes, character string lengths, search position results, etc. should return what C programmers would call a 'size_t' instead of a default integer. If fed into a 32-bit integer on a computer with 64-bit addressing, the results could still be wrong, but at least compilers could give 'narrowing' warnings.
The MPI library finally addressed this situation in MPI 3.0 by defining MPI_ADDRESS_KIND and MPI_COUNT_KIND. Then by using the mpi_f08 module, problems can be found at compile time.
In C, that is an unsigned integer type, right? Fortran does not (yet) have any unsigned integer types, including one of sufficient range to represent a raw hardware address.
It is a good point. But in the context of sizes and lengths, the results don't need to be unsigned, even though size_t in C is. LBOUND and UBOUND do need to be signed. For raw hardware addresses, there are already definitions for the C equivalents in ISO_C_BINDING (e.g., C_INTPTR_T).
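For example, a raw address can be stored in an integer of kind C_INTPTR_T. A minimal sketch (the transfer call is a common idiom for viewing the bit pattern of the opaque c_ptr type, not a standard-mandated conversion):

```fortran
program addr_kind_demo
   use, intrinsic :: iso_c_binding, only: c_ptr, c_loc, c_intptr_t
   implicit none
   integer, target :: buf(10)
   type(c_ptr) :: p
   integer(c_intptr_t) :: addr

   p = c_loc(buf)
   ! Copy the bit pattern of the opaque pointer into an integer
   ! wide enough to hold an address on this platform.
   addr = transfer(p, addr)
   print *, addr
end program addr_kind_demo
```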
IIRC this also applies to declarations, returned default integer values, etc.
iso_fortran_env could provide a size_int (or whatever) kind, similar to the C size_t type. OK, c_size_t can do the job, but it would be better to have it outside of the C interoperability module.
But then the size() function should return an integer(size_int) value… That would break backward compatibility, though apart from weird corner cases I don't think it would be a problematic breakage…
I'm also not sure how it would interact with the future 'default kinds' feature of F202Y.
With a few seconds more thought, one case that could silently break is when used in a procedure call, e.g.:
```fortran
call oldsub (x, size (x))
```
If oldsub doesn't have an explicit interface…
Another example is in I/O:
```fortran
write (20) size (x), x
```
The resulting file might be incompatible with older files and with the corresponding read statements. However, if the size of x is very large, the older files might have had bad values to begin with.
On a little-endian machine, this broken code would continue to work in the cases where size(x) < 2**31: the low-order bits of the kind=size_int integer would be the same as the kind=int32 integer that oldsub() expected. Only for larger sizes would the broken code reveal itself, for example when the dummy argument becomes negative because the sign bit is set, or when some of the higher-order bits are set and silently ignored. On a big-endian machine, the error would presumably reveal itself immediately, since the high-order bits of the two integer kinds would not match.
SIZE and LOC (LOC being a universally supported but non-standard extension!) do not work with default integer in either a 32-bit or a 64-bit environment.
The Silverfrost FTN95 compiler has an interesting (non-standard) solution for this, providing integer(kind=7) as the integer kind for a memory address: 4 bytes(~) for /32 and 8 bytes for /64.
They use it extensively for memory addresses and for Clearwin+, their Microsoft graphics interface. This enables the same code to work in both the 32-bit and 64-bit compilers.
Perhaps iso_fortran_env should support a named integer kind 'size_t' for memory addresses, to be used by LOC, SIZE, and other relevant intrinsic functions. (Perhaps a better name than size_t should be chosen?)
Silverfrost FTN95ās Clearwin+ still provides a very capable graphics environment for 32-bit and 64-bit Fortran program development.
~ Actually, we were also let down by 32-bit when DOS provided 3+ GBytes of addressable memory. You needed an unsigned 32-bit integer for LOC, rather than the negative values returned for addresses above 2 GBytes. I overcame this by writing an integer(8) JLOC to replace the integer(4) LOC, but compiler support would have been much better.
Of course, how could I miss that! So new intrinsic functions size2(), lbound2(), etc. would be needed. I'm not sure it's worth it, though, given the relatively limited use cases (huge flat 1-D arrays).
It looks like storage association rules won't apply to INTEGER, REAL, etc. if their default kinds are changed in the source code. And intrinsics like SIZE will return the new default INTEGER kind.
True. But in practice I almost never need the total size of a multi-D array, just the sizes of some of its dimensions. And when I do, using size(x, kind=int64) is not a big deal.
I find that I often use size(x,dim=2) and similar as the upper bound of do loops for assumed shape arrays or for allocatable arrays. I probably should be using lbound() and ubound() instead, but it seems like size() should be faster. I also use size(x) sometimes for whole multidimensional arrays, but probably not as often as the other case.
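That pattern is safe for assumed-shape dummies because their lower bounds default to 1, which makes size() equivalent to ubound(). A minimal sketch (the subroutine name is my own):

```fortran
program loop_bounds_demo
   implicit none
   real :: x(3, 4)

   x = 1.0
   call scale_cols(x, 2.0)
   print *, x(2, 3)

contains

   subroutine scale_cols(x, s)
      real, intent(inout) :: x(:,:)
      real, intent(in)    :: s
      integer :: i, j
      ! Assumed-shape dummies have lbound 1 by default, so size()
      ! as the upper bound is equivalent to ubound() here.
      do j = 1, size(x, dim=2)
         do i = 1, size(x, dim=1)
            x(i, j) = s * x(i, j)
         end do
      end do
   end subroutine scale_cols

end program loop_bounds_demo
```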
I've faced this problem with the default integer kind in my code. Formally, it needs multidimensional arrays, but I can't use multidimensional arrays because I don't want to limit the number of dimensions, and this number can be too large (one application was done with 21 dimensions or even more). So I have to flatten the multidimensional array into a vector (1-D array), and I have to use pointers or reshape instructions to extract the working dimension.
Originally, when I designed the code, I didn't think about a vector size too large to fit in the default integer kind, but recently I faced the default-size problem (with this 21-D study) and I had to change the default integer kind at compile time and also recast the new default integer kind to int32 when calling LAPACK or ARPACK (by the way, you can call LAPACK subroutines without recasting with the MKL library).
I know that changing the default integer kind was not the optimal option, but it was the only reasonable one at the time.
I don't know if it can fit your problem (maybe it can't), but in such cases I would tend to use nested types, each level handling a reduced number of dimensions.
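A minimal sketch of that idea (type and component names are mine): each level holds only two dimensions, so a "4-D" index (i1, i2, i3, i4) becomes g%b(i3, i4)%a(i1, i2), and no single descriptor needs a huge flat extent.

```fortran
program nested_dims_demo
   implicit none
   type :: inner_block
      real, allocatable :: a(:,:)                ! dimensions 1-2
   end type
   type :: outer_block
      type(inner_block), allocatable :: b(:,:)   ! dimensions 3-4
   end type
   type(outer_block) :: g
   integer :: i, j

   allocate(g%b(2, 2))
   do j = 1, 2
      do i = 1, 2
         allocate(g%b(i, j)%a(3, 3), source=0.0)
      end do
   end do
   g%b(1, 2)%a(3, 1) = 42.0   ! element at "4-D" index (3, 1, 1, 2)
   print *, g%b(1, 2)%a(3, 1)
end program nested_dims_demo
```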