Transferring bits with different integer KINDs

I have a question about how to transfer bits from one integer to another integer of a different kind. Let me use int64 and int32 as an example, but I would really like to know how to best do this for arbitrary integer kinds. So suppose I have some bits stored in the int64 variable i64, and I want to extract the low-order 32 bits and place them into the int32 variable i32.

The best and clearest way to do this should be something like

call mvbits( i64, 0, 32, i32, 0 )

Unfortunately, for some unknowable reason, that is not allowed. The mvbits() intrinsic requires the KIND values of the two integer variables to be the same. OK, there are lots of other bit operators, so let's look for other clear approaches. The f77 way to do this is with the equivalence hack.

integer(int64) :: i64
integer(int32) :: j(2), i32
equivalence (i64,j)
i64 = ...whatever...
i32 = j(1)

There are two problems with this. The first is that this only works for little-endian addressing. On a big-endian machine the last statement should be i32 = j(2). So you can add some code to test for endian conventions, and then finally get the right assignment. Ok, that works, but it is no longer simple and clear. The other problem is that equivalence is an obsolescent feature. The NAG compiler, for example, will not compile the code without special options to enable the feature and to disable the error messages. So none of that is clean.

You can also remove the equivalence statement and use transfer.

j = transfer( i64, j )
i32 = j(1)

but the same endian problems arise with the last assignment. So this is no longer clear and simple, and why should the programmer need to worry about endian addressing conventions anyway? He knows which bits he wants, and he knows where to put them, so the endian addressing seems like an unnecessary distraction.

So let's try another approach. The expression ibits(i64,0,32) gets the right bits. In fact, the programmer can extract any number of bits from any location that way. This can also be done with combinations of left and right shifts, which is probably what ibits() does anyway. The problem here is that the result of the ibits() function has the same type and kind as its first argument, so the result is int64 in this case. So when the result is a positive value no larger than huge(i32), the simple assignment

i32 = ibits(i64,0,32)

will work. But if bit 31 is set, then the value of ibits() exceeds the range of the int32 variable on the lhs, and then the assignment is undefined (or if overflows are trapped, it can cause a run time exception). So that then leads to multistep assignments, where each of the intermediates is small enough to avoid the overflow problem. Of the several possibilities, here is one example.

i32 = ibits(i64,0,31)
if ( btest(i64,31) ) i32 = ibset(i32,31 )

I think that works, and is portable and standard conforming, but it seems both complicated and inefficient.
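The two-step assignment above can be wrapped once and reused. The following is only a minimal sketch: the names low_bits_mod and low32 are hypothetical, and the self-check assumes nothing beyond the standard bit intrinsics. It compares the result with the source one bit at a time, so it does not depend on how the processor interprets a negative integer as a bit sequence.

```fortran
module low_bits_mod
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
contains
   ! Hypothetical helper: low-order 32 bits of src as an int32 bit pattern.
   elemental function low32(src) result(dst)
      integer(int64), intent(in) :: src
      integer(int32) :: dst
      ! Bits 0..30 form a nonnegative value no larger than huge(dst),
      ! so this kind conversion stays in range.
      dst = int(ibits(src, 0, 31), int32)
      ! Bit 31 is set with a same-kind bit operation; no conversion is involved.
      if (btest(src, 31)) dst = ibset(dst, 31)
   end function low32
end module low_bits_mod

program low_bits_demo
   use, intrinsic :: iso_fortran_env, only: int32, int64
   use low_bits_mod, only: low32
   implicit none
   integer(int64) :: i64
   integer(int32) :: i32
   integer :: k

   ! Bits 32..39 set, bit 31 set, bits 0..3 set.
   i64 = ibset(ieor(ishft(255_int64, 32), 15_int64), 31)
   i32 = low32(i64)

   ! Check bit by bit that the low-order 32 bits agree.
   do k = 0, 31
      if (btest(i32, k) .neqv. btest(i64, k)) error stop 'bit mismatch'
   end do

   print '(a,b64.64)', 'i64=', i64
   print '(a,b32.32)', 'i32=', i32
end program low_bits_demo
```

Because low32 is elemental, it can also be applied to a whole array of int64 values in one reference.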

I experimented a little with these various possibilities, and with the gfortran and flang compilers I found that the simple assignment

i32 = ibits(i64,0,32)

does in fact work, even if bit 31 is set. In fact, the assignment i32=i64 effectively just moves the low-order bits into i32, ignoring the high-order bits. The same thing seems to occur with other bit operators where the result is int64 and bit 31 is set. That is a convenient feature for doing this kind of bit manipulation (and maybe an inconvenient feature for other purposes), but I do not think that behavior is specified by the standard, so I wonder how portable it is across compilers or with various compiler options (particularly those that enable tests for overflows during assignments).

The programmer might try to do something a little more explicit

i32 = int( ibits(i64,0,32), int32 )

but I think this has the same undefined behavior regarding overflows. That is just an explicit way to do what the automatic conversion rules are doing anyway. F2023 section 16.9.110 does not give any further guidance on this issue.

I think in principle this could be fixed in the standard with an extra argument

i32 = ibits(i64,0,32, kind=int32)

just as mvbits(i64,0,32,i32,0) could be fixed in the standard to work correctly, but that kind of change could take five or ten years.

So my question is what is the best approach for a programmer to use when moving bits between integers of different KINDs? I think this has been an issue since f90, so for some 35 years now. I guess this doesn’t bother enough people to have been of any concern all that time.

I think an intrinsic function to test for endianness would also be useful. It is not hard to write your own, but like a lot of little things like this it would be better (IMHO) if it were an intrinsic.

Here is a little program that demonstrates what happens with the assignment i32=i64.

program overflow
   use, intrinsic :: iso_fortran_env, only: int32, int64
   integer(int64) :: i64
   integer(int32) :: i32

   i64 = ieor( ishft( 255_int64, 32 ), 15_int64 )
   call printit('00')
   i64 = ibset( i64, 63 )
   call printit('10')
   i64 = ibset( i64, 31 )
   call printit('11')
   i64 = ibclr( i64, 63 )
   call printit('01')
contains

   subroutine printit(ch)
      character(*), intent(in) :: ch
      character(*), parameter :: c64 = '(a,1x,a,b64.64,1x,i0)', c32 = '(a,1x,a,b32.32,1x,i0)'
      print c64, ch, 'i64=', i64, i64
      i32 = i64
      print c32, ch, 'i32=', i32, i32
      return
   end subroutine printit
end program overflow

The output with gfortran, flang, and nagfor is:

$ gfortran overflow.f90 && a.out
00 i64=0000000000000000000000001111111100000000000000000000000000001111 1095216660495
00 i32=00000000000000000000000000001111 15
10 i64=1000000000000000000000001111111100000000000000000000000000001111 -9223370941638115313
10 i32=00000000000000000000000000001111 15
11 i64=1000000000000000000000001111111110000000000000000000000000001111 -9223370939490631665
11 i32=10000000000000000000000000001111 -2147483633
01 i64=0000000000000000000000001111111110000000000000000000000000001111 1097364144143
01 i32=10000000000000000000000000001111 -2147483633

So the assignment appears to simply copy the low-order 32 bits while ignoring the high-order bits. However, I don’t think this behavior is prescribed in the standard. These results were obtained on an arm64 machine in little-endian mode (MacOS). There is also the question of how this code behaves on a big-endian machine.

edit: I changed the initial i64 value to make it more clear that it is the low-order 32 bits that are being copied by the (nonstandard) assignment statement.

16.3.1 Bit model, para 3: The interpretation of a negative integer as a sequence of bits is processor dependent.

Least significant is least significant; I don’t see how endianness changes anything. It will affect unformatted I/O but that is not the topic here.

I respectfully disagree - I have this pattern in my codes - it’s literally two lines of code:

use iso_fortran_env, only: i4=>int32, i1=>int8, i8=>int64
integer(i4), parameter :: ENDIAN_TAG    = transfer([integer(i1) :: 1,0,0,0], 0_i4)
logical,     parameter :: LITTLE_ENDIAN = ENDIAN_TAG == 1_i4
integer,     parameter :: low_half = merge(1, 2, LITTLE_ENDIAN)

integer(i8) :: i64
integer(i4) :: j(2), i32
equivalence (i64, j)
i32 = j(low_half)
end

This is great because:

  • compile-time expression
  • no slow data copy as prescribed by transfer

I know it’s a contrarian take, but I will care when it’s removed from the major compilers (likely I will be safe for another 30-40 years). Incredible how the only feature that allowed for a static cast of (at least intrinsic) variables has been removed from the language instead of extended. All other languages went in the opposite direction, but there we go, ships have sailed, and modern high-performance codes are not in Fortran anymore.

I have the following code in a standard module that defines commonly used constants that I include in all my codes.

  Integer(INT32), Parameter :: itest = INT(Z"04030201",INT32)

! Little endian systems will store 01 in the lowest byte of itest. Big endian
! systems will store 04 in the lowest byte of itest.

  Logical,        Parameter :: little_endian = (01_INT32 == IBITS(itest,0,8))
  Logical,        Parameter :: big_endian   = (04_INT32 == IBITS(itest,0,8))

Note that F2023 removes the requirement to use the INT function in defining itest.

I do not have access to a big-endian machine right now, so I can’t test this code. The question is what happens with the i32=i64 assignment statement. For values that are -huge(i32)≤i64≤huge(i32), the behavior is defined by the standard so that the values of the two variables are the same after the assignment. For twos-complement arithmetic, that works by just copying the low-order bits. I think that copy works for ones-complement arithmetic too. For sign-magnitude arithmetic, I think the operation instead would copy the low-order 31 bits and then copy separately the sign bit, presumably bit 63 from int64 copied to bit 31 of int32. But in the general bit transfer case, that inequality will not be satisfied, so the simple assignment is not covered by the standard. In these cases, the assignment might do the same thing as in the legal assignment cases, simply ignoring any other bits, or it might copy the low-order 31 bits and combine that with the sign bit, or it might raise some kind of run time exception, or (as they say) it could start WWIII.
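The twos-complement reading of the output above can be checked with ordinary int64 arithmetic. This is only a sketch under the assumption of twos-complement integers; the program name and the constants two31 and two32 are made up for illustration. It takes the '01' value from the output, reduces it modulo 2**32, and folds the result into the signed 32-bit range, so the final conversion is an in-range, value-preserving one.

```fortran
program wrap_check
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
   integer(int64), parameter :: two31 = 2_int64**31, two32 = 2_int64**32
   integer(int64) :: i64, low
   integer(int32) :: i32

   i64 = 1097364144143_int64             ! the '01' value printed above
   low = modulo(i64, two32)              ! low-order 32 bits as a nonnegative int64
   if (low /= 2147483663_int64) error stop 'unexpected low-order value'

   if (low >= two31) low = low - two32   ! fold into the signed 32-bit range
   if (low /= -2147483633_int64) error stop 'unexpected folded value'

   i32 = int(low, int32)                 ! in range, so the value is preserved
   print '(i0)', i32
end program wrap_check
```

The folded value agrees with the -2147483633 that the compilers printed for i32. Note that a bit pattern with only bit 31 set would fold to -2**31, which is one below -huge(i32), so this arithmetic route has its own edge case.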

But the intended purpose of the assignment in this code is to copy the bits, not to preserve the integer values. Even with all of the fortran intrinsic bit operations defined by the standard, I fail to see a standard conforming way to do that operation in a single step. In the case of mvbits(i64,0,32,i32,0), the expression violates the KIND restriction on the arguments. In the case of ibits(i64,0,32) (or any of the other bit operators), the value of the expression violates the upper bound restriction on the assignment. Just in case it wasn’t obvious, my suggested change, ibits(i64,0,32,kind=int32) was intended to copy the bits into the result, not to preserve integer values. I gave a two-step operation that is standard conforming, but writing that code leaves the programmer with feelings of disgust and shame.

Yes, that is what I do too. I usually use int32 and int64 values, but the idea is the same. I guess it is a matter of opinion whether that is clear and obvious code.

The NAG compiler already does not accept equivalence(). There are compiler options to allow compilation of legacy codes, but I think that is going to be the trend for obsolete features for all fortran compilers in the future.

I don’t think this does what you hope.

The hex-digits A through F represent the numbers ten through fifteen, respectively; they may be represented by their lower-case equivalents. Each digit of a boz-literal-constant represents a sequence of bits, according to its numerical interpretation, using the model of 16.3, with z equal to one for binary constants, three for octal constants or four for hexadecimal constants. A boz-literal-constant represents a sequence of bits that consists of the concatenation of the sequences of bits represented by its digits, in the order the digits are specified. The positions of bits in the sequence are numbered from right to left, with the position of the rightmost bit being zero. The length of a sequence of bits is the number of bits in the sequence. The processor shall allow the position of the leftmost nonzero bit to be at least z − 1, where z is the maximum value that could result from invoking the intrinsic function STORAGE_SIZE (16.9.200) with an argument that is a real or integer scalar of any kind supported by the processor.

LITTLE_ENDIAN will always be .TRUE. and BIG_ENDIAN will always be .FALSE.

That is not my experience (with 7.2 Build 7244). I see a warning pointing out the obsolescent feature on line such-and-such being used but compilation succeeds.

Then this flies in the face of my understanding of endianness and how hex numbers as defined by BOZ constants are stored in memory. The least significant byte, represented by 01, should be the first byte (in byte order) in memory on a little-endian machine. The reverse holds for big-endian: it should be in the highest byte.

I don’t see how functionally this is any different than the procedure that @FedericoPerini described above.

The point is that IBITS(I=itest,POS=0,LEN=8) will always depend on the 8 least significant bits, which are 0000 0001, in right to left order (and bit positions 0 to 7), and make up the integer value 1.

@FedericoPerini used the TRANSFER intrinsic which has different semantics, and will be sensitive to endianness.
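The difference between the two approaches can be seen side by side in a small probe. This is a sketch only: the program name is hypothetical, and it assumes that int8 occupies one byte and that int32 occupies four bytes with no padding. IBITS is applied to the value through the bit model, while TRANSFER exposes the bytes in storage order.

```fortran
program endian_probe
   use, intrinsic :: iso_fortran_env, only: int8, int32
   implicit none
   integer(int32), parameter :: itest = int(z'04030201', int32)
   integer(int8) :: bytes(4)

   ! IBITS works on the bit model: bits 0..7 are the least significant
   ! bits of the value, whatever the storage order.
   print '(a,i0)', 'ibits(itest,0,8)      = ', ibits(itest, 0, 8)
   if (ibits(itest, 0, 8) /= 1_int32) error stop 'ibits should see the value 1'

   ! TRANSFER reinterprets the storage; bytes(1) is the byte at the lowest address.
   bytes = transfer(itest, bytes)
   print '(a,i0)', 'first byte in storage = ', bytes(1)

   if (bytes(1) == 1_int8) then
      print '(a)', 'little-endian'
   else if (bytes(1) == 4_int8) then
      print '(a)', 'big-endian'
   else
      print '(a)', 'unrecognized byte order'
   end if
end program endian_probe
```

The first check passes on any byte order, which is why a test built only on IBITS cannot distinguish the two conventions.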

I don’t have access to a big-endian machine to test this, but I think @themos is right. As long as you work with a single integer kind, say int32 in the example code, then the low order bits are always consistently the low order bits when using the fortran bit operators. The problems with addressing conventions arise when using different integer KINDs and when doing binary i/o, with EQUIVALENCE, or with TRANSFER. Or in my code, the question is about what exactly the assignment i32=i64 does when the value on the rhs is outside the range of the lhs.

I can understand why the fortran standard has avoided this issue in legacy codes. I don’t know the exact numbers, but I’m pretty sure that in the 1970s most byte-addressable machines were big-endian. These would include IBM mainframes and the various lookalikes. For the past 20 years or so, and nowadays, I would think that the majority of machines are little-endian. Many modern CPUs even have the ability to switch, just by setting a hardware register value. For all that time in between, the programmer has had to address this issue, with little help from the fortran standard. I can understand part of that history. Up until f90, there was only one integer type (kind using modern terminology), so the standard could say nothing about how bits were transferred with assignment or with i/o among integers. But since f90, there has been the possibility of different integer kinds, and I must say, it sure would have been nice all that time to have had a little more help from the standard. The generalizations of the MVBITS() and IBITS() intrinsics I mentioned above would have solved most of my problems in this respect.

I think this complicated wording is to allow the situation where the integer type has padding bits. Legacy Cray integers might be an example. The normal integer operations only work with a subset of the bits, but the bit operators are intended to work with the full storage_size() number of bits.

You are right, I did not read the error message closely enough.

program xxx
   use, intrinsic :: iso_fortran_env, only: int32, int64
   integer(int64) :: i64
   integer(int32) :: j(2)
   equivalence (j,i64)
end program xxx

$ nagfor equiv.f90
NAG Fortran Compiler Release 7.2(Shin-Urayasu) Build 7203
Obsolescent: equiv.f90, line 5: EQUIVALENCE statement
Error: equiv.f90, line 5: EQUIVALENCE of non-default intrinsic type to default numeric
Errors in declarations, no further processing for XXX
[NAG Fortran Compiler error termination, 1 error, 1 warning]
$ a.out
-bash: a.out: command not found

It is the nondefault kind of i64 that is causing the error.

If you want to have some fun, here is ChatGPT (Code Copilot 5.4) “solution” to the problem.

module bit_copy_mod
  implicit none
contains
  pure function copy_bits(src, mold) result(dst)
    integer, intent(in) :: src
    integer, intent(in) :: mold
    integer(kind(mold)) :: dst
    integer :: i, n

    dst = 0_kind(mold)
    n = min(bit_size(src), bit_size(dst))

    do i = 0, n - 1
      if (btest(src, i)) dst = ibset(dst, i)
    end do
  end function copy_bits
end module bit_copy_mode
!
! Usage:
!
program demo
  use iso_fortran_env, only : int8, int32, int64
  use bit_copy_mod
  implicit none

  integer(int32) :: a
  integer(int64) :: b
  integer(int8)  :: c

  a = int(z'80000000', int32)
  b = copy_bits(a, 0_int64)
  c = copy_bits(a, 0_int8)

  print '(Z8.8)',   a
  print '(Z16.16)', b
  print '(Z2.2)',   c
end program demo

I saw this right off the bat. I didn’t catch the mod/mode misspelling until I compiled it.

However, the idea of copying bits one at a time is standard conforming and it does work as intended. It is just some 32x or 64x slower than it should be. I’d put that in the realm of “disgust and shame” if a programmer must do that in a language.

I wonder where the AI got that idea? There must be lots of code out there somewhere that actually does this, right?

In your case you have to copy 31 bits with a single statement, and have a separate test+statement for the 32nd bit. Not perfect, but not too bad either.

I had that example in my original post.

i32 = ibits(i64,0,31)
if ( btest(i64,31) ) i32 = ibset(i32,31 )

But it is still ugly, our language should be better than that.

Actually, my point (in having fun) was how profoundly wrong the code was, with the function copy_bits having its result kind derived dynamically from one of the arguments.
The very procedure of moving the bits was not that bad, after all, albeit impossible to implement that way.
I understand that there is no possibility to distinguish kinds of the same intrinsic type in select type construct.
Edit: as @RonShepard noticed downthread, I was wrong,

    select type ( a )
    type is ( integer(kind=kind(1)) )
      print *, "'a' is default integer"
    type is ( integer(kind=int64) )
      print *, "'a' is 64bit integer"
    end select

is perfectly valid. That makes the idea of setting bits doable though complicated.
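For completeness, here is a self-contained sketch of that kind dispatch with an unlimited polymorphic dummy argument. The module and function names are hypothetical. The guards use int32 and int64 rather than kind(1), because two guards that name the same kind (for example kind(1) and int32 on a processor where they coincide) would not be allowed in one construct.

```fortran
module kind_name_mod
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
contains
   ! Hypothetical helper: reports which integer kind was passed.
   function kind_name(a) result(name)
      class(*), intent(in) :: a
      character(:), allocatable :: name
      select type (a)
      type is (integer(int32))
         name = 'int32'
      type is (integer(int64))
         name = 'int64'
      class default
         name = 'other'
      end select
   end function kind_name
end module kind_name_mod

program kind_name_demo
   use, intrinsic :: iso_fortran_env, only: int32, int64
   use kind_name_mod, only: kind_name
   implicit none

   if (kind_name(1_int32) /= 'int32') error stop 'int32 not recognized'
   if (kind_name(1_int64) /= 'int64') error stop 'int64 not recognized'
   if (kind_name(1.0) /= 'other') error stop 'real should fall through'

   print '(a)', kind_name(0_int32)
   print '(a)', kind_name(0_int64)
end program kind_name_demo
```

The dispatch happens at run time on the dynamic type of the argument, so the result kind of a function still cannot depend on it; only the branch taken can.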

I fully agree!

Actually you can do that, but that is not what the AI code was doing. The AI code just declared everything as default integer, it just did it in an obscure way.
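One way the copy_bits(src, mold) interface could be made to work is with a generic interface, where the kinds are resolved at compile time from the declared kinds of the actual arguments. This is a sketch, not a recommendation: the specific procedure names are hypothetical, only the int32/int64 pair is covered, and it keeps the slow one-bit-at-a-time loop from the AI code so that the only thing that changes is the kind handling.

```fortran
module copy_bits_mod
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
   private
   public :: copy_bits

   ! The mold argument is used only to select the specific procedure.
   interface copy_bits
      module procedure copy_32_to_64, copy_64_to_32
   end interface copy_bits
contains
   pure function copy_32_to_64(src, mold) result(dst)
      integer(int32), intent(in) :: src
      integer(int64), intent(in) :: mold
      integer(int64) :: dst
      integer :: i
      dst = 0_int64
      do i = 0, min(bit_size(src), bit_size(dst)) - 1
         if (btest(src, i)) dst = ibset(dst, i)
      end do
   end function copy_32_to_64

   pure function copy_64_to_32(src, mold) result(dst)
      integer(int64), intent(in) :: src
      integer(int32), intent(in) :: mold
      integer(int32) :: dst
      integer :: i
      dst = 0_int32
      do i = 0, min(bit_size(src), bit_size(dst)) - 1
         if (btest(src, i)) dst = ibset(dst, i)
      end do
   end function copy_64_to_32
end module copy_bits_mod

program copy_bits_demo
   use, intrinsic :: iso_fortran_env, only: int32, int64
   use copy_bits_mod, only: copy_bits
   implicit none
   integer(int32) :: a, c
   integer(int64) :: b

   a = ibset(0_int32, 31)                 ! only bit 31 set
   b = copy_bits(a, 0_int64)              ! widening: bits 32..63 stay clear
   c = copy_bits(ibset(b, 40), 0_int32)   ! narrowing: bit 40 is dropped

   if (b /= ibset(0_int64, 31)) error stop 'widening failed'
   if (c /= a) error stop 'narrowing failed'

   print '(z8.8)', a
   print '(z16.16)', b
   print '(z8.8)', c
end program copy_bits_demo
```

Each additional kind pair needs its own specific procedure, which is the usual cost of kind-generic code without templates.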