Moving Bits Question

I am rewriting some old codes from the 1970s and 1980s that pack/unpack and move bits around. Back then, none of the bit operators that we used were standard, so we have typically 5-10 #if blocks of conditional code to handle all of the compiler-specific versions. We sometimes use EQUIVALENCE with LOGICAL*1 bytes and INTEGER*2, INTEGER*4 and INTEGER*8, and also REAL*8 for machines that did not support INTEGER*8. Fortran 77 and earlier was very limited in this respect, so we had to work to make things as consistent, clear, portable, and efficient as possible.

However, since f90, the usual bit operations have been standardized, and since f2008 we now have the INT8, INT16, INT32, and INT64 integer kinds (with INT64 being required, I think), so now we have more to work with than we did 40 years ago. We still do not have an intrinsic bit string type, which would make all of this simple and portable, but we do have more tools in the standard than we used to. So one required operation is to pack 32-bit integer values into an array of 64-bit integer containers. My first thought is something like

   integer(int32) :: i32
   integer(int64) :: i64
   ...
   call mvbits( i32, 0, 32, i64, ipos )
   ...
   call mvbits( i64, ipos, 32, i32, 0 )

where ipos == 0 or 32 as appropriate for the i64 destination. That syntax seems simple and clear and it avoids all the complications of the sign bit – the bits should just be copied with no interpretation or scaling or sign extending or anything. But of course for those who do this all the time, you immediately recognize the problem that the mvbits() intrinsic requires the from and to arguments to be of the same KIND. Ok, so then I think

call mvbits( int(i32,int64), 0, 32, i64, ipos )

should do the trick. That works for positive i32 values, but that doesn’t work when the sign bit is set. So then I think

call mvbits( transfer(i32,i64), 0, 32, i64, ipos )

should work. On my computer, the transfer() leaves the most significant 32 bits in the from argument unspecified, but that doesn’t matter because mvbits() is not referencing them.

This actually works on my computer, so maybe I should just stop while I’m ahead. But here is the problem. The transfer() operation just moves the bits, it doesn’t really define where those bits get moved, in contrast to the IOR(), IAND(), SHIFTL(), and so on bit operators. The programmer isn’t really supposed to use the results directly from transfer(), it is just supposed to hold the bits for a later transfer() operation that reverses the effect. So specifically in this case, the 32 bits that I’m working with might conceivably end up in either the low-order bits or the high-order bits of the transfer() result. So now I’m back to writing machine-dependent code like

call mvbits( transfer(i32,int64), jpos, 32, i64, ipos )

where jpos==0 when transfer moves the bits to the low-order part of the result and jpos==32 when the bits are in the high-order part of the result. Since this whole exercise is to eliminate machine-dependent code, that uncertainty kind of defeats the purpose. BTW, this low/high problem was there in the old 1970s code too when we used equivalence to move the bits around.

integer*4 i32(2)
integer*8 i64
equivalence (i32,i64)
 ...
i32(jpos) = bits_to_be_placed

If I want the bits placed in the low-order bits of i64, then on little-endian machines, jpos should be 1, while on big-endian machines, jpos should be 2. In our legacy code, this was sometimes done by testing a runtime expression, and it was sometimes done with #if conditional compilation. Either way, it is an ugly mess.

Ok, so maybe I should just avoid mvbits(). I could go back to shifting and masking the bits with SHIFTL, SHIFTR, IOR, etc. instead. For the 32-bit field width case

i64 = 0_int64
   ...
i64 = IOR(i32,i64)

That should put those bits in the right place, with no complications due to signed integers or anything like that. But again, there is the restriction that the two arguments in IOR should be of the same kind (understandable in this case), so this does not work. So I could go through the same steps as above with

i64 = IOR(int(i32,int64),i64)

or

i64 = IOR(transfer(i32,i64),i64)

to get the kinds to match, but now I’m back to the sign problem in the first case and to the machine-dependent transfer() problems in the second case. On my computer, the transfer operation actually places the bits in the right place, but it seems to leave random noise in the high-order bits, so something like

i64 = IOR(ibits(transfer(i32,i64),0,32),i64)

or the equivalent masking operatiion with IAND() is actually required. That seems way too complicated for the simple operation that I want to do, and of course in the end, it is still machine-dependent exactly where transfer() puts those bits within the result.

So am I overlooking something here. Is there a simple way to place 32 bits from i32 into the low-order bits of i64? If not, than all of this could be avoided if the original expression that I tried

call mvbits( i32, 0, 32, i64, ipos )

with ipos==0 would have worked. That syntax specifies clearly and unambiguously where the bits are coming from and where they are going to. Why oh why did the standard committee screw us over by requiring the from and to arguments to have the same KIND in mvbits()? And what about a bit string data type, which would have solved all of this once and for all 40 years ago.

This issue also gets back to the “eating your own dog food” issue that came up in a different thread recently. If compiler writers using fortran had encountered all of these complications 40 years ago, then they would have immediately changed the language to eliminate the problem. All this would have never been an issue in the language.

There is also the reverse problem. Suppose you have somehow managed to get the bits packed into the low-order part of i64, then how do you get them back out into i32?

i32 = ibits(i64,0,32)

works alright when bit# 31 is not set, but what happens when it is set? If the compiler just ignores the integer overflow in the assignment, then all is good, but if it detects the overflow then it could throw an exception and abort your job. Something like

i32 = transfer(i64,i32)

or

i32 = transfer( ibits(i64,0,32), i32 )

might work, but this again gets back to the ambiguity of exactly how transfer places bits. It seems like the best solution would be

call mvbits( i64, 0, 32, i32, 0 )

because that specifies exactly where the bits are coming from and going to. But mvbits() has the KIND restriction that prevents this.

Try out the bitset type in the Fortran stdlib effort:

William Clodius, a long-time computational scientist and also a contributor to many a discussion over time at comp.lang.fortran, is the primary author of the above bitsets module in stdlib. That will hopefully serve as a positive reference for you.

And I think this work on stdlib bitsets and the methods therein can meet most of your needs; should anything be lacking or running into bugs, suitable PRs would be in order to get you the rest of the missing functionality. You may even consider authoring them yourself!

3 Likes

If you want to avoid dealing with endianness, maybe you could do:

! copy an int32 to the first 32 bits of an int64
elemental integer(int64) function bit32_to_64(i32) result(i64)
   integer(int32), intent(in) :: i32

   integer :: bit
  
   i64 = 0 
   do bit=0,31
       i64 = merge(ibset(i64,bit),ibclr(i64,bit),btest(i32,bit))
   end do
end function 

and the other way around. Not-so-ugly and hopefully conceptually clear?

1 Like

I had also thought of

i64 = 0_int64
do j = 0, 31
       if( btest(i32,j) i64 = ibset(i64,j)
enddo

That is standard conforming, but it seems inefficient and inelegant. Is a loop over the individual bits really the best that fortran has to offer?

If you are dealing with integers at the bit level, then by definition you are subject to machine dependence. See wikipedia about endianness.

So, are you trying to recover the bit representation that one obtained from equivalence in days of yore, or are you trying to have the same bit pattern as one would get with printing a BOZ?

program ron

   use iso_fortran_env, only : int32, int64

   implicit none

   integer(int32) i(2)
   integer(int64) j

   i = [1233456, -123456]
   j = transfer(i([2,1]),j)

   write(*,'(2B32.32)') i
   write(*,'(B64.64)') j

   print *
   call yuck

   contains
      subroutine yuck
         integer(int32) i(2)
         integer(int64) j
         equivalence(i,j)
         i = [1233456, -123456]
         write(*,'(2B32.32)') i
         write(*,'(B64.64)') j
      end subroutine yuck
end program ron

The above yields

gfcx -o z ron.f90 && ./z
0000000000010010110100100011000011111111111111100001110111000000
0000000000010010110100100011000011111111111111100001110111000000

0000000000010010110100100011000011111111111111100001110111000000
1111111111111110000111011100000000000000000100101101001000110000

Note, the inverse operation is

i([2,1]) = transfer(j,i)

I was hoping to avoid all endianess issues by adopting a standard convention and using the standard fortran bit operators.

The transfer() operations work on my machine, but when I read the standard it seems like that is just by coincidence. The standard refers to “leading parts” of the source and destimation arguments, but I don’t see anywhere where the leading parts of integers are defined. Thus I suspect that it processor dependent, and is the same old endianess issue just hidden by a layer of abstraction.

I have individual integers. I want to store them in a bit bucket (an int64 array). And then I want to be able to retrieve or modify an individual value from that bit bucket. If I could control where the bits go into each int64 element (or elements, when they cross boundaries), then I could avoid any endian issues.

If mvbits() were a little more flexible, then that would solve the problem. But as I found out, it isn’t. As it is now defined, it just sits one level above the problem.

If the “leading parts” of integers were defined, then I could use transfer() to solve the problem. But I don’t think that is the case, so transfer() also sits one level above the problem.

My general problem is to work with strings of bits of arbitrary lengths. A bit string data type would also solve all of these issues, but that doesn’t exist in the standard. So I’m doing things with int64 arrays the hard way. If I store two 10-bit strings in adjacent locations, then I want to be able to retrieve them back as one 20-bit string in a consistent way. All of this endianess ambiguity gets in the way of doing that in a portable way. I suspect the code I have working now on my little-endian machine will not work on a big endian machine.

If all you’re interesting in is a bit bucket, why the complications of pack 2 int32 entities into a int64 and then sticking that into your bucket over simply sticking 2 int32 into a bucket?

This

    integer(int32), parameter :: num = 32 ! slots in bucket
    integer(int32) :: bucket(num)

seems much easier to deal with over

    integer(int32), parameter :: num = 16 ! slots in bucket
    integer(int64) :: bucket(num)

where you have to worry about the packing/unpacking.

@RonShepard and any other readers interested in something like this,

Please consider working with stdlib and introducing some general facilities in there that all the Fortran practitioners can make use of.

Also, avoid copying data as much as possible in such a facility.

In addition to the bitsets module I suggested upthread, if something additional is required for a specific need, here is another custom type to consider as merely a crude starting point. Based on a “NY minute” worth of thought, methinks the fields in such a “bucket” should be the length of a colloquial “byte” i.e., 8 bits. Further insight might change this thinking. Anyways, here it is: how is it for the “best” Fortran has to offer, does this pass muster?

! File p.f90
module bitbucket_m
   
   use, intrinsic :: iso_c_binding, only : I1 => c_int8_t, I4 => c_int32_t, I8 => c_int64_t, c_loc, &
      c_f_pointer
      
   private

   type, public :: bitbucket_t
   ! Field of 8-bit integers
      private
      integer(I1) :: m_fields(8) = 0_i1
   contains
      private
      procedure, pass(this) :: zap_fields
      procedure, pass(this), public :: int32 => get_int32
      procedure, pass(this), public :: int64 => get_int64
      generic, public :: zap => zap_fields
   end type bitbucket_t

   type :: pos_t
      integer :: pos = 0
   end type
   type(pos_t), public, parameter :: POS0 = pos_t() 
   type(pos_t), public, parameter :: POS32 = pos_t( pos=32 ) 
   
contains

   subroutine zap_fields( this, idx )
      ! Argument list
      class(bitbucket_t), intent(inout) :: this
      integer, intent(in)               :: idx(:)
      ! Local variable
      this%m_fields( idx ) = 0_i1
      return
   end subroutine zap_fields
   
   function get_int32( this, pos ) result(pvals)
      ! Argument list
      class(bitbucket_t), intent(in), target :: this
      type(pos_t), intent(in)                :: pos
      ! Function result
      integer(I4), pointer :: pvals
      pvals => null()
      select case ( pos%pos )
         case ( 0 ) 
            call c_f_pointer( cptr=c_loc(this%m_fields(1)), fptr=pvals )
         case ( 32 ) 
            call c_f_pointer( cptr=c_loc(this%m_fields(3)), fptr=pvals )
      end select
      return
   end function get_int32
   
   function get_int64( this ) result(pval)
      ! Argument list
      class(bitbucket_t), intent(in), target :: this
      ! Function result
      integer(I8), pointer :: pval
      call c_f_pointer( cptr=c_loc(this%m_fields), fptr=pval )
      return
   end function get_int64
   
end module
   use, intrinsic :: iso_fortran_env, only : int32, int64
   block 
      integer(int32) :: i32
      integer(int64) :: i64
      print *, "Block 1: using copy assignment with a type conversion function"
      i32 = -123456
      i64 = bit32_to_64( i32 )
      print "(*(g0))", "i32  = ", i32
      print "(*(g0))", "i64  = ", i64
      print "(g0,b0)", "i32 (bit pattern): ", i32
      print "(g0,b64)","i64 (bit pattern): ", i64
   end block
   print *
   block 
      use bitbucket_m, only : bitbucket_t, POS0
      type(bitbucket_t) :: bucket
      print *, "Block 2: using a custom type"
      bucket%int32( POS0 ) = -123456
      print "(*(g0))", "bucket%int32( POS0 )  = ", bucket%int32( POS0 )
      print "(*(g0))", "bucket%int64()  = ", bucket%int64()
      print "(g0,b0)", "bucket%int32( POS0 ) (bit pattern): ", bucket%int32( POS0 )
      print "(g0,b64)","bucket%int64() (bit pattern): ", bucket%int64()
   end block
contains
   ! copy an int32 to the first 32 bits of an int64
   elemental integer(int64) function bit32_to_64(i32) result(i64)
      integer(int32), intent(in) :: i32

      integer :: bit
     
      i64 = 0 
      do bit=0,31
          i64 = merge(ibset(i64,bit),ibclr(i64,bit),btest(i32,bit))
      end do
   end function
end
C:\temp>gfortran -Wall p.f90 -o p.exe

C:\temp>p.exe
 Block 1: using copy assignment with a type conversion function
i32  = -123456
i64  = 4294843840
i32 (bit pattern): 11111111111111100001110111000000
i64 (bit pattern):                                 11111111111111100001110111000000

 Block 2: using a custom type
bucket%int32( POS0 )  = -123456
bucket%int64()  = 4294843840
bucket%int32( POS0 ) (bit pattern): 11111111111111100001110111000000
bucket%int64() (bit pattern):                                 11111111111111100001110111000000

C:\temp>
1 Like

I agree with your observations, a derived type would be spot on if Fortran allowed an access operator like discussed here, that apparently not many care about…

1 Like

I notice in the bit32_to_64() function that there is a loop over the individual bits. Is that really the best that can be done with fortran? I would expect that to be a factor of 50 to 100 times slower than a solution that copies all the bits at once in a single instruction.

Regarding whether I should be using int8, int32, or int64 arrays to store the bits, that is partly determined by the legacy api. We have historically used either real*8 or integer*8 arrays to store the bits, depending on what was supported by the compilers of that time. We routinely used type punnng through dummy argument lists to switch back and forth as necessary. Some machines, such as CRAY, only supported 64-bit words, while others required address alignment of real*8 variables and arrays, so using real*8 or integer*8 arrays to store the bits solved all those problems regarding portability.

Now with modern fortran, I’m rewriting some of this low-level code to use just standard declarations and standard intrinsic operators.

The same post referenced with the quoted comment above also shows one other option that does not involve a loop, why are you not taking a look at that?

I only included the bit32_to_64() function, since it was mentioned upthread, for side-by-side comparison with the other option.

Yes, I did notice that. I’m already doing something similar with c_loc() and c_f_pointer() to simulate the legacy type punning, so I might consider that approach for the bit moving problem too. However, with this approach, there is still the big/little-endian addressing problem that arises when changing KINDs, right?

Well, if you have a big-endian machine, try out the code and adapt it as needed. At this point, it ain’t a discourse, damn it - it’s just do it!

Here, going by IBM’s write-up with

const int i = 1;
#define is_bigendian() ( (∗(char∗)&i) == 0 )

the following should work in Fortran:

   logical, parameter :: IS_BIG_ENDIAN = iachar( c=transfer(source=1,mold="a") ) == 0

So then the above bitbucket approach can likely be as follows instead:

! File p.f90
module bitbucket_m
   
   use, intrinsic :: iso_c_binding, only : I1 => c_int8_t, I4 => c_int32_t, I8 => c_int64_t, c_loc, &
      c_f_pointer
      
   private

   type, public :: bitbucket_t
   ! Field of 8-bit integers
      private
      integer(I1) :: m_fields(8) = 0_i1
   contains
      private
      procedure, pass(this) :: zap_fields
      procedure, pass(this), public :: int32 => get_int32
      procedure, pass(this), public :: int64 => get_int64
      generic, public :: zap => zap_fields
   end type bitbucket_t

   ! Ref: https://developer.ibm.com/articles/au-endianc/
   logical, parameter :: IS_BIG_ENDIAN = iachar( c=transfer(source=1,mold="a") ) == 0
   integer, parameter :: POS = merge( 3, 1, IS_BIG_ENDIAN )

contains

   subroutine zap_fields( this, idx )
      ! Argument list
      class(bitbucket_t), intent(inout) :: this
      integer, intent(in)               :: idx(:)
      ! Local variable
      if ( size(idx) > size(this%m_fields) ) error stop 
      if ( any(idx > size(this%m_fields)) ) error stop 
      this%m_fields( idx ) = 0_i1
      return
   end subroutine zap_fields
   
   function get_int32( this ) result(pval)
      ! Argument list
      class(bitbucket_t), intent(in), target :: this
      ! Function result
      integer(I4), pointer :: pval
      call c_f_pointer( cptr=c_loc(this%m_fields(POS)), fptr=pval )
      return
   end function get_int32
   
   function get_int64( this ) result(pval)
      ! Argument list
      class(bitbucket_t), intent(in), target :: this
      ! Function result
      integer(I8), pointer :: pval
      call c_f_pointer( cptr=c_loc(this%m_fields), fptr=pval )
      return
   end function get_int64
   
end module
   use, intrinsic :: iso_fortran_env, only : int32, int64
   block 
      integer(int32) :: i32
      integer(int64) :: i64
      print *, "Block 1: using copy assignment with a type conversion function"
      i32 = -123456
      i64 = bit32_to_64( i32 )
      print "(*(g0))", "i32  = ", i32
      print "(*(g0))", "i64  = ", i64
      print "(g0,32x,b0)", "i32 (bit pattern): ", i32
      print "(g0,b64)","i64 (bit pattern): ", i64
   end block
   print *
   block 
      use bitbucket_m, only : bitbucket_t
      type(bitbucket_t) :: bucket
      print *, "Block 2: using a custom type"
      bucket%int32() = -123456
      print "(*(g0))", "bucket%int32()  = ", bucket%int32()
      print "(*(g0))", "bucket%int64()  = ", bucket%int64()
      print "(g0,32x,b0)", "bucket%int32() (bit pattern): ", bucket%int32()
      print "(g0,b64)","bucket%int64() (bit pattern): ", bucket%int64()
   end block
contains
   ! copy an int32 to the first 32 bits of an int64
   elemental integer(int64) function bit32_to_64(i32) result(i64)
      integer(int32), intent(in) :: i32

      integer :: bit
     
      i64 = 0 
      do bit=0,31
          i64 = merge(ibset(i64,bit),ibclr(i64,bit),btest(i32,bit))
      end do
   end function
end

On a little-endian processor, the above yields:

C:\temp>p.exe
 Block 1: using copy assignment with a type conversion function
i32  = -123456
i64  = 4294843840
i32 (bit pattern):                                 11111111111111100001110111000000
i64 (bit pattern):                                 11111111111111100001110111000000

 Block 2: using a custom type
bucket%int32()  = -123456
bucket%int64()  = 4294843840
bucket%int32() (bit pattern):                                 11111111111111100001110111000000
bucket%int64() (bit pattern):                                 11111111111111100001110111000000

C:\temp>

Endianness will become a problem when you switch operating system or cpu architecture. More importantly, you may have problems with a Fortran solution regardless of endianness. From Fortran 16.3.1:

The interpretation of a negative integer as a sequence of bits
is processor dependent.

So, ibset, ibclr, btest may or may not do what you want with respect to the sign bit. The solution proposed upstream with Fortran OOP feature is overkill to say the least. Just use C interop.

/* ron1.c. */
#include <stdint.h>
/*
 * Type punning.
 */
typedef union bit64_t {
      uint64_t u;
      int64_t s;
} bit64_t;
 
typedef union  bit32_t {
      uint32_t u;
      int32_t s;
} bit32_t;
/*
 * Pack 2 signed 32-bit ints to a signed 64-bit int.
 */
int64_t
pack(int32_t hi, int32_t lo)
{
   bit64_t r;
   bit32_t x, y;
   x.s = hi;
   y.s = lo;
   r.u = (uint64_t)x.u << 32 ^ y.u;
   return r.s;
}
/*
 * Unpack a signed 64-bit int into2 signed 32-bit ints.
 */
void
unpack(int64_t x, int32_t *hi, int32_t *lo)
{
   bit64_t r;
   r.s = x;
   *hi = (int32_t)(r.s >> 32);
   *lo = (int32_t)((r.s << 32) >> 32);
}
program ron2

   use iso_fortran_env, only : int32, int64

   implicit none

   interface
      function pack(hi, lo) bind(c,name='pack')
         import int32, int64
         integer(int64) pack
         integer(int32), intent(in), value :: hi, lo
      end function
      subroutine unpack(x, hi, lo) bind(c, name='unpack')
         import int32, int64
         integer(int64), intent(in), value :: x
         integer(int32), intent(out) :: hi, lo
      end subroutine unpack
   end interface

   integer(int32) hi, lo
   integer(int64) j

   hi = int(z'01020304', int32)
   lo = int(z'05060708', int32)
   j = pack(hi,lo)
   write(*,'(Z16.16)') j
   hi = 0
   lo = 0
   call unpack(j, hi, lo)
   write(*,'(2Z8.8)') hi, lo

end program ron2
cc -c -O ron1.c
gfcx -o z -O ron2.f90 ron1.o && ./z
0102030405060708
0102030405060708
2 Likes

Re: “The solution proposed upstream with Fortran OOP feature is overkill to say the least” -

Not in the least. The solution proposal using the custom type (say bitbucket_t) once tailored to the needs of OP and used properly will help as follows:

  • no need for unpacking and the consumer can avoid a duplicate copy instruction,
  • is all in Fortran,
  • no direct need for bit processing
  • can be scaled to yield significantly greater performance if a consumer need to work with large arrays of integers.

Basically the code can be as simple as:

   use bitbucket_m, only : bitbucket_t
   ..
   type(bitbucket_t) :: bucket 
   ..
   bucket%hi() = int(z'01020304', int32)  ! store the source integer
   bucket%lo() = int(z'05060708', int32)  ! store the source integer
   ..
   ! consumer the high and low pairs
   write(*,'(2Z8.8)') bucket%hi(), bucket%lo()

@RonShepard was talking about packing 2 32-bit signed integers into a 64-bit signed integer, so your elegant solution does not address that. You show that you can put one 32-bit signed integer into a bucket and extract it as a 64-bit signed integer. You’ve wasted 32-bits. So why use a 64-bit bucket? Second, you are ignoring the fact the handling of the sign bit is processor depend. One may get what you expect, but it is not guaranteed. Third, the Fortran intrinsic functions ibset, ibclr, btest operate on a bit sequence defined by the model in F2018:16.3. 16.3 defines how the bit sequence is ordered, so endianness is somewhat irrelevant if one only manipulates the entities as bit sequences.

So the point here about a custom type and tailoring it to the needs is that the solution proposal can easily be adapted to be generic. Then if OP wants to store n (likely n=2) X-bit (likely X=32, but possibly other kinds also) integers in a bucket of n*X bits, it can be designed for that. There won’t be any waste of storage, what was shown upthread with storing only 32-bir integer in a 64-bit bucket was just a quick illustration that was only focused on a specific aspect of OP’s question in the original post and that was even though the questions in the first two posts by OP with MVBITS didn’t seem all that relevant with the other possible options with packing

Parameterized derived type (PDT) facility also can yield further convenience here toward the custom type (say bitbucket_t), too bad gfortran doesn’t support it whereas Intel Fortran and NAG do.

The big advantage of the custom type can be that none of the processor-dependent aspects with sign bit, etc. need apply, the solution can yield internal consistency and the same code should work with different OS and compilers. What one places in a bucket is exactly what one takes out, that is the basic idea.

In fact, if one considers a broader, generic need with different KINDs of integers as alluded by OP in a subsequent post, the suggestion to use C interop with much code in C using bit manipulation is indeed what is an overkill.

The point here is that an over-engineered solution, which one might be able to customize to suit some other purpose, has a downside. How many clock cycles does this take?

elemental integer(int64) function bit32_to_64(i32) result(i64)
      integer(int32), intent(in) :: i32
      integer :: bit
     
      i64 = 0 
      do bit=0,31
          i64 = merge(ibset(i64,bit),ibclr(i64,bit),btest(i32,bit))
      end do
   end function

Now, alter it to do the packing of 2 signed 32-bit integers into a signed 64-bit integer as @RonShepard’s original post indicated. The Fortran bit intrinsic functions operate on the bit model of Fortran 2018 16.3. Here, let me remind you what the Fortran 2018 says

The interpretation of a negative integer as a sequence of bits is
processor dependent.

The C-interop solution that I showed takes 72 clock cycles for packing and 80 clock cycles to unpack (including function call overhead) on my lowly Intel Core2 Duo processor, and it automatically takes care of the processor-dependent sign bit.

Although this seems to be a snide backhanded comment towards me, I’ll note that I don’t use PDT, so I have no reason to fix gfortran’s PDT implementation. You’ve been invited for years to be a part of the solution. Still waiting for the first patch from Vipul Parekh to show up in the fortran@gcc.gnu.org list.

Edited:
Don’t want to measure the print statement.

     t0 = rdtsc()
     i64 = bit32_to_64( i32 )
     t0 = rdtsc() - t0 
     print *,'i32->i64: ', t0

Everything compiled with -O3 -march=native.

gfcx -O3 -march=native -o z -I${HOME}/modules ff2.f90 ff1.o \
-L${HOME}/lib -lm90
% ./z
Block 1: using copy assignment with a type conversion function
i32->i64:   280

Repeated runs give 280-300 clock ticks. Not bad, but slightly larger than 72.

Duh!, bit32_to_64 is not part of the custom type approach with bitbucket_t suggested upthread, the one being termed “Fortran OOP solution proposal”.

bit32_to_64 was mentioned upthread earlier and a variant was indicated by @RonShepard with a question along the lines of whether that was the best Fortran had to offer! So after the Thanksgiving festivities and a heavy stomach while family got going with social media and TV and shopping plans, I spent a few minutes thinking of alternative options using Fortran and came up with the bitbucket_t option rather quickly.

And I only included the bit32_to_64 function in my post as a side-by-side comparison of the results and also to illustrate to those who can read that hey, such bit fiddling can have a performance impact whereas the other one that I showed with custom type and I listed under Block 2 in the example calling program will hardly take any extra cycles and yet give the same results.

There is general bias in certain circles that anything with OO is an overkill and that’s the only basis here too

With a generic need that any actual practitioner may have in mind in terms of the packing of integers of different KINDs in a “container” with convenient storage and retrieval at little cost, the option I show above with bitbucket_t or some such variant thereof is a better way to go. The C interop with bit fiddling with “<<” and union of unsigned and signed types of fixed sizes are all a total overkill.