Knuth's bit reversal algorithm

I’m trying to port Donald Knuth’s fast bit reversal algorithm from C to Fortran, as described in Chapter 7 of Hacker’s Delight, 2nd edition, by Henry S. Warren.

The original algorithm looks like this for 32 bits:

unsigned rev(unsigned x) {
  unsigned t;
  x = (x << 15) | (x >> 17); // Rotate left 15.
  t = (x ^ (x>>10)) & 0x003F801F; x = (t | (t<<10)) ^ x;
  t = (x ^ (x>> 4)) & 0x0E038421; x = (t | (t<< 4)) ^ x;
  t = (x ^ (x>> 2)) & 0x22488842; x = (t | (t<< 2)) ^ x;
  return x;
}

And the following for 64 bits:

unsigned long long rev(unsigned long long x) {
  unsigned long long t;
  x = (x << 32) | (x >> 32) // Swap register halves
  x = (x & 0x0001FFFF0001FFFFLL) << 15 | // Rotate left
      (x & 0xFFFE0000FFFE0000LL) >> 17;  // 15.
  t = (x ^ (x >> 10)) & 0x003F801F003F801FLL;
  x = (t | (t << 10)) ^ x;
  t = (x ^ (x >>  4)) & 0x0E0384210E038421LL;
  x = (t | (t <<  4)) ^ x;
  t = (x ^ (x >>  2)) & 0x2248884222488842LL;
  x = (t | (t <<  2)) ^ x;
  return x;
}

This is what I’ve come up with so far:

INTEGER FUNCTION bit_rev_knuth_32(i_,number_bits)
  IMPLICIT NONE

  ! Input argument
  INTEGER, INTENT(IN) :: i_, number_bits

  ! Intrinsic functions
  INTRINSIC :: IOR, IAND, IEOR, ISHFT
    
  ! Internal variables
  INTEGER :: x, t

  ! Copy input to x
  x = i_

  x = IOR( ISHFT(x,15), ISHFT(x,-17) )             ! Rotate left 15
  t = IAND( IEOR( x, ISHFT(x,-10) ), z'003F801F' ) ! t = (x ^ (x >> 10)) & 0x003F801F
  x = IEOR( IOR(  t, ISHFT(t, 10) ), x )           ! x = (t | (t << 10)) ^ x
  t = IAND( IEOR( x, ISHFT(x,-4) ),  z'0E038421' ) ! t = (x ^ (x >>  4)) & 0x0E038421
  x = IEOR( IOR(  t, ISHFT(t, 4) ),  x )           ! x = (t | (t <<  4)) ^ x
  t = IAND( IEOR( x, ISHFT(x,-2) ),  z'22488842' ) ! t = (x ^ (x >>  2)) & 0x22488842
  x = IEOR( IOR(  t, ISHFT(t, 2) ),  x )           ! x = (t | (t <<  2)) ^ x

  ! Return x shifted to the correct bit length
  bit_rev_knuth_32 = ISHFT(x,(number_bits-32))

  RETURN
END FUNCTION bit_rev_knuth_32

INTEGER(KIND=int64) FUNCTION bit_rev_knuth_64(i_,number_bits)
  IMPLICIT NONE

  ! Input argument
  INTEGER(KIND=int64), INTENT(IN) :: i_, number_bits

  ! Intrinsic functions
  INTRINSIC :: IOR, IAND, IEOR, ISHFT

  ! Internal variables
  INTEGER(KIND=int64) :: x, t

  ! Copy input to x
  x = i_

  x = IOR( ISHFT(x,32), ISHFT(x,-32) )                     ! Swap register halves
  x = IOR( ISHFT( IAND( x, z'0001FFFF0001FFFF' ), 15 ), &  ! Rotate left 15
           ISHFT( IAND( x, z'FFFE0000FFFE0000' ),-17 ) ) 
  t = IAND( IEOR( x, ISHFT(x,-10) ), z'003F801F003F801F' ) ! t = (x ^ (x >> 10)) & 0x003F801F003F801FLL
  x = IEOR( IOR(  t, ISHFT(t, 10) ), x )                   ! x = (t | (t << 10)) ^ x
  t = IAND( IEOR( x, ISHFT(x,-4) ),  z'0E0384210E038421' ) ! t = (x ^ (x >>  4)) & 0x0E0384210E038421LL
  x = IEOR( IOR(  t, ISHFT(t, 4) ),  x )                   ! x = (t | (t <<  4)) ^ x
  t = IAND( IEOR( x, ISHFT(x,-2) ),  z'2248884222488842' ) ! t = (x ^ (x >>  2)) & 0x2248884222488842LL
  x = IEOR( IOR(  t, ISHFT(t, 2) ),  x )                   ! x = (t | (t <<  2)) ^ x

  ! Return x shifted to the correct bit length
  bit_rev_knuth_64 = ISHFT(x,(number_bits-64))

  RETURN
END FUNCTION bit_rev_knuth_64

I have the following concerns:

  • Fortran does not have unsigned integers. I think the current translation can handle bit sizes up to 31 bits (for the 32-bit version) and up to 63 bits (for the 64-bit version). How can I handle bit lengths of exactly 32 bits and exactly 64 bits?
  • Would converting ISHFT intrinsics to SHIFTL and SHIFTR be beneficial for speed/optimization reasons? These two intrinsics are introduced in Fortran 2008, and MFE seems to favor them over ISHFT, but looks like not all compilers support them yet… notably NVIDIA HPC SDK, which I extensively use for its stellar OpenACC support.
  • Eventually, I would want to convert these into GPU kernels. I wonder what is the most efficient way to perform bitwise operations on the GPU… Right now I’m leaning towards OpenACC, since it’s the GPU programming model that I’m most familiar with, but I am willing to consider OpenMP target offload or CUDA Fortran too.

Any help would be appreciated.

1 Like

@kargl That’s an interesting trick, using a union of int32_t and uint32_t.
Btw, when compiling the sample program bar with gfortran 9, it complains with the following error message:

   18 |   i = int(b'11111111111111110101010101010101', c_int32_t)
      |          1
Error: Arithmetic overflow converting INTEGER(16) to INTEGER(4) at (1). This check can be disabled with the option ‘-fno-range-check’

which goes away as soon as I turn on that compiler flag. ifort 19 and nvfortran 21.5 doesn’t seem to care about it, and both compile bar successfully without any error messages.

PS. Why would you delete the post in 24-48 hours? I think it’s a creative way to solve the problem…

Generally the hardware shift and IAND/IEOR operations do not distinguish between signed and unsigned arguments. I would use SHIFTR ad SHIFTL instead of ISHFT. If the second argument of ISHFT is a constant, there should be no performance difference, but the SHIFTL and SHIFTR versions are clearer for the programmer to read (and get right). [There was, for a period of time, a BITS data type proposal in F2008. It was deleted before F2008 was published, but it got around the signed integer issue, and was designed for this sort of computation.] Finally, if the hardware has a bit-matrix-multiply instruction, you can arbitrarily rearrange the bits in a 64-bit object, including reversing the bits, in a single instruction. The modern Intel-based hardware lacks this instruction.

2 Likes