Moving Bits Question

FortranFan · November 27, 2022, 2:28am

It may be worth an effort for you to propose something at the J3 Fortran GitHub site and include a fully illustrated use case for it and not just a few line code snippets and a lot of text.

If nothing else, such an exercise might help show the starting premise that “one required operation is to pack 32-bit integer values into an array of 64-bit integer containers” as claimed in the original post is likely not a required operation after all and that the language includes other, possibly better options for the overall task at hand and that MVBITS need not be changed.

urbanjost · November 27, 2022, 2:49am

As I read it the model represents an integer bit position from left to
right starting with zero so I thought the bit routines eliminated the
problems with sign and endian. So (without investigating efficiency)
I was going to say there are several portable ways to work with bits in
bucket arrays of various integer kinds; and that AFAIK this would be one
of several portable ways of concatenating two 32-bit integers into one
64-bit integer, but reading a little further (still not done perusing)
it looks like you all went way beyond that; but I did not see this one
mentioned so far …

program append
use,intrinsic::iso_fortran_env,only:int8,int16,int32,int64
implicit none
integer(kind=int64),allocatable :: result(:)

result= concat(&

  [int(b'01010101010101010101010101010101'),  &
   int(b'10101010101010101010101010101010'),  &
   int(b'00000000000000000000000000000001'),  &
   int(b'11111111111111111111111111111111'),  &
   int(b'10000000000000000000000000000000')], &

  [int(b'11111111111111111111111111111111'),  &
   int(b'00100100100100100100100100100100'),  &
   int(b'10000000000000000000000000000000'),  &
   int(b'11111111111111111111111111111111'),  &
   int(b'00000000000000000000000000000001')])

contains

elemental impure integer(kind=int64) function concat(ia,ib)
integer(kind=int32),intent(in) :: ia, ib
logical, parameter :: IS_BIG_ENDIAN = iachar( c=transfer(source=1,mold="a") ) == 0
integer,parameter  :: LSHIFT=merge(0,32,IS_BIG_ENDIAN)
integer,parameter  :: RSHIFT=merge(32,0,IS_BIG_ENDIAN)
   concat = dshiftl( shiftr(int(ia,kind=int64),RSHIFT), shiftl(int(ib,kind=int64), LSHIFT),32)
   write(*,'(*(b32.32))')ia,ib
   write(*,'(b64.64)')concat
end function concat

end program append

0101010101010101010101010101010111111111111111111111111111111111
0101010101010101010101010101010111111111111111111111111111111111
1010101010101010101010101010101000100100100100100100100100100100
1010101010101010101010101010101000100100100100100100100100100100
0000000000000000000000000000000110000000000000000000000000000000
0000000000000000000000000000000110000000000000000000000000000000
1111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111
1000000000000000000000000000000000000000000000000000000000000001
1000000000000000000000000000000000000000000000000000000000000001

Not really sure why you wanted to do that, but still reading. I guess you would still have to consider endian using int(), but I am not quite sure which way you want to do things with
different endianness. So this is elmental so it could do arrays and scalars.

FortranFan · November 27, 2022, 4:06am

And your thoughts on the splitting in a different context later to “unpack” the n X-bit (n=2, X=32) integers?

ivanpribec · November 27, 2022, 7:59am

I’m glad you are here to enlighten us. Perhaps the next time you’ll be resting after festivities, you can spend a few minutes drafting an article for the Learn Fortran pages at Learn — Fortran Programming Language, and share your knowledge on the best use of functions returning pointers with everyone in a less arrogant tone.

I could borrow your reasoning for many Fortran features, equivalence has been in the standard for over a dozen years, …, practitioners should consider getting on with it. implicit none has been in the standard for over a dozen years, practitioners should consider getting on with it. Implicit save has been in the standard for over a dozen years, statement functions have been in the standard for dozens of years, go to has been in the standard for over a dozen years. Just because something has been in the standard for over a dozen years, doesn’t automatically make it a meritable solution, and you out of all people will know that.

jeremie.vandenplas · November 27, 2022, 9:33am

kargl:

From Fortran 16.3.1:
The interpretation of a negative integer as a sequence of bits
is processor dependent.
So, ibset, ibclr, btest may or may not do what you want with respect to the sign bit.

@kargl I am a bit confused with this statement in Fortran 16.3.1.
E.g., with the intrinsic ibits, I would expect that ibits(val, 6, 2) always return the 2 same bits for an integer(int8) :: val read from a stream binary file, whatever the value of val is. For example:

program test_r
 use, intrinsic :: iso_fortran_env, only: int8
 implicit none
 integer :: un
 integer(int8) :: dummy

 open(newunit=un, file='test.stream', access='stream', action='read')
 read(un) dummy
 close(un)

 write(*,'(1b2.2)')ibits(dummy, 0, 2)
 write(*,'(1b2.2)')ibits(dummy, 2, 2)
 write(*,'(1b2.2)')ibits(dummy, 4, 2)
 write(*,'(1b2.2)')ibits(dummy, 6, 2)

end program

Example:

[test_ibits]$ xxd -b test.stream 
00000000: 10010001                                               .
[test_ibits]$ gfortran test_r.f90 
[test_ibits]$ ./a.out 
01
00
01
10

The file test.stream could be created by another program (even another language). What I need, it is to extract the different bits of the int8 value (each 2 bits having a different meaning). But the interpretation of the 8-bit sequence as read from the file (i.e. the value of the int8 integer) is not relevant to me, neither the interpretation of the 8-bit sequence returned by ibits (i.e. the value of the returned int8 integer; in this case always a positive integer).
Based on your warnings, should I understand that the results of the program above cannot be trusted (across different processors) if the interpretation of the 8-bit sequence in the file test.stream is a negative integer?

FortranFan · November 27, 2022, 2:12pm

And thank you very much also for “enlightening” with your several false equivalences, the state of La Résistance in Fortran with “on with the old, out with anything new” too remains strong with all the “learning” providing in your “meritable” commentary here

ivanpribec · November 27, 2022, 3:31pm

My point was that new users of Fortran are often surprised by many of the older features I mentioned, just like I was surprised by using a pointer-expression on the left-hand side of an assignment. Telling me this isn’t of any consideration, and to simply “get on with it” because the feature has been there for a dozen of years feels disparaging and is of no help.

Previously at Discourse

Richard Maine, editor of Fortran 95 and 2003 standards, has advised against writing functions that return pointers.

Who should I listen to now and why, @FortranFan or Richard Maine? Under what conditions is a solution that uses a function returning a pointer acceptable or perhaps even preferred? If you’d be willing to share this knowledge with us based upon your actual lived experiences I’d appreciate that.

FortranFan · November 27, 2022, 4:49pm

As always, what matters is the context.

Given the “too little, too late” plight afflicting Fortran due to which a built-in bit string type of facility is lacking and there are practitioners like OP who tend to ignore other facilities such as bitsets module in Fortran stdlib, the entire context here, starting from the original post, leads toward “clever code” to achieve a particular task that goes against much of the framework of type-rank conformance and that can necessitate type subversion/punning. It is then expected the applications like COLUMBUS where stuff like this may get into, the author(s) and consumer(s) who will deal with such “clever code” like to play with “fire” and they can be left to their devices to deal with the hazards.

Under such circumstances that then require going against the grain and employing some specialized facility in the language is where functions that return pointers can come into play. And it was for such situations also that the Fortran 2008 standard revision introduced the extension to consider a reference to a pointer function as a variable and permitted it to appear in any variable-definition context.

In N1891 WG5 document, John Reid writes:

The advice by Richard Maine predates the implementation of the Fortran 2008 in compilers where there were inadequate guardrails in the language toward functions returning pointers from appearing in variable definition contexts and the use of such functions with Fortran 90 thru’ 2003 conformant processors resulted in considerable vulnerabilities in programs.

As a general good coding practice though, it will be always be highly advisable to follow Richard Maine’s good words and avoid functions that return pointers and consider simpler, clearer approaches.

However that need not be seen as some kind of programming “theological” commandment - “thou shalt not program functions that return pointers, …”.

As explained above, my take is it is ok now to be more flexible and use other modern Fortran facilities that have standard semantics when the circumstances are such.

RonShepard · November 27, 2022, 5:20pm

I also remember these warnings from Richard Maine. The basis is that it is very easy to have dangling pointers (pointers that no longer are associated with the memory that they once pointed to) and memory leaks (anonymous memory that was only accessible through a pointer, which later becomes redefined or undefined).

As for functions on the left hand side of an expression, I think the “normal” way fortran programmers would think of this is

apt => function_returning_pointer()
apt = expression

Putting the function on the left hand side of the expression just eliminates the references to the middle man, apt. However, as Richard Maine warned, the function pointer value only lives as long as the expression in which it is accessed, so it is fraught with peril. The pointer apt allows any necessary memory deallocations to be done later if necessary, and even “better” in this respect if the pointer definition is done with a subroutine rather than with a function in order to avoid these kind of problems in the first place.

RonShepard · November 27, 2022, 5:27pm

I have always wondered what exactly that meant too. Like you, I think it means only that the association of the sign bit to the signed integer values is not defined by the fortran standard. An example might be like a parity bit in data communications. Negative values could be defined as words with odd parity and positive values to those with even parity. In this case, the sign bit itself is not what one would normally expect regarding the integer value, but the results of the bit operators would still work as they are defined to work.

RonShepard · November 27, 2022, 5:49pm

I would say that “clever code”, like beauty, is often in the eyes of the beholder. One might just as easily claim that a solution that involves function pointers on the left hand side of expressions is also “clever code”, fraught with peril due to the possibility of memory leaks and dangling pointers, and anyone who proposes such a solution can be left to their devices to deal with the hazards.

I posted the original question to more fully understand exactly what are the hazards and benefits of the various approaches. I expected that the modern fortran constructs could be used to write clear and portable code and to replace our legacy code. I must admit that I am surprised at the difficulty to achieve those goals for this seemingly simple task, as evidenced by the many proposed, varied, (and helpful, I should say) approaches that have been suggested in this discussion.

FortranFan · November 27, 2022, 6:55pm

Again, if you were to provide a better illustration of the context, say a minimal working example of what exactly is being pursued and why, chances are the starting premise will become moot, “one required operation is to pack 32-bit integer values into an array of 64-bit integer containers.” and better ways to approach and design the larger aspects with modern Fortran will become evident. And the rest of the discourse here will be witter.

There is a lot of textual description in the original post and the one that immediately followed but details are missing. It comes across as nonstandard steps were taken in the 70s and 80s toward instructions the language didn’t support: the language still doesn’t support the operation of interest directly but how to now replace the nonstandard code with modern Fortran syntax!? That makes little sense in the larger scheme of things now that the readers in the discourse have had their fill of being humored.

FortranFan · November 27, 2022, 7:18pm

And so, yes the derived type approach shown upthread with bitbucket_t is indeed a case of “clever code”, that was my point. Clearly it’s something one should ordinarily eschew but the context here is anything but ordinary Fortran.

Now the aspects of the “clever code” with this container type include not only an avoidance of direct bit fiddling and an extra copy with unpacking but also avoiding perils with the pointer function. The derived type with a private and explicit-shape component helps with this. There is a lot of thought I put into it in very quickly.

If one must do such a thing with storing integers of kinds in a container for later retrieval, the approach with bitbucket_t type ain’t gonna “byte” you, it’s about as safe as most things in Fortran, no need to cry wolf here.

urbanjost · November 27, 2022, 8:03pm

Well, originally I was thinking a function-based approach might look like the following, but
after pondering the rest of the discussion I have started to reconsider (everything!)

I was going to do some comparison timings on function versus subroutine and the other methods
discussed here; and it got me curious about what the real instructions generated are; had not really thought out or tested the endian issues (It has been a while, but there are more than two types)
but I was just playing and the discussion raised questions on this approach. Your original point
about how these issues would actually be thrashed out more under other circumstances certainly is hitting home after seeing the thrashing here just discussing it all. It worked on my platform
and I admit to doing a little “golf coding” here as well, but it is a complete program (probable bugs and all) so …

program append
use,intrinsic::iso_fortran_env,only:int8,int16,int32,int64, real64
implicit none
character(len=*),parameter :: g='(*(g0,1x))'
integer(kind=int64) :: clicks(2), count64, count_rate64
integer(kind=int64),allocatable :: result(:)
integer(kind=int32),allocatable :: ia(:), ib(:), i32(:)
integer :: i,isz
real(kind=real64) :: time_read
doubleprecision ::t0
   call system_clock(count64, count_rate64 )
   print g,'COUNT RATE FOR INT64:',count_rate64

   ia=  [int(b'01010101010101010101010101010101'),  &
         int(b'10101010101010101010101010101010'),  &
         int(b'00000000000000000000000000000001'),  &
         int(b'11111111111111111111111111111111'),  &
         int(b'10000000000000000000000000000000')]
   ib = [int(b'11111111111111111111111111111111'),  &
         int(b'00100100100100100100100100100100'),  &
         int(b'10000000000000000000000000000000'),  &
         int(b'11111111111111111111111111111111'),  &
         int(b'00000000000000000000000000000001')]

   write(*,g)'concatentate'
   call system_clock(clicks(1))
   result= concat(ia,ib)
   call system_clock(clicks(2))
   write(*,'(2(b32.32),/,b64.64)')(ia(i),ib(i),result(i),i=1,size(ia))
   call timeit()

   write(*,g)'bifurcate'
   call system_clock(clicks(1))
   i32=[( bifurcate(result(i)),i=1,size(result) )]
   call system_clock(clicks(2))
   write(*,'(2(b32.32),/,b64.64)')(i32(i*2-1:i*2),result(i),i=1,size(result))
   call timeit()

   isz=size(i32)
   write(*,g)'compare odd elments of i32 to ia', all(i32(1:isz:2).eq.ia)
   write(*,g)'compare even elments of i32 to ib',all(i32(2:isz:2).eq.ib)
contains
subroutine timeit()
   write(*,g)'clicks:',clicks(2)-clicks(1)
   time_read=(clicks(2)-clicks(1))/real(count_rate64,real64)
   write(*,'(a,g0,1x,a)') 'time : ', time_read, ' seconds'
end subroutine timeit

elemental integer(kind=int64) function concat(ia,ib)
integer(kind=int32),intent(in) :: ia, ib
logical, parameter :: IS_BIG_ENDIAN = iachar( c=transfer(source=1,mold="a") ) == 0
integer,parameter  :: LSHIFT=merge(0,32,IS_BIG_ENDIAN)
integer,parameter  :: RSHIFT=merge(32,0,IS_BIG_ENDIAN)
   concat = dshiftl( shiftr(int(ia,kind=int64),RSHIFT), shiftl(int(ib,kind=int64), LSHIFT),32)
end function concat

function bifurcate(i64)
integer(kind=int64),intent(in) :: i64
integer(kind=int32)            :: bifurcate(2)
logical, parameter :: IS_BIG_ENDIAN = iachar( c=transfer(source=1,mold="a") ) == 0
integer,parameter  :: LSHIFT=merge(32,0,IS_BIG_ENDIAN)
integer,parameter  :: RSHIFT=merge(0,32,IS_BIG_ENDIAN)
   bifurcate = int([ ibits(i64,RSHIFT,32), ibits(i64, LSHIFT,32)],kind=int32)
end function bifurcate

end program append

results from gfortran and ifort on my laptop …

$ fpm run --profile release --compiler ifort
fpm: Entering directory '/home/urbanjs/venus/V600/github/fun/bits2'
Project is up to date
COUNT RATE FOR INT64: 1000000
concatentate
0101010101010101010101010101010111111111111111111111111111111111
0101010101010101010101010101010111111111111111111111111111111111
1010101010101010101010101010101000100100100100100100100100100100
1010101010101010101010101010101000100100100100100100100100100100
0000000000000000000000000000000110000000000000000000000000000000
0000000000000000000000000000000110000000000000000000000000000000
1111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111
1000000000000000000000000000000000000000000000000000000000000001
1000000000000000000000000000000000000000000000000000000000000001
clicks: 1
time : .1000000000000000E-05  seconds
bifurcate
0101010101010101010101010101010111111111111111111111111111111111
0101010101010101010101010101010111111111111111111111111111111111
1010101010101010101010101010101000100100100100100100100100100100
1010101010101010101010101010101000100100100100100100100100100100
0000000000000000000000000000000110000000000000000000000000000000
0000000000000000000000000000000110000000000000000000000000000000
1111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111
1000000000000000000000000000000000000000000000000000000000000001
1000000000000000000000000000000000000000000000000000000000000001
clicks: 6
time : .6000000000000000E-05  seconds
compare odd elments of i32 to ia T
compare even elments of i32 to ib T
fpm: Leaving directory '/home/urbanjs/venus/V600/github/fun/bits2'
urbanjs@mercury:~/github/fun/bits2/app$ fpm run --profile release --compiler gfortran
fpm: Entering directory '/home/urbanjs/venus/V600/github/fun/bits2'
Project is up to date
COUNT RATE FOR INT64: 1000000000
concatentate
0101010101010101010101010101010111111111111111111111111111111111
0101010101010101010101010101010111111111111111111111111111111111
1010101010101010101010101010101000100100100100100100100100100100
1010101010101010101010101010101000100100100100100100100100100100
0000000000000000000000000000000110000000000000000000000000000000
0000000000000000000000000000000110000000000000000000000000000000
1111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111
1000000000000000000000000000000000000000000000000000000000000001
1000000000000000000000000000000000000000000000000000000000000001
clicks: 851
time : 0.85099999999999998E-6  seconds
bifurcate
0101010101010101010101010101010111111111111111111111111111111111
0101010101010101010101010101010111111111111111111111111111111111
1010101010101010101010101010101000100100100100100100100100100100
1010101010101010101010101010101000100100100100100100100100100100
0000000000000000000000000000000110000000000000000000000000000000
0000000000000000000000000000000110000000000000000000000000000000
1111111111111111111111111111111111111111111111111111111111111111
1111111111111111111111111111111111111111111111111111111111111111
1000000000000000000000000000000000000000000000000000000000000001
1000000000000000000000000000000000000000000000000000000000000001
clicks: 7005
time : 0.70049999999999998E-5  seconds
compare odd elments of i32 to ia T
compare even elments of i32 to ib T

An interesting footnote is how something very similar is handled with complex values and their components. A complex value is two real values, but you can define and extract the components individually, and refer to them as a%re and a%im; which might be an interesting model to emulate.

jeremie.vandenplas · November 27, 2022, 8:06pm

Thanks you for this explanation. however, I still don’t understand why intrinsics like mvbits or ibitsare dependent on the interpretation of a sequence of bits. So, irrespective of the sign of an integer int8 variable val, should ibits(val, 7, 1) be always considered as untrusted? See another example in my previous post?

msz59 · November 28, 2022, 10:06am

I might be wrong but reading further into section 16.3 of the Standard, one sees:

5 Effectively, this model defines an integer object to consist of z bits in sequence numbered from right to left from 0 to z − 1. This model is valid only in the context of the use of such an object as the argument or result of an intrinsic procedure that interprets that object as a sequence of bits. In all other contexts, the model defined for an integer in 16.4 applies.

The model being referred to is given in the same section: j=\sum_{k=0}^{z-1} w_k \times 2^k - apparently treating any sequence of bits as a non-negative integer value. So, IMHO, all bit-related intrinsics should work fine on all bits of an integer. Point 3 of the same section, mentioned upthread:

3 The interpretation of a negative integer as a sequence of bits is processor dependent.

means, as I guess, only that the Standard does not say anything about the model of the integer values, be it one’s or two’s complement or whatever. Section 16.4 seems to confirm this interpretation, giving a very universal model of an integer to be used in other, ie. not bit-sequence-related contexts:
i=s\times\sum_{k=0}^{q-1} w_k \times r^k with s=\pm 1

RonShepard · November 28, 2022, 6:52pm

I think this is the correct explanation of how the bit operators work. The standard goes to all this effort to define this bit model, and then it falls short by not defining how the bits of one integer kind match with another integer kind. All of the bit operators require arguments of the same kind, which is a reasonable restriction for most of the operators. But mvbits() seems like it should be more general in this respect, and allow bits of one kind to be moved into the bits of another kind. The one operator that does take arguments of different types and kinds is transfer(), which also sidesteps the issue of how bits are moved between arguments. So in the end, even with all of this effort to define the bit representations, programmers are still faced with the same old machine-dependent byte-ordering issues that we had to face since the pre-f90 days. It seems to me like this is a missed opportunity for the language and for those wanting to write portable code.

FortranFan · November 28, 2022, 11:14pm

To @ivanpribec , cc: @RonShepard

Since you gave a like to the post, can we presume you understand the issue and the point being raised by @RonShepard in this thread? If that is true, can you please illustrate it as per your understanding, preferably using code that you have run on different platforms including those with different order of byte sequences?

So you have the C interop approach shown upthread: and the program output shown there where the hexadecimal output is the same.

Now consider the TRANSFER intrinsic:

   use, intrinsic :: iso_fortran_env, only : int32, int64
   integer(int32) :: i32(2)
   integer(int64) :: i64
   i32(2) = int(z'01020304', int32)
   i32(1) = int(z'05060708', int32)
   i64 = transfer( source=i32, mold=i64 )
   write(*,'(*(b32.32))') i32(2), i32(1)
   write(*,'(b64.64)') i64
end

C:\temp>gfortran p.f90 -o p.exe

C:\temp>p.exe
0000000100000010000000110000010000000101000001100000011100001000
0000000100000010000000110000010000000101000001100000011100001000

So now put yourself in the shoes of someone on the standards committee who is looking at suggestions for changes in the language and they come across this on MVBITS: what exactly is the problem your think that committee member should ask the committee to address as needing to be solved in the language?

What is “the missed opportunity” here?

Sure, going from a processor with the above byte-ordering sequence to another processor with another sequence, the pattern of bits will be different. But how is that of any consequence?

In a given program with a processor, a round-trip TRANSFER operation or the pack/unpack steps will show no bit-hysteresis. So what exactly is the problem here?

macneacail · November 28, 2022, 11:15pm

It is interesting to follow this discussion. I have been cursing this stuff from a data stream, the challenge is not to put together or take apart the bit streams, the challenge is the data is not guaranteed correct, as in a lot of pure garbage is in the stream. 1 line of code and 30 lines to check the data before you attempt to use it.
The manufacturer gives you great help like, it will count to FFFFFFFF in about 30 hours.

Good luck.

RonShepard · November 29, 2022, 3:05am

This is a good point regarding twos-complement integers, and I thought about doing this because almost all hardware these days does in fact use twos-complement integers. I think it also works for ones-complement integers, that is the sign bit gets propagated during the conversion in the same way. However, it would not be true for signed-magnitude integers. In this case, the low-order bits would be correct, but the sign bit in i32 would get shifted up from bit 31 to bit 63, leaving bit 31 in i64 incorrect. I have used signed-magnitude integers in the past, but not in the last three decades.

I think I will end up just biting the bullet and accounting explicitly for the addressing convention of the hardware. I can still use the modern fortran bit operators, so that alone does clean up my old legacy code, but it is a disappointment that I cannot simply use the fortran integer bit model throughout to write simple, clean, portable code.

Topic		Replies	Views
How to write bytes in a binary file? Help	11	2606	March 1, 2021
Unsigned integer data type in modern Fortran	4	2387	May 25, 2021
How to perform sign-extension? Help	17	841	July 15, 2023
Bit manipulation in fortran Help	11	621	March 1, 2024
Initialize 64-bit integer variables with 64-bit real values Help	7	1978	March 11, 2022

Moving Bits Question

Related topics