BITS type in F202Y

@sblionel

The standard library implements a bitset type, so it seems that there is a need for such a type.

I’m curious how this could be easily simulated using BOZ constants?

1 Like

I don’t think anyone is saying a bitset type isn’t useful, but the prevailing opinion is that its usefulness is not sufficient to warrant adding to the language, especially considering the upheaval it causes to large parts of the language and compilers. Yes, using the existing facilities can be awkward at times, but it’s manageable.
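For what it's worth, here is a minimal sketch (mine, not from any particular codebase) of what I mean by "existing facilities": a default integer used as a small bit set, manipulated with the standard bit intrinsics and a BOZ constant for a literal mask.

program bitset_with_integer
   implicit none
   integer :: flags, mask
   flags = 0                            ! all bits clear
   flags = ibset(flags, 3)              ! set bit 3
   flags = ibset(flags, 17)             ! set bit 17
   mask  = int(z'00020008')             ! BOZ constant: bits 3 and 17
   print *, iand(flags, mask) == mask   ! T: both bits are set
   print *, btest(flags, 5)             ! F: bit 5 was never set
   flags = ibclr(flags, 3)              ! clear bit 3 again
   print *, popcnt(flags)               ! 1: only bit 17 remains set
end program bitset_with_integer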

The BIT data type was implemented in Gould-SEL MPX Fortran in the late 1970s. A typical use was to declare an array of type BIT and equivalence it to an integer - e.g., from a tracking radar code:

      INTEGER*4 DUMMYW
      BIT DUMMYB(32)
      EQUIVALENCE (DUMMYB,DUMMYW)

Then the BIT array elements behaved like logicals, e.g.

         IF (DUMMYB(30)) THEN
C
C           RESET XMIT ON BIT
C           -----------------
C           CALL TRS_TRS ( 0 )
C             INLINE
	             ZBM 05,ITX:CW1+03
C             ENDI
C
C           SET T_O_FLAG TO 30
C           -----------------
            T_O_FLAG=30
C
         ELSE
C
C           SET T_O_FLAG TO 31
C           -----------------
            T_O_FLAG=31
C
         ENDIF

Apologies for the embedded assembler - but that was the MPX world! Some of these MPX programs are still in use, but I agree with @sblionel - it is probably not worth the candle to do this in modern Fortran.

A bit data type would simplify a lot of common operations, from data compression to cryptography, in a portable way. However, I do understand the effort that a new data type presents in the language, from conversions to other types (assignments, transfer(), etc.), to storage sequence association, to i/o. We programmers have been asking for it for over 40 years now (maybe even longer), so I expect most of us have simply given up on the prospect.

3 Likes

I use arrays of integer(int8) as BITS arrays (for data compression), knowing that it does not follow the Fortran standard, and that it relies on processors using two’s complement arithmetic. I guess that many Fortran users use similar approaches, without knowing the “limitations”. Adding BITS to the standard would “solve/clarify” many of these situations.

2 Likes

+1

Exactly.

What this vote shows is that the standard development process is not working for the practitioners of Fortran.

  1. too many things are not getting done because too many aspects are seen as insufficiently useful or not useful by a few people - six in this instance, but often by just one highly influential non-voting member. More often than not, it is due to non-technical factors: fewer resources at compiler vendors like NAG who are spread too thin, burnout, jadedness, disinterest, inadequate budgets, more interest in the managed decline of Fortran, I-don’t-like-your-design-so-let-it-die attitudes, etc. None of these can be ruled out; if anything, they may be strongly at play.

  2. the practitioners of Fortran have long been stuck at a crossroads: invest more in it, or give up. Many choose the latter, which further exacerbates the issues in 1 above.

  3. In order for the language to advance, Fortran really needs to transcend WG5 and J3. The development of the language needs to move to the practitioners and their champions. The only hope is the vision laid out by @certik and a platform + possible ecosystem around LFortran. The practitioners need to band together and invest seriously with their time, effort, marketing, and $$ in that space and see what can be done. The good things are that there are no barriers to entry on that front, good ideas are what move forward, and there are visionary leaders like @certik willing to give the practitioners the time and attention to TEACH. Contrast that with the J3 and WG5: there are huge barriers to becoming voting members, and without actual voting rights it is useless. On top of it, so much of the crucial information and influence remains in the dark, in the hands of one or a few people who will in effect cancel you if they are called out strongly.

The claim upthread that a BITS type would cause “upheaval to large parts of the language and compilers” is highly questionable, possibly even entirely bogus.

  1. The effect on compilers will be somewhere in between what it will take for the new enumeration type in Fortran 2023 and what it took for the derived type in the Fortran 90 revision, with the effort being closer to the former in most compilers. By no means is it an “upheaval”.

  2. The effect on the standard is about 80% editorial, along the lines of wordsmithing, etc., which could easily be handled if the effort on the document were crowd-sourced, out in the open, with instructions that take less than 1/100th of the effort many put in with GSoC, on forums like this, or on GitHub (e.g., LFortran) to guide newcomers. Someone like @jacobwilliams et al. could run through the effort with some guidance in a matter of a few weeks, if not days, and develop tools and workflows along the way that make future changes much, much easier. The remaining 20% of the task might involve some subtlety and insight and some gotchas and so forth. But it can be managed, and better so in an open standard development process where the learnings can be captured and then reapplied in future work on the language. But this too is by no means an upheaval.

The current WG5 and J3 workflow is insanity in the form of doing the same thing, in a mostly closed manner, over and over again while yielding fewer and fewer results for the poor, persevering practitioners of Fortran.

2 Likes

This is also what is stated in the bitset module of the standard library, but it’s not clear to me where two’s complement arithmetic is assumed. The bit manipulation functions/routines of Fortran are based on an integer model that is independent of the actual implementation, so as long as one uses these routines, no assumption is made.

The only thing I can see that assumes a specific implementation is the assumption that in the integer value 0, all bits are cleared (i.e. all bits are 0). This could easily be overcome with an initialization routine executing

do i = 0, bit_size(zero) - 1   ! bit positions run from 0 to bit_size-1
    zero = ibclr(zero,i)
end do

My main uncertainty relates to the sentence in section 16.3.1 of the F2018 standard:

The interpretation of a negative integer as a sequence of bits is processor dependent.

The standard also specifies the following for the intrinsic ibset (section 16.9.94):

Result Value. The result has the value of the sequence of bits of I, except that bit POS is one. The model for the interpretation of an integer value as a sequence of bits is in 16.3.

Therefore, my interpretation is that the result of ibits(-1_int8, 7, 1) is processor-dependent. Note that I might be wrong.

The integer model for the bit manipulation routines is unsigned, and it doesn’t fit the model of the integer type. So, the interpretation of the integer values output by the bit manipulation routines is indeed processor dependent. However, my understanding is that as long as one doesn’t use the integer values per se, this should not be a problem, and that the bit manipulation world is consistent (if not, this would be a design flaw IMO).
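To make this concrete, here is a small sketch of my own showing what I mean by staying inside the bit manipulation world: the integer is only a container, and its numeric value is never used.

program bits_as_container
   use, intrinsic :: iso_fortran_env, only: int8
   implicit none
   integer(int8) :: b
   integer :: i
   b = 0_int8
   b = ibset(b, 7)                  ! set the top bit; the numeric value of b
                                    ! is now processor dependent...
   print *, btest(b, 7)             ! ...but this test is well defined: T
   print *, (btest(b, i), i = 0, 6) ! and so are these: all F
end program bits_as_container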

The situation is somewhat similar to what happens with the transfer function:

integer :: i
real :: a = 1.0
i = transfer(a, i)   ! the mold argument supplies the type and kind
a = transfer(i, a)

The interpretation of i is processor dependent, but the standard guarantees that a == 1.0 after the round-trip. The integer type is here just a container and we don’t need/want to interpret its content. The usage of the integer type as a container to simulate a BIT type looks similar to me.

But I might be wrong as well…

This is my issue/uncertainty.
In one of my applications, I have a matrix mat of integer(int8) that contains compressed data. Values of mat might be negative or positive. Compressed elements correspond to two bits of the int8 integers.

Decompression is done using ibits, e.g.:

integer(int8), allocatable :: mat(:,:)
real, parameter :: decomp(0:3) = [....]
real :: tmp

...
tmp = decomp( ibits(mat(i,j), 6, 2) )
...

So, I think that this use must assume two’s complement arithmetic, especially as I use the top (sign) bit of possibly negative integers.

I agree, because you are effectively interpreting the content of the integer returned by ibits(). A portable (and more expensive) way could be:

tmp = decomp( foo(mat(i,j),6) )
 
integer(int8) function foo(b,pos)
   ! extract the 2-bit field at positions pos and pos+1 using only btest()
   integer(int8), intent(in) :: b
   integer, intent(in) :: pos
   foo = merge(1_int8,0_int8,btest(b,pos)) + &
         merge(2_int8,0_int8,btest(b,pos+1))
end function

1 Like

By the way, specifying an integer model for the bit manipulation routines looks pretty useless to me, as long as there is no integer type/kind in the standard that corresponds to this model.

Yes, I agree.
The standard is, at least, confusing to me, including for the bit intrinsics.
And @sblionel 's blog post on BOZ constants makes me think that there is no standard-conforming way to work with the intrinsics and the bits of a negative integer.

I think the most “fortrannic” way of supporting bits (assuming a new BIT type is not in the options) could just boil down to having compilers support a selected_logical_kind(bits=1) that actually has a 1-bit storage size; see page 9 at:

I think that among the current intrinsic types, a kind having a “storage size” that is not a multiple of a byte can hardly be standard conforming:

logical(kind=lk1), target :: a(1000)   ! 1 bit size logical
logical(kind=lk1), pointer :: p(:)

p => a(5:)   ! cannot work as a(5) is not addressable

This can have multiple consequences, and this is probably why an entirely different type is needed. The situation is similar with std::vector<bool> in C++:

#include <vector>
int main() {
std::vector<bool> a(100,true);
bool* p;

p = &a[5];
}

<source>: In function 'int main()':
<source>:6:9: error: taking address of rvalue [-fpermissive]
    6 | p = &a[5];
      |      ~~~^
<source>:6:5: error: cannot convert 'std::vector<bool>::reference*' to 'bool*' in assignment
    6 | p = &a[5];
      |     ^~~~~
      |     |
      |     std::vector<bool>::reference*

I understand what you mean. However, Fortran pointers are only loosely related to C/C++ pointers: they are not just addresses, as they also store bounds/strides for when complex associations are made.

So, even though I haven’t deeply thought about the implications, I don’t think having bit-size logicals would be out of this world. But like you’re pointing out, it certainly wouldn’t be an interoperable kind.

True, nonetheless they use addresses under the hood. It would probably be possible to implement pointers that handle bit offsets, but pointers would not be the only language element that would be affected. In the end, either the compiler would have to implement a lot of special cases for this particular logical kind, or it would have to drop some features that are not very useful, in which case the compiler would not be standard conforming.

1 Like

I don’t think addressability is important here. This reminds me of how CRAY fortran supported character types in f77. It was a word-addressable machine (64 bits in a word), yet it managed to store 8 characters in each word, and it supported all of the usual character substring operations. The complications were all below the high-level language level, so of course the compiled code didn’t look like operations on a byte-addressable machine, but it worked.

I think programmers would expect the same kind of addressing magic to occur for BIT expressions like a(first:last:inc).

Regarding the current integer bit operators, there are two kinds of machine-dependence that creep into the picture. The one that is being discussed in this thread is how negative integers are represented (e.g. twos-complement, ones-complement, signed-magnitude, biased, etc.). As long as the integer is just used to store bits, and the integer values are ignored, this capability is still useful. It is not as useful as a BIT data type would be, but programmers can write the surrounding code and continue to adapt as they have for several decades now. Remember, it wasn’t even until f90 that fortran had a standard set of integer bit operators to work with.
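As an example of the kind of surrounding code I mean (a sketch only, not a recommendation), here is a large bit array stored in default integers, where the storage is only ever touched through single-bit intrinsics, so the integer values are never interpreted:

module bitarray
   implicit none
   integer, parameter :: w = bit_size(0)       ! bits per storage word
contains
   pure subroutine setbit(store, n)            ! set global bit n (0-based)
      integer, intent(inout) :: store(0:)
      integer, intent(in)    :: n
      store(n/w) = ibset(store(n/w), mod(n, w))
   end subroutine setbit
   pure logical function getbit(store, n)      ! test global bit n (0-based)
      integer, intent(in) :: store(0:)
      integer, intent(in) :: n
      getbit = btest(store(n/w), mod(n, w))
   end function getbit
end module bitarray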

But the other kind of machine dependence is the big/little endian addressing convention. Even though the fortran standard goes into detail describing the integer model, how bits are addressed within it, and how integer values are determined from the bits, it does not specify how to transfer groups of bits from one integer kind into another. Thus, just like in f77 and in all previous versions of fortran, the programmer must manually adapt for the two possible addressing cases. The only way to write portable code is to operate on single bits at a time, not fields of bits, and that seems to me to be, depending on your disposition, either tragically or comically shortsighted within the language. A BIT data type would have solved that problem decades ago if adopted. Even in the current standard, if MVBITS() were allowed to have arguments of different integer KINDs, it would help the programmer tremendously, but even that tiny step toward portability is not allowed.

program tmvbits
   use, intrinsic :: iso_fortran_env, only: int8, int32
   integer(int8) :: i8 = 7  ! set a 3-bit field.
   integer(int32) :: i32 = 0
   call mvbits( i8, 0, 3, i32, 0 )   ! move that 3-bit field.
   write(*,'(b8.8,b33.32)') i8, i32
end program tmvbits

$ gfortran tmvbits.f90 && a.out
tmvbits.f90:5:26:

    5 |    call mvbits( i8, 0, 3, i32, 0 )
      |                          1
Error: ‘to’ argument of ‘mvbits’ intrinsic at (1) must be the same type and kind as ‘from’
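The portable workaround is to move the field one bit at a time. Something like the following sketch works across kinds, but it is clearly clumsier and slower than a single MVBITS() call:

program tmvbits_portable
   use, intrinsic :: iso_fortran_env, only: int8, int32
   implicit none
   integer(int8)  :: i8 = 7     ! a 3-bit field in bits 0-2
   integer(int32) :: i32 = 0
   integer :: k
   do k = 0, 2                  ! copy the field bit by bit across kinds
      if (btest(i8, k)) then
         i32 = ibset(i32, k)
      else
         i32 = ibclr(i32, k)
      end if
   end do
   write(*,'(b8.8,1x,b32.32)') i8, i32
end program tmvbits_portable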

Note that I do not say that a BITS type would necessarily be overcomplicated to support, just that going through a kind of the existing logical type doesn’t look fully appropriate to me.

At first glance, it doesn’t look different from the sign issue: as long as you don’t need/want to interpret the integer values that result from the bit manipulations, knowing the endianness is irrelevant. Whether bits 0-7 of an int32 are stored in the first byte or in the last byte doesn’t matter.

However, as you mention, the limitation of mvbits is a problem (and difficult to understand/justify).

Over the years, there have been several approaches proposed. One is a logical type array that has 1-bit entities. One would use array slice notation to address the substrings within the array.

Another model is based on the character data type. Suppose you have an electronic device that outputs 10-bit words, and you want to store 1K of those words. One might define something like

BIT_TYPE(len=10) :: B(1024)

and then read from the device into that array. Once in the array, one would then typically need to move these bits into integers or reals and operate on the data, so there would need to be MVBITS() and/or TRANSFER() type operations to do that. Maybe some assignment operations would be useful too, to keep the code simple. But it seems like this limited functionality could have been provided long ago, so I don’t know why it hasn’t been. Over time, once useful operations have been identified, the functionality could be enhanced. But the important part is just to avoid all the low-level offset and indexing work that is necessary now and which leads to obscure code, programmer errors, and runtime inefficiencies.
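To make the contrast concrete, here is a sketch (my own, with made-up names) of the kind of low-level indexing code needed today to pull the k-th 10-bit word out of a packed int8 buffer, one bit at a time for portability; note that even the mapping of stream bits to (byte, bit) positions is a convention one has to choose and document:

pure function get10(buf, k) result(w)
   use, intrinsic :: iso_fortran_env, only: int8, int32
   implicit none
   integer(int8), intent(in) :: buf(0:)   ! packed stream of 10-bit words
   integer,       intent(in) :: k         ! which word to extract (0-based)
   integer(int32)            :: w
   integer, parameter :: bpw = 8          ! bits per buffer element
   integer :: j, bitpos
   w = 0
   do j = 0, 9
      bitpos = 10*k + j                   ! global bit index in the stream
      if (btest(buf(bitpos/bpw), mod(bitpos, bpw))) w = ibset(w, j)
   end do
end function get10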

Suppose you have an API that specifies where the bits are supposed to go: how can you put them there efficiently without knowing the addressing convention? Or you are trying to send bits to a device, say the 10-bit device in the above example. How do you put the bits in the right place without knowing the addressing convention? Right now, you have to do it one bit at a time; you cannot use MVBITS() or TRANSFER(), which seem like the obvious candidates to do things efficiently. A BIT_TYPE data type would of course be the simplest-looking solution, and probably the easiest to program. That shifts the low-level programming effort from the application programmers onto the compiler writers, where, in my opinion, it belongs.