How to write bytes to a binary file?

What is the most standard-conforming way of writing bytes (octets) to a binary file?

In C there is the unsigned char type that can be used to store bytes. But in Fortran, integers are signed. If I use an integer of kind INT8, it should be OK for values between 0 and 127. But if I need to write values in the 128…255 range, problems begin… Moreover, the representation of negative values is processor dependent. Most processors use two’s complement, but there is no guarantee.

My idea is to use MVBITS(): I put the 0…255 value in an INT16, then transfer its 8 lowest bits into an INT8. Assuming, of course, the correct endianness…
https://gcc.gnu.org/onlinedocs/gfortran/MVBITS.html

I could also write a function that finds the binary digits and uses IBSET() to set each bit of the INT8 integer.
https://gcc.gnu.org/onlinedocs/gfortran/IBSET.html#IBSET
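Note that MVBITS cannot do the INT16-to-INT8 copy directly, because the standard requires its FROM and TO arguments to have the same kind. The IBSET idea, on the other hand, can be sketched as below: BTEST reads each of the 8 low-order bits of a default integer in 0 … 255, and IBSET sets the matching bit of an INT8. The module and function names (byte_conv, to_byte) are hypothetical. The bit pattern of the result is fixed by the bit model; only the value your processor prints for patterns with bit 7 set is processor dependent.

```fortran
module byte_conv
    use, intrinsic :: iso_fortran_env, only: int8
    implicit none
    private
    public :: to_byte
contains
    ! Hypothetical helper: return an INT8 whose 8 bits are the
    ! 8 low-order bits of v. v is expected in 0 .. 255; higher bits are ignored.
    pure function to_byte(v) result(b)
        integer, intent(in) :: v
        integer(int8) :: b
        integer :: pos

        b = 0_int8
        do pos = 0, 7
            if (btest(v, pos)) b = ibset(b, pos)
        end do
    end function to_byte
end module byte_conv
```

With a unit opened with access='stream', write(unit) to_byte(129) would then write the single octet 10000001.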

Is there a simpler way? Am I missing something?

A discussion about unsigned integers can be found at this link.


You can store unsigned integers [0 … 255] in Fortran simply in character variables:

character(len=1) :: uint8
uint8 = achar(255)

Or, use transfer() instead, if you need C interoperability:

use, intrinsic :: iso_c_binding, only: c_int8_t
integer(kind=c_int8_t) :: uint8
! TRANSFER keeps the first byte in memory: 0xFF (from 255) on little-endian processors
uint8 = transfer([255, 1], 1_c_int8_t)

@interkosmos,

Could this keep the same kind, integer*1 or int8 (8 bits), as argument of the function achar in the interval [0,255]?

A bit of overthinking here. In memory, an INT8 integer is just a block of 8 bits. Unformatted stream I/O should treat it that way. The signed/unsigned distinction only appears when you perform arithmetic with the values, or convert them to a different KIND of integer.
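The claim that unformatted stream I/O just copies the bits can be checked with a round trip: write INT8 values to a stream file, read them back, and recover the unsigned value 0 … 255 by testing each bit rather than relying on how negative INT8 values are interpreted. This is a minimal sketch; the module name byte_io and the procedures from_byte and roundtrip are hypothetical.

```fortran
module byte_io
    use, intrinsic :: iso_fortran_env, only: int8
    implicit none
    private
    public :: from_byte, roundtrip
contains
    ! Hypothetical helper: rebuild the unsigned value 0 .. 255 from the
    ! 8 bits of b, without depending on how negative INT8 values are represented.
    pure function from_byte(b) result(v)
        integer(int8), intent(in) :: b
        integer :: v
        integer :: pos

        v = 0
        do pos = 0, 7
            if (btest(b, pos)) v = ibset(v, pos)
        end do
    end function from_byte

    ! Hypothetical helper: write the bytes to an unformatted stream file,
    ! read them back from the first position, then delete the file.
    subroutine roundtrip(filename, bytes_out, bytes_in)
        character(len=*), intent(in) :: filename
        integer(int8), intent(in) :: bytes_out(:)
        integer(int8), intent(out) :: bytes_in(size(bytes_out))
        integer :: unit

        open(newunit=unit, file=filename, access='stream', &
             form='unformatted', status='replace', action='readwrite')
        write(unit) bytes_out
        read(unit, pos=1) bytes_in
        close(unit, status='delete')
    end subroutine roundtrip
end module byte_io
```

Comparing bits instead of printed values keeps the check independent of whether the processor displays 10000001 as -127.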


@interkosmos , @alozada
The TRANSFER proposition is interesting: “Transfer physical representation.” I will give it a try. The representation of 255 as a 16-bit integer should be 00000000 11111111, or rather 11111111 00000000 on a little-endian machine… Will I get 00000000 or 11111111 in my c_int8_t?
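TRANSFER copies the bytes of its source in memory order, so converting 255 to a c_int8_t keeps whichever byte comes first in memory: the low byte (11111111) on a little-endian machine, a zero byte on a big-endian one. The sketch below detects the byte order at run time and picks the low-order byte explicitly. The module and function names (byte_order, is_little_endian, low_byte) are hypothetical, and it assumes an INT16 is stored as exactly two 8-bit bytes.

```fortran
module byte_order
    use, intrinsic :: iso_fortran_env, only: int8, int16
    implicit none
    private
    public :: is_little_endian, low_byte
contains
    ! True if the least significant byte of an INT16 is stored first.
    ! MOLD only supplies type and shape, so bytes need not be defined.
    pure logical function is_little_endian()
        integer(int8) :: bytes(2)

        bytes = transfer(1_int16, bytes)
        is_little_endian = (bytes(1) == 1_int8)
    end function is_little_endian

    ! Low-order byte of v (expected in 0 .. 255), whatever the byte order.
    pure function low_byte(v) result(b)
        integer(int16), intent(in) :: v
        integer(int8) :: b
        integer(int8) :: bytes(2)

        bytes = transfer(v, bytes)
        if (is_little_endian()) then
            b = bytes(1)
        else
            b = bytes(2)
        end if
    end function low_byte
end module byte_order
```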

The character solution could work on our machines, but I am not sure it is universal. The standard says:

“The processor defines a collating sequence for the character set of each kind of character. The collating sequence is an isomorphism between the character set and the set of integers {I : 0 ≤ I < N}, where N is the number of characters in the set.”

But it says nothing about the binary representation.

I have tested this program on an x86_64 machine:

program writing_bytes
    use ISO_FORTRAN_ENV, only: INT8, INT16
    implicit none
    integer(INT8)  :: i8, j8
    integer(INT16) :: i16
    character(len=1) :: c1
    integer :: status

    ! 129 = 0x81 = 0b10000001
    i8 = int(z'81', kind=INT8)
    print *, i8
    i16 = 129
    ! 129 does not fit in INT8; gfortran keeps the 8 low-order bits
    j8 = int(i16, kind=INT8)
    print *, j8

    c1 = achar(129)

    open(unit=1, file='bytes.bin', access='stream', status='replace', &
       & action='write', iostat=status)
    write(1, iostat=status) i8, j8, c1
    close(1, iostat=status)

    call execute_command_line("hexdump -C bytes.bin")
end program

This is the output:

$ gfortran writing_bytes.f90 && ./a.out
 -127
 -127
00000000  81 81 81                                          |...|
00000003

It does the job on my machine, both with int(..., kind=INT8) and achar(). Will it work on any processor?

Note also that if you write i8 = int(129, kind=INT8), gfortran complains:

Error: Arithmetic overflow converting INTEGER(4) to INTEGER(1) at (1). This check can be disabled with the option ‘-fno-range-check’:

Hi @vmagnin,

As I understand it, the values [80, FF] (hex) are outside the range.
The error "Arithmetic overflow converting ..." is described at this link.

Yes, 0x81 = 129 is purposely just outside the INT8 range, whose maximum is +127.
Concerning INT(A, KIND), the Fortran 2018 standard says:

Case (iv): If A is a boz-literal-constant, the value of the result is the value whose bit sequence according to the model in 16.3 is the same as that of A as modified by padding or truncation according to 16.3.3. The interpretation of a bit sequence whose most significant bit is 1 is processor dependent.

and:

16.3.3 Bit sequences as arguments to INT and REAL
1 When a boz-literal-constant is the argument A of the intrinsic function INT or REAL,
• if the length of the sequence of bits specified by A is less than the size in bits of a scalar variable of the same type and kind type parameter as the result, the boz-literal-constant is treated as if it were extended to a length equal to the size in bits of the result by padding on the left with zero bits, and
• if the length of the sequence of bits specified by A is greater than the size in bits of a scalar variable of the same type and kind type parameter as the result, the boz-literal-constant is treated as if it were truncated from the left to a length equal to the size in bits of the result.

I think that A = 129 is a positive 16-bit signed integer, 00000000 10000001, and that INT(A, KIND=INT8) truncates it to the rightmost 8 bits, 10000001. And that’s what I want: I just want to write the octet 10000001 to the file, and I don’t care that my processor interprets and prints it as -127.

The 16.3 Bit model is the classical binary model used for positive values. (But “the interpretation of a negative integer as a sequence of bits is processor dependent”.)

@alozada

Could this keep the same kind , integer*1 or int8 (8 bits), as argument of the function achar in the interval [0,255]?

Pardon me?

@interkosmos

My question was about the argument of the function achar. Could this argument be defined only with 8 bits (integer*1 or INT8) in the range [0,255]?

achar() is defined as result = achar(i [, kind]) (the optional kind of the character result was added in Fortran 2003). The argument i should be a nonnegative integer: gfortran accepts [0 … 255], but the standard only specifies the result for 0 … 127 and leaves other values processor dependent. The type integer(kind=1) has the range -127 … 127. You can pass integer(kind=1) to achar(), but negative values seem to be converted to positive ones, and I don’t know whether the standard says anything about that behaviour.
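Since ACHAR is only specified for 0 … 127, one option is to rebuild the unsigned value from the bits of the INT8 and pass it to CHAR instead, which accepts 0 … n−1, where n is the number of characters in the processor’s collating sequence for that kind (256 for the default kind with gfortran, but that is processor dependent). A sketch; the module name byte_char and the function byte_to_char are hypothetical.

```fortran
module byte_char
    use, intrinsic :: iso_fortran_env, only: int8
    implicit none
    private
    public :: byte_to_char
contains
    ! Hypothetical helper: map the 8 bits of b to the character whose
    ! code is 0 .. 255. Assumes the default character kind has >= 256 characters.
    pure function byte_to_char(b) result(c)
        integer(int8), intent(in) :: b
        character(len=1) :: c
        integer :: v, pos

        v = 0
        do pos = 0, 7
            if (btest(b, pos)) v = ibset(v, pos)
        end do
        c = char(v)
    end function byte_to_char
end module byte_char
```

Going through an explicit 0 … 255 value avoids relying on how a particular compiler maps negative INT8 arguments.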