Hi Forts! I am a longtime lurker and first-time poster. This is something I am writing for my blog. It is not quite complete, but I felt like sharing it here. Hope it's not too basic for this crowd!
There is a mysterious file with a magic-number binary header. It is not known how this file was created. We wish to store the binary header data as a variable within a Fortran program so that we can test other binary files for a match against this reference header. One way to test the other files would be to keep the reference file around, read its header into a variable at the top of the program, and use that as the reference. A neater solution is to encode the data in the source code of the program itself. A somewhat related exercise is to embed arbitrary binary data into an executable (see "Linking a binary blob with GCC" on Freedom Embedded), but let's not digress.
Let us say that the header is the first 8 bytes. For demonstration, I created a random file of 64 bytes using
dd if=/dev/urandom of=random-file.bin bs=1 count=64 iflag=fullblock
but literally any file will do.
(If you use a larger file, you might want to trim the output of the next set of commands to the first few lines by piping it through head.)
The first step is to look at the binary using one of od/hexdump/xxd. Instead of showing us lots and lots of 0s and 1s that would cause your eyes to glaze over, these commands output a compact representation of the binary data in hexadecimal (or octal). This leads to our first problem. By default, these commands do not print the same thing for the same file:
od
od stands for octal dump and prints octals by default. It needs some options to print hexadecimal.
od -Ax -tx random-file.bin
000000 bd188db2 950361bd df635ec8 907d5cd8
000010 541eebca 3cc7968c 6b053c19 fcc91ed4
000020 3e5b2291 947f60dd d1f87cfd ffffc55c
000030 7613604a 60e26218 db09e09a e733906b
000040
Sidebar: od is OG, first released in November 1971 (I had linked to the Wikipedia page of od here but had to remove it to stay within the two-link limit). It predates the Bourne shell, and therefore the Bourne Again shell, and is also the reason for the inconsistency in bash's do loop syntax. The usual convention in bash is to close a construct with its opening keyword spelled backwards: if ... fi, case ... esac, etc. If not for the already existing od command, do loops would have been closed with od instead of done!
xxd
xxd random-file.bin
00000000: b28d 18bd bd61 0395 c85e 63df d85c 7d90 .....a...^c..\}.
00000010: caeb 1e54 8c96 c73c 193c 056b d41e c9fc ...T...<.<.k....
00000020: 9122 5b3e dd60 7f94 fd7c f8d1 5cc5 ffff ."[>.`...|..\...
00000030: 4a60 1376 1862 e260 9ae0 09db 6b90 33e7 J`.v.b.`....k.3.
hexdump
hexdump random-file.bin
0000000 8db2 bd18 61bd 9503 5ec8 df63 5cd8 907d
0000010 ebca 541e 968c 3cc7 3c19 6b05 1ed4 fcc9
0000020 2291 3e5b 60dd 947f 7cfd d1f8 c55c ffff
0000030 604a 7613 6218 60e2 e09a db09 906b e733
0000040
If you are well versed with these tools, you can probably see what's coming. I got lucky here: I used xxd first and was able to do what I wanted relatively painlessly. If I had used one of the other tools with its default options, I would probably have pulled out a non-negligible percentage of my hair before getting to a place of understanding.
One hexadecimal digit encodes four bits of data, so two digits make one byte. od printed 4x4 = 16 "words" of 4 bytes each. The other two commands printed 4x8 = 32 words of 2 bytes each. And no two outputs are alike! We will square away the output of hexdump and od at a later time. Right now, let us continue with xxd's output and add some options to cleanly print the first 8 bytes of the file in hex:
xxd -p -l8 random-file.bin
b28d18bdbd610395
It's the same as what we got previously, with the spaces removed and only the first 8 bytes kept. So far so good.
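If xxd is not at hand, roughly the same dump can be produced from Fortran. The following is only a minimal sketch: the module name hexdump_bytes and the function bytes_to_hex are made up for this illustration, and the file name is the one from the dd command above. It reads the first 8 bytes one byte at a time, so the digits come out in file order regardless of the machine's byte order.

```fortran
module hexdump_bytes
  use iso_fortran_env, only: int8
  implicit none
contains
  ! Format raw bytes as lowercase hex digits, in the order given.
  ! (bytes_to_hex is a hypothetical helper, not a library routine.)
  function bytes_to_hex(raw) result(hex)
    integer(int8), intent(in) :: raw(:)
    character(len=2*size(raw)) :: hex
    character(len=*), parameter :: digits = "0123456789abcdef"
    integer :: k, v
    do k = 1, size(raw)
      v = iand(int(raw(k)), 255)   ! map the signed range -128..127 onto 0..255
      hex(2*k-1:2*k-1) = digits(v/16 + 1 : v/16 + 1)
      hex(2*k:2*k)     = digits(mod(v, 16) + 1 : mod(v, 16) + 1)
    end do
  end function bytes_to_hex
end module hexdump_bytes

program dump_header
  use hexdump_bytes
  implicit none
  integer(int8) :: raw(8)
  integer :: u, ios

  ! Assumes random-file.bin exists in the current directory.
  open(newunit=u, file="random-file.bin", access="stream", &
       form="unformatted", status="old", action="read", iostat=ios)
  if (ios /= 0) then
    write(*,'(A)') "could not open random-file.bin"
    stop
  end if
  read(u, iostat=ios) raw
  close(u)
  if (ios /= 0) then
    write(*,'(A)') "file is shorter than 8 bytes"
    stop
  end if
  write(*,'(A)') bytes_to_hex(raw)
end program dump_header
```

For the file used in this post, this should print the same 16 digits as xxd -p -l8.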
Now, we will use this hex inside Fortran to generate the binary header using the transfer intrinsic. Since it is only 8 bytes, we could use any 8-byte type to store it. In the program below, I chose a scalar INT64 variable and transfer'ed the hex into it, and also read the first 8 bytes of the binary file into another variable of the same type. We are testing the reference file against itself, so it should of course pass.
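program main
use iso_fortran_env, only: iwp=> int64
implicit none
integer(iwp) i,j
! Reference header: the hex digits printed by xxd -p -l8.
! Note: a BOZ literal as an argument to transfer is not standard Fortran.
! gfortran 9 accepts it; as far as I know, gfortran 10 and later reject it
! unless -fallow-invalid-boz is given.
i = transfer(Z"b28d18bdbd610395",i)
! Read the first 8 bytes of the file into j.
open(unit=11, file="random-file.bin", access='stream')
read(11) j
close(11)
write(*,'(a, L2)') "Are they equal?: ", i==j
endprogram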
The result of this program for me was:
Are they equal?: F
That didn't quite work as expected! Let us look at the bit patterns of i and j, printed in hex, to see if we can get a hint. Adding the following lines after the read of j in the previous program
write(*,'(A,Z0)')"i: ",i
write(*,'(A,Z0)')"j: ",j
gave me
i: B28D18BDBD610395
j: 950361BDBD188DB2
Are they equal?: F
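This is interesting! Of course, i and j look different. But we see that the order of the bytes is reversed (recall that one byte equals two hex digits). Variable i is exactly what we assigned to it. Variable j, however, has its bytes reversed relative to the file. We have been victimized by endianness: on a little-endian machine such as mine, the first byte in the file becomes the least significant byte of the integer, which Z editing prints last.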
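The effect can be reproduced without any file at all. The sketch below takes the 8 header bytes in file order and reinterprets the same memory as 2-, 4- and 8-byte integers with transfer. The module hex_bytes and the function hex_to_int8 are illustrative names, not anything from a library. On a little-endian machine, the 16-bit and 32-bit rows should match the first words printed by hexdump and od above, and the 64-bit row should match j.

```fortran
module hex_bytes
  use iso_fortran_env, only: int8
  implicit none
contains
  ! Parse 2*n hex digits into n int8 values, keeping the order of the digits.
  ! (hex_to_int8 is a hypothetical helper written for this example.)
  function hex_to_int8(hex) result(b)
    character(len=*), intent(in) :: hex
    integer(int8) :: b(len(hex)/2)
    integer :: k, v
    do k = 1, size(b)
      read(hex(2*k-1:2*k), '(Z2)') v   ! v is in 0..255
      if (v > 127) v = v - 256         ! same bit pattern as a signed int8
      b(k) = int(v, int8)
    end do
  end function hex_to_int8
end module hex_bytes

program word_views
  use iso_fortran_env, only: int16, int32, int64
  use hex_bytes
  implicit none
  integer(int8)  :: b(8)
  integer(int16) :: w16(4)
  integer(int32) :: w32(2)
  integer(int64) :: w64

  b = hex_to_int8("b28d18bdbd610395")  ! the 8 header bytes in file order

  ! Reinterpret the same 8 bytes of memory as wider integers.
  w16 = transfer(b, w16)
  w32 = transfer(b, w32)
  w64 = transfer(b, w64)

  write(*,'(A,8(1X,Z2.2))') "8 x 1 byte :", b
  write(*,'(A,4(1X,Z4.4))') "4 x 2 bytes:", w16
  write(*,'(A,2(1X,Z8.8))') "2 x 4 bytes:", w32
  write(*,'(A,1X,Z16.16)')  "1 x 8 bytes:", w64
end program word_views
```

Only the first row is independent of the host: the wider the word, the more bytes get shuffled when the value is printed most significant digit first.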
Some Fortran compilers come with a non-standard extension to convert between byte orders when handling unformatted files. The program below opens the binary file assuming big-endian ordering:
program main
use iso_fortran_env, only: iwp=> int64
implicit none
integer(iwp) i,j
i = transfer(Z"b28d18bdbd610395",i)
open(unit=11, file="random-file.bin", access='stream', convert='big_endian')
read(11) j
close(11)
write(*,'(A,Z0)')"i: ",i
write(*,'(A,Z0)')"j: ",j
write(*,'(A, L2)') "Are they equal?: ", i==j
endprogram
with the output:
i: B28D18BDBD610395
j: B28D18BDBD610395
Are they equal?: T
Now this is exactly what we expected!
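Since convert= is an extension, a compiler-independent alternative may be worth having. One possible approach, sketched here under the assumption that only 8-byte integers need handling, is to open the file normally and reverse the bytes by hand when the host is little-endian. The names byte_order, is_little_endian and swap_bytes are invented for this example.

```fortran
module byte_order
  use iso_fortran_env, only: int8, int32, int64
  implicit none
contains
  ! True when the least significant byte is stored first in memory.
  logical function is_little_endian()
    integer(int8) :: bytes(4)
    bytes = transfer(1_int32, bytes)
    is_little_endian = (bytes(1) == 1_int8)
  end function is_little_endian

  ! Return x with its 8 bytes in reverse order.
  function swap_bytes(x) result(y)
    integer(int64), intent(in) :: x
    integer(int64) :: y
    integer(int8) :: bytes(8)
    bytes = transfer(x, bytes)
    y = transfer(bytes(8:1:-1), y)
  end function swap_bytes
end module byte_order

program swap_demo
  use byte_order
  implicit none
  integer(int64) :: a

  a = int(Z'0102030405060708', int64)   ! easy-to-follow test pattern
  write(*,'(A,L2)')     "little-endian host:", is_little_endian()
  write(*,'(A,Z16.16)') "original: ", a              ! 0102030405060708
  write(*,'(A,Z16.16)') "swapped:  ", swap_bytes(a)  ! 0807060504030201
end program swap_demo
```

In the header test, this would amount to reading j without convert= and then applying j = swap_bytes(j) when is_little_endian() is true, before comparing with i.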
Platform details:
Win10 WSL2
gfortran --version
GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0