Reading a binary file

I have a file that, when I try to open it in a text editor or my IDE, shows up as non-textual; thus, I cannot read its content.

However, my Fortran code reads this data in and writes out to it. It uses the unformatted form (status is old).

Is this file assumed to be in a standard Fortran format?
Is there any method of ‘seeing’ all the data in this file? I could write a program to read in all the data in there, though I have no idea of its structure, etc. I am more interested in seeing whether I can convert the whole file somehow.

If your Fortran program can read the data, you are in luck - you have a way to read the data. There is no other supported way to read it. Although it would be possible to write a program in another language, it would take work. There is a very nice piece at NCL: Reading binary data that will provide guidance.

There are some basic governing rules and limitations regarding unformatted and other binary files that you need to know about. A file may be a sequence of records, but these days on operating systems such as Linux or Windows, the file contains a sequence of bytes. Some of those bytes may represent metadata (such as record length, end-of-file marks, label bytes), and the rest are data bytes, i.e., the “payload”.

Unless you have a complete description of the contents of the file, you cannot reliably read an unformatted Fortran file; you cannot even distinguish between metadata and data. You can guess, but that is risky.
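
To illustrate why guessing is risky, here is a small Python sketch (Python is my choice for illustration, not something from the original program). It takes eight bytes that were written as one little-endian double and decodes them under two other equally plausible assumptions. Nothing in the bytes themselves says which reading is the right one.

```python
import struct

# Eight payload bytes, here produced from a single 64-bit real (1.0).
payload = struct.pack("<d", 1.0)

# The same bytes decoded under three different assumptions about the type.
as_double = struct.unpack("<d", payload)[0]   # one 64-bit real
as_int32 = struct.unpack("<2i", payload)      # two 32-bit integers
as_float32 = struct.unpack("<2f", payload)    # two 32-bit reals

print(as_double)    # 1.0
print(as_int32)     # (0, 1072693248)
print(as_float32)   # (0.0, 1.875)
```

All three decodings succeed without error, which is exactly the problem: only the source of the writing program tells you which one was intended.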

If you have a program source that can read such files, you just need to use a compiler that is consistent with the unformatted file conventions that were used to produce the file, or use compatible compilers for reading and writing.

I would say that you must guess at the file contents, but if you know how the Fortran I/O system writes binary files, then you can easily extract the metadata. Usually, each record has a header that contains the record length, and a trailer which also contains the record length. That information can be used to skip forwards and backwards over records without knowing the record contents. There may also be issues regarding big- and little-endian byte ordering, which if unknown might require some effort. In the 32-bit file system days, the header and trailer data were typically just a 32-bit integer, so one could extract those values with od or some similar low-level utility. Then there was a period in the 1990s where different compilers had different record structures. Then eventually the compilers all seemed to agree on a de facto standard, which I think consists of a 32-bit integer for small records and a 64-bit integer for larger records. To give an example, binary files written by ifort and gfortran use a common binary structure.
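
The header/trailer idea can be sketched in Python as follows. This is only a sketch under assumptions: it assumes 4-byte signed record markers at both ends of every record and no sub-records, and the function names walk_records and guess_layout are made up for illustration. It tries little-endian first and then big-endian, and accepts whichever byte order lets the whole file be walked with matching header and trailer values.

```python
import struct

def walk_records(data, byteorder="<"):
    """Yield (payload_offset, payload_length) for each record in data.

    Assumes every record is framed by a 4-byte length before and after
    the payload. Raises ValueError as soon as the framing is inconsistent.
    """
    fmt = byteorder + "i"
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated header at byte %d" % pos)
        (length,) = struct.unpack_from(fmt, data, pos)
        end = pos + 4 + length              # where the trailer should start
        if length < 0 or end + 4 > len(data):
            raise ValueError("implausible record length at byte %d" % pos)
        (trailer,) = struct.unpack_from(fmt, data, end)
        if trailer != length:
            raise ValueError("header/trailer mismatch at byte %d" % pos)
        yield pos + 4, length
        pos = end + 4

def guess_layout(data):
    """Return (byteorder, records) for the first byte order that fits, else None."""
    for byteorder in ("<", ">"):            # little-endian, then big-endian
        try:
            return byteorder, list(walk_records(data, byteorder))
        except ValueError:
            continue
    return None
```

Calling guess_layout(open("myfile.dat", "rb").read()) on a file (the file name is a placeholder) gives the number and size of the records, which is often a useful hint about what was written, but it says nothing about the types inside each record. A file that walks cleanly is probably framed this way, though a short file could pass by coincidence.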


This is what I have been investigating, although this claimed standard is evolving.

The ifort / gfortran de facto standard has evolved to be only a 32-bit header/footer for records.
Unfortunately, records larger than 2 gigabytes are written as a sequence of sub-records, each at most 9 bytes less than 2 gigabytes. The 64-bit header appears to be abandoned.
The Silverfrost compiler uses a varying header/footer length of 1 or 5 bytes, depending on the record length. It is now developing support for records larger than 2 gigabytes (perhaps 1, 5, or else 13 bytes).
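
For the sub-record case, my understanding of the gfortran documentation is that the markers carry a sign: a negative leading marker means another sub-record follows, a negative trailing marker means a sub-record precedes, and the absolute value is the number of payload bytes. Treat that as an assumption to check against your compiler. Under it, a Python sketch that stitches sub-records back into logical records might look like this (read_logical_records is a made-up name):

```python
import struct

def read_logical_records(data, byteorder="<"):
    """Return a list of logical records, joining sub-records together.

    Assumes 4-byte signed markers where a negative leading marker means
    "another sub-record follows" (the convention described above).
    """
    fmt = byteorder + "i"
    records, parts, pos = [], [], 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated header at byte %d" % pos)
        (head,) = struct.unpack_from(fmt, data, pos)
        size = abs(head)                    # payload bytes in this sub-record
        start = pos + 4
        if start + size + 4 > len(data):
            raise ValueError("sub-record at byte %d runs past end of file" % pos)
        (tail,) = struct.unpack_from(fmt, data, start + size)
        if abs(tail) != size:
            raise ValueError("header/trailer mismatch at byte %d" % pos)
        parts.append(data[start:start + size])
        if head >= 0:                       # last (or only) sub-record
            records.append(b"".join(parts))
            parts = []
        pos = start + size + 4
    if parts:
        raise ValueError("file ends in the middle of a logical record")
    return records
```

Real sub-records only appear for records of roughly 2 gigabytes and more, so to try this out it is easier to build a small synthetic byte string with negative markers than to generate a real file.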

If anyone knows the header format used by other compilers, I would be interested to hear about it.

I have found that emulating these Fortran unformatted sequential record formats can be done with access=‘stream’ file access. This has the added benefit of direct access via “pos=file_address”. By generating an in-memory table of record size and position, this allows for a variable length random access file structure, which improves the management of data in these binary files.
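
The same idea (a table of record positions and sizes, then direct positioning) can be sketched outside Fortran too. The Python version below is only an analogy to access='stream' with pos=, assuming 4-byte little-endian markers and no sub-records; build_index and read_record are made-up names. Note that Python file offsets start at 0, whereas the Fortran pos= value starts at 1.

```python
import io
import struct

def build_index(f, byteorder="<"):
    """Scan a binary file object once and return [(payload_pos, length), ...]."""
    fmt = byteorder + "i"
    index = []
    f.seek(0)
    while True:
        head = f.read(4)
        if not head:                        # clean end of file
            break
        if len(head) < 4:
            raise ValueError("truncated header")
        (length,) = struct.unpack(fmt, head)
        if length < 0:
            raise ValueError("negative marker: sub-records are not handled here")
        payload_pos = f.tell()
        f.seek(length, io.SEEK_CUR)         # skip the payload without reading it
        tail = f.read(4)
        if len(tail) < 4 or struct.unpack(fmt, tail)[0] != length:
            raise ValueError("header/trailer mismatch")
        index.append((payload_pos, length))
    return index

def read_record(f, index, n):
    """Jump straight to record n (0-based) and return its payload bytes."""
    pos, length = index[n]
    f.seek(pos)
    return f.read(length)
```

With f = open("myfile.dat", "rb") (a placeholder name), build_index(f) is run once, after which read_record(f, index, n) fetches any record in any order without rereading the ones before it.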

It does remain that if you do not know the data type and structure of these records, it is very difficult to read and use these files. Ideally, the Fortran code that generated the file is also provided.

Because the general problem as described above is difficult at best, it is very likely the program you have is the best place to start. If it uses a package such as HDF5 or CDF, which are self-describing file formats, then you can actually convert your file(s) to text relatively easily, as described in the documentation for those packages. Otherwise, if your program can read and write from the files, the program “knows” the file structure and you can use the routines that do so to decipher the file. How hard that is will vary a lot, depending on how complex the file structure is.

The other case where you can convert the file to text relatively easily is when the file is of a type defined by a standard, such as an Adobe PDF file, a GIF file, etc.; but it sounds like this is a custom format for this particular program(?).