Read out Fortran unformatted file in C

Hello Colleagues,
I have inherited some data from some Fortran legacy code. Since I don’t have much experience in Fortran, I am trying to make it readable in C.
I have tried to implement a solution given to a similar problem.

I know the data file is the double-precision array VArPolar(-1000:1000, 0:50*1000) and have attached screenshots two hex dumps (‘hd BaPolar’). My own compiler is little endian.


The first problem I have is that the original datafile is very big ~ 800 MB so that I keep getting smash stack error.
The second problem is that I don’t quite know how to parse the bytes so that I can read out the content.
If someone could make an educated guess from the hex files, I’d be grateful.

Regards

1 Like

The result of another hex dump (‘od -BaPolar’) is attached here.

I have no idea what VArPolar is, but based on the hexdumps it looks like you have pairs of a single-precision real and a 32-bits integer, both in little-endian. Possibly these files are meant to be read as direct-access, so blocks of fixed length without any indication of records. They should be relatively easy to read in C but the meaning of the numbers is something entirely different.

1 Like

What’s the exact size (in bytes) of the file? Direct access binary file (no record info, just data) containing just the array of the shape you gave, would be 800,416,008 bytes

Then, don’t create the array on the stack. In Fortran, you could use allocatable arrays.

In Fortran, you would read the values through direct/unformatted access, as Arjen said, or decode the hex data with format specifier z.

This is correct: 800,416,008 bytes. The other information I have is that is that the array VaArPolar(-1000:1000, 0:50*1000) represents values of cosine(theta) from -1 to 1 in steps of 0.001, and values of the variable ‘r’ from 0 to 50 in steps of 0.001.

I’d say more likely values of some function of cos(theta) (x-axis) and r (y-axis). The reading should be straightforward, be it in C or Fortran but the values are somewhat strange. A typical hex value from the dump, say 8480412e00000000, represents -5.3374109887996959E-287 if big-endian or 3.8341701404960092E-315 if little-endian.

Thanks for the pointers @interkosmos. However, I don’t have the original program anymore, only the data from it. I would have to write a Fortran program that does what you suggest. Since I don’t quite have the fundamentals to do that, can you point me to an example to get me started?

Yes, that is correct the values are a function of cos(theta) (x-axis) and r (y-axis). ‘cos(theta)’ and ‘r’ are the indices of the array. The values that you mention (-5.3374109887996959E-287)don’t sound unreasonable since for ‘r > 16’ the function effectively goes to zero.

The reading should be straight forward, but if I start from this solution what changes do I have to make to it to read the data in C?

I believe that a single double precision real is represented, rather than pairs of single-precision reals, though your remaining deductions are likely correct. I am confident of the meaning of the numbers, I just need to be able to read it in C.

So I just tried running this program directly:

/* Read Fortran unformatted file with record markers containing byte sizes */
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
int recn,recl,nin; char buf[0x1000]; long offset;
FILE *fil=fopen(argv[1],"rb");
recn=0; offset = 0;
do{
   nin = fread(&recl,4,1,fil); // prefix marker
   if(nin < 1)break;
   offset+=4;
   if(nin > 0x1000){
      fprintf(stderr,"Buffer is too small, need more than 4096 bytes\n");
      exit(1);
      }
   nin=fread(buf,1,recl,fil);
   fread(&recl,4,1,fil); // postfix marker
   recn++; printf("%4d %8d %12ld\n",recn,nin,offset);
   offset+=nin+4;
   }while(1);
fclose(fil);
}

And as an output I got stack smashing and a segmentation error

Try the following. I hope I haven’t make any typo/mistake.
This assumes that the file was written with the same endianness as it is being read.

#include <stdio.h>

// main array, cols and rows inverted to speed up reading
// (Fortran stores and saves, by default, arrays columnwise
static double VArPolar[50001][2001]; 

int main(int argc, char *argv[])
{
  int ir, ict;
  double val;
  char *filename="BaPolar";  // make sure the filename is right
  FILE *fp;

  if ((fp=fopen(filename,"rb")) == NULL) {
    fprintf(stderr, "error opening file %s\n", filename);
    return 1;
  }
  for (ir=0; ir<50001; ir++) {
    if (fread(VArPolar[ir],8,2001,fp) != 2001) {
      fprintf(stderr, "error reading row no. %d\n", ir+1);
      return 2;
    }
  }
  fclose(fp);
  // now the array is read
  // to access Fortran element (i,j) (-1000:1000,0:50000)
  // use VArPolar[j][i+1000]
  printf("%g %g\n",VArPolar[0][0],VArPolar[0][1]);
  return 0;
}
1 Like

Thanks @msz59. I have to keep my gratitude here brief, but that worked perfectly on the first try and is much appreciated.
I have one more question. Suppose the endianness of the machines are different, mine being little endian. I then have to make an addition to the code that does byte swapping. Can you show me what this should look like?

If you use GNU C, there are macros bswap_NN defined in byteswap.h, possibly optimized for special CPU instructions, see below. If not, one would have to do the swap manually, byte by byte.
Please note that one cannot use read values as doubles before the swap is done, if needed. Explanation here.

code with optional byteswapping
#include <stdio.h>
#include <stdint.h>
#include <byteswap.h>

// main array, cols and rows inverted to speed up reading
// (Fortran stores and saves, by default, arrays columnwise
static double VArPolar[50001][2001]; 

// set to 1 if endianness differs
static int NeedSwap=0;

void swap64(void *adr)
{
  uint64_t *p=adr;
  *p = bswap_64(*p);
}

int main(int argc, char *argv[])
{
  int i, j, ir, ict;
  char *filename="BaPolar";  // make sure the filename is right
  FILE *fp;

  if ((fp=fopen(filename,"rb")) == NULL) {
    fprintf(stderr, "error opening file %s\n", filename);
    return 1;
  }
  for (ir=0; ir<50001; ir++) {
    if (fread(VArPolar[ir],8,2001,fp) != 2001) {
      fprintf(stderr, "error reading row no. %d\n", ir+1);
      return 2;
    }
    if (NeedSwap) for (ict=0; ict<2001; ict++) swap64(VArPolar[ir]+ict);
  }
  fclose(fp);
  // now the array is read
  // to access Fortran element (i,j) (-1000:1000,0:50000)
  // use VArPolar[j][i+1000]
  printf("%g %g\n",VArPolar[0][0],VArPolar[0][1]);
  return 0;
}

Thanks again @msz59. That also worked on the first try. I will read the material you linked to and try to brush up on my programming as a show of appreciation.

@msz59, it seems that your earlier comment about the strangeness of the values was correct.

Some of the values I expected for VArPolar[j][i+1000] were:
VArPolar[2900][1000] = 103;
VArPolar[3000][1000] = 64;
VArPolar[3000][1000] = 38.8;

Whereas, using this solution, assuming the same endiannes I obtain:
VArPolar[2900][1000] = -9.2819E-168;
VArPolar[3000][1000] = -3.19823E-295;
VArPolar[3000][1000] = -1.60563E-267;

Assuming different endiannes, I obtain:
VArPolar[2900][1000] = -4.35932E260;
VArPolar[3000][1000] = -1.49665E-192;
VArPolar[3000][1000] = 1.59033E75;

The data is supposed to be double-precision and so each value should be 8 bytes. But there must a mistake somwhere since I expect:
-10 < (VArPolar[j => 2900][1000]) < 110.
Do you have any further suggestions?

you could loop over the entire read array (swapped or not) to see if it contains any values in the range you mention. Maybe it was saved row-wise (by using [implied-]do loops) instead of default column-wise order.

@msz59 What I did was write a simple program in Fortran to cross-check the outputs of the C program:

Program readit

Implicit None

Real(8), Dimension( -1000:1000, 0:50*1000 ) :: VArPolar

Integer :: i, j

Open( 10,FILE='BaPolar', FORM='unformatted')      

Read ( 10) VArPolar

Close( 10 )

write (*,*) VArPolar(0,4500) 

End Program readit 

This gave me correct results in the range that I was expecting here. However, whenever I failed to declare the real(kind) and instead simply wrote:

Real, Dimension( -1000:1000, 0:50*1000 ) :: VArPolar

Then I obtained strange values that were orders of magnitude too high or too low.
What does that imply for how the C program should be modified?

That cannot work, no surprise as this makes the array of default (typically 4-bytes) reals.

As for the above, I do not quite catch what is going on. Your sample Fortran program uses unformatted file which by default, at least in gfortran is made of records, with the binary (unformatted) data preceded and followed by the record length in bytes. These record lengths are (gfortran v.11) 4 bytes each! Consider your code slighthly modified to create the file instead of reading it:

writeit
Program writeit
  Implicit None
  Real(8), Dimension( -1000:1000, 0:50*1000 ) :: VArPolar
  Integer :: i, j

  call random_number(VArPolar)
  VArPolar(-1000,0) = 0d0
  VArPolar(1000,50000) = 0d0
  Open( 10,FILE='BaPolar', FORM='unformatted')
  write ( 10) VArPolar
  Close( 10 )
  write (*,*) VArPolar(0,4500)
End Program writeit

Hex dump of the file (first and last line) shows:

00000000  08 61 b5 2f 00 00 00 00  00 00 00 00 48 e2 10 3e
...
2fb56100  b2 44 97 3f 00 00 00 00  00 00 00 00 08 61 b5 2f

You can clearly see the 4-byte lenghts of record (0x2fb56108 = 800416008). The total file size is 800416016 bytes, not just 800416008 as you wrote before.
When I manually deleted the 4-byte lenghts from the beginning and the end of file, your readit program failed showing

At line 7 of file unform.f90 (unit = 10, file = ‘BaPolar’)
Fortran runtime error: I/O past end of record on unformatted file

But this maybe due to zeroes being interpreted as record length, so YMMV.

If your file indeed contains the record lengths (4-bytes), the failure of the C code would be obvious as it reads binary data without any record info, so it would make double values of two 4-byte parts coming from different original values (4-byte shift!). Still, it does not explain how your file gets properly read by Fortran program, if it really has 800,416,008 bytes only.

1 Like

I double checked and the file has indeed 800,416,016 bytes. I originally just went by array size instead of checking the file properties. Mystery solved. Can the C code be made to give the correct values?