How to read a mixed formatted/binary file?

I ran into a similar problem quite a few years ago, with a picture file format. That was actually before Fortran 2003 and reading the file was a bit tricky. But with stream access it is not all that difficult. I took the example and experimented a bit. Here is the result:

! mixed.f90 --
!     Try to make sense of the file "unv58b_example.txt"
!
program mixed
    implicit none

    character(len=80)    :: line
    integer              :: i
    real, dimension(200) :: r
    integer              :: pos

    open( 10, file = "unv58b_example.txt", access = 'stream', form = 'formatted' )

    do i = 1,13
        read( 10, '(a)' ) line
        write( *, '(i5,3a)' ) i, ': >', trim(line), '<'
    enddo

    inquire( 10, pos = pos )
    close( 10 )

    open( 10, file = "unv58b_example.txt", access = 'stream' )

    read( 10, pos = pos ) r

    write( *, '(10g14.5)' ) r
end program mixed

The output to the screen is:

    1: >    -1<
    2: >    58b     1     2          11         420     0     0           0           0<
    3: >OA-level for xxxxxx<
    4: >NONE<
    5: >19-Sep-22 10:54:02<
    6: >Record    1 of section "XXX"<
    7: >Tracked processing\Fixed sampling\Runup\Sections\Overall level<
    8: >    2         0    0         0 NONE               1   0 NONE               0   0<
    9: >         2       105         0  0.00000e+00  0.00000e+00  0.00000e+00<
   10: >        19    0    0    0 X-axis               rpm<
   11: >         1    0    0    0 40                   Nm<
   12: >         0    0    0    0 NONE                 NONE<
   13: >         0    0    0    0 NONE                 NONE<
    860.71        314.94        867.28        320.57        878.02        337.36        883.78        351.17        890.07        371.18    
    900.99        423.69        907.90        484.06        921.36        2018.2        932.25        2257.5        941.16        2442.2    
    950.25        2618.1        961.92        2821.1        970.90        2951.6        979.06        3046.4        992.94        3151.5    
    999.02        3179.7        1010.4        3219.9        1019.9        3249.8        1031.7        3291.1        1041.3        3328.7    
    1050.2        3368.3        1059.4        3415.1        1068.1        3460.1        1079.9        3524.9        1086.0        3559.8    
    1098.2        3628.7        1108.2        3683.5        1117.1        3733.1        1127.3        3787.8        1139.8        3857.0    
    1149.4        3906.5        1159.9        3961.7        1169.6        4012.3        1179.8        4067.7        1189.9        4123.0    
    1196.9        4162.3        1210.0        4236.5        1216.7        4275.3        1227.2        4334.6        1236.7        4389.5    
    1243.4        4427.5        1259.9        4524.3        1263.1        4542.6        1276.3        4617.9        1283.1        4654.7    
    1297.0        4729.2        1304.2        4765.5        1318.6        4839.2        1326.1        4876.1        1337.4        4934.2    
    1347.6        4987.2        1355.1        5026.5        1369.0        5101.1        1369.2        5102.1        1383.5        5179.7    
    1397.9        5257.0        1405.3        5295.8        1412.7        5333.9        1428.0        5410.7        1436.0        5448.9    
    1444.8        5489.6        1452.9        5525.5        1461.7        5562.8        1480.3        5632.7        1490.0        5665.9    
    1500.3        5699.2        1510.8        5731.0        1510.9        5731.2        1521.5        5761.9        1542.9        5819.6    
    1543.1        5820.0        1554.2        5848.3        1565.6        5876.1        1576.7        5902.2        1587.8        5927.9    
    1588.2        5928.8        1610.2        5977.0        1610.4        5977.5        1621.4        6001.0        1632.2        6023.5    
    1642.5        6044.3        1652.5        6064.4        1662.4        6083.8        1671.6        6101.6        1680.5        6118.6    
    1696.9        6148.5        1704.3        6161.6        1711.5        6174.1        1718.0        6184.9        1734.2        6211.7    
    1738.4        6218.5        1748.1        6231.9        1757.9        6228.6        1772.0        6162.4        1782.5        6123.8    
    1782.6        6123.4        1799.8        6051.8        1808.7        6008.3        1821.8        5934.4        1835.0        5850.1    

So, I would say, the program is a nice start to parse these files. Of course, I have no clue as to the contents :slight_smile: and you will have to expand the reading of the first part. But the numbers seem reasonable enough.

Yes, and @arjen shows just that approach, but it would need to be expanded a bit to read multiple datasets and to handle the possible variation in line terminators, particularly multi-character ones. I suppose the datasets have grown immensely over the years. I was slightly familiar with the format from doing some CAD/CAM-ish work with something called ICEMDDN in the past, and I thought the basic 80-column-max, ASCII-only files were very specifically meant to be portable and human-readable, so I was surprised to see binary data in universal file format files :slight_smile:
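
One possible way to cope with varying line terminators without re-opening the file is to read the header byte by byte through a single unformatted stream connection. The sketch below is only an illustration under stated assumptions: the file name `mixed_demo.bin`, the helper `read_line` and the fixed payload size are all made up for the demo, and it assumes one default character occupies one file storage unit (a byte), which is the usual case but not something every processor has to provide. It builds its own small test file first so that it is self-contained.

```fortran
! mixed_stream.f90 --
!     Sketch: text header and binary payload read through ONE unformatted
!     stream connection, accepting LF or CRLF line endings.
!     File name, helper name and payload size are hypothetical.
!
program mixed_stream
    implicit none

    integer, parameter :: nvalues = 4      ! assumption: payload size known in advance
    character(len=80)  :: line
    real               :: r(nvalues), expected(nvalues)
    integer            :: lun, i, length

    expected = [1.5, -2.25, 10.0, 13.0]

    ! Build a small test file: one CRLF line, one LF line, then raw reals
    open( newunit = lun, file = 'mixed_demo.bin', access = 'stream', &
          form = 'unformatted', status = 'replace' )
    write( lun ) 'first header line' // achar(13) // achar(10)
    write( lun ) 'second header line' // achar(10)
    write( lun ) expected
    close( lun )

    ! Read it back without closing and re-opening in between
    open( newunit = lun, file = 'mixed_demo.bin', access = 'stream', &
          form = 'unformatted', status = 'old' )
    do i = 1,2
        call read_line( lun, line, length )
        write( *, '(i5,3a)' ) i, ': >', line(1:length), '<'
    enddo
    read( lun ) r       ! continues at the byte after the last terminator
    close( lun, status = 'delete' )

    write( *, '(4g14.5)' ) r
    if ( any( r /= expected ) ) error stop 'binary payload was not recovered'

contains

! Collect bytes up to the next LF; a CR in front of it is dropped
subroutine read_line( lun, line, length )
    integer, intent(in)           :: lun
    character(len=*), intent(out) :: line
    integer, intent(out)          :: length

    character(len=1) :: c
    integer          :: ios

    line   = ' '
    length = 0
    do
        read( lun, iostat = ios ) c
        if ( ios /= 0 ) exit            ! end of file or read error
        if ( c == achar(10) ) exit      ! LF ends the line
        if ( c == achar(13) ) cycle     ! skip the CR of a CRLF pair
        if ( length < len(line) ) then
            length = length + 1
            line(length:length) = c
        endif
    enddo
end subroutine read_line

end program mixed_stream
```

Reading one byte at a time is slow for large headers; reading a block into a buffer and scanning it with `index` would be the obvious refinement. For a real 58b file the number of payload bytes would have to come from the header rather than from a parameter.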

I assume they appeared to help reduce size and increase efficiency as datasets got bigger and bigger.

Well, some variant of stream I/O looks like it will work.

With the information you provided about the sizes and type being contained in the headers I was thinking a “58b to 58” filter might be the simplest answer, but that eliminates the dataset size and potential efficiency gains of the semi-binary format.

As far as I know, the Fortran standards have always restricted I/O to a specific file to be either formatted or unformatted. Specifically, writing to a file with

write(n,'(a,i0)') 'formatted', k
write(n) 'unformatted', k

is not allowed.

Maybe that restriction should be eliminated by adding something like form='mixed'? If it were eliminated, then these kinds of tasks, which include reading things like spreadsheet files directly with Fortran I/O, would be much easier.
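
Within the current rules, something close to this can be had on the writing side by doing the formatting with an internal write and then sending both the resulting text and the raw value through one unformatted stream connection. The following is a minimal sketch, not a general solution: the file name `write_mixed.bin` and all variable names are invented for the illustration, and it assumes that a default character is one byte in the file.

```fortran
! write_mixed.f90 --
!     Sketch: text and binary data written to the same file through one
!     unformatted stream connection. All names are hypothetical.
!
program write_mixed
    implicit none

    character(len=80)             :: buffer
    character(len=:), allocatable :: text, head
    integer                       :: lun, k, kin

    k = 42

    ! The internal write does the formatting that write(n,'(a,i0)') would do
    write( buffer, '(a,i0)' ) 'formatted', k
    text = trim(buffer) // new_line('a')

    open( newunit = lun, file = 'write_mixed.bin', access = 'stream', &
          form = 'unformatted', status = 'replace' )
    write( lun ) text       ! the characters go out byte for byte
    write( lun ) k          ! raw binary integer, no record markers
    close( lun )

    ! Read both parts back to check the round trip
    open( newunit = lun, file = 'write_mixed.bin', access = 'stream', &
          form = 'unformatted', status = 'old' )
    allocate( character(len=len(text)) :: head )
    read( lun ) head, kin
    close( lun, status = 'delete' )

    write( *, '(a,i0)' ) 'text bytes written: ', len(text)
    write( *, '(a,i0)' ) 'integer read back:  ', kin
    if ( head /= text .or. kin /= k ) error stop 'round trip failed'
end program write_mixed
```

The cost is that the program has to supply the line terminator itself and loses the convenience of format-driven record handling, which is presumably what a form='mixed' option would give back.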

I would go for that (an option on OPEN) if the only effect were that the A descriptor read and wrote the specified number of bytes required by the argument.

I was thinking about an ADVANCE=‘binary’ option or a modifier for the A descriptor like BA.

Fortran works fine at writing such a file using the A descriptor now with plain formatted sequential files. With a set of values whose binary representation does not include a system newline, Fortran works fine at reading it too.

As the OP has mentioned, it has efficiency and file size benefits. I wonder how many use cases there are for that? The very early, pre-standardization FORTRAN compilers I know of used A just like that, basically writing the data “as is”. I was looking through the standard trying to determine whether it still should, or at least should with PAD=‘NO’ and ADVANCE=‘no’ or some other I/O option. I still see mention of “effective character” I/O, but so far I haven’t found a phrase that allows or disallows an A descriptor reading and writing binary streams if PAD=‘NO’; I suspect there is something in there about what defines a line that prevents it.

If the binary data does not include a newline byte, as when the array is all ones in the following example, everything works just like the OP would like. Seems pretty simple to create such an option; just whether there is sufficient demand for it seems a bigger question. Not converting to ASCII is faster, more compact and more accurate of course (which is why my scratch files are always binary). That might be why mixed files seem relatively common in other languages.

program anyone
use,intrinsic::iso_fortran_env,only:int8
implicit none
! effective character variables
   integer(kind=int8) :: arr(256)
   integer(kind=int8) :: arrin(size(arr))
   integer        :: i
   integer,parameter :: istart=-127

   arr = 1
   call printit ! no newline character written as binary

   ! i-1 runs from -128 to 127, so every possible byte value occurs once
   arr = [ (int(i-1,kind=int8), i=istart, istart+size(arr)-1) ]
   call printit ! one of each byte

contains

subroutine printit()
   integer :: icount
   arrin=-111
   open(10,status='scratch')
   !open(10,status='scratch',action='readwrite',access='stream',form='formatted',pad='no')
   ! A descriptor with integer list items: accepted by some compilers as an extension
   write (10, '(*(a))') arr
   rewind (10)
   !read (10, '(*(a))', advance='no', pad='no') arrin
   read (10, '(*(a))') arrin
   close(10) ! release the scratch file so the next call starts fresh
   ! count the bytes that did not survive the round trip
   icount=count(arr/=arrin)
   write (*, '(*(g0,1x))') merge('GOOD:','BAD: ',icount == 0),'COUNT=', icount
end subroutine printit

end program anyone

Thanks for everyone’s input! :slight_smile:
For now I will use the file re-opening approach, but it would still be nice to be able to just ignore the newline byte in the future.