Internal I/O with stream access?

I am trying to read a binary file (XDR) using unformatted stream mode.

Part of the data is compressed with zlib, and must be “inflated” first. Is there any way to continue reading the inflated character buffer in stream mode?

interface
   ! Use zlib to decompress buffer
   subroutine decompress_buffer(inlen,inbuf,outlen,outbuf,ierr) bind(c)
      use, intrinsic :: iso_c_binding, only: c_int, c_char
      integer(c_int), intent(in) :: inlen, outlen
      character(kind=c_char), intent(in) :: inbuf(inlen)
      character(kind=c_char), intent(out) :: outbuf(outlen)
      integer(c_int), intent(out) :: ierr
   end subroutine
end interface

integer :: unit, ierr
integer(c_int) :: sz_comp, sz_orig
character(len=:,kind=c_char), allocatable :: comp, orig

open(newunit=unit,file=fname,status="old", action="read",&
   form="unformatted", access="stream", convert="big_endian")

! Sizes of compressed, and uncompressed data
read(unit) sz_comp, sz_orig

! Allocate buffers
allocate(comp(sz_comp), orig(sz_orig))

! Read compressed data
read(unit) comp

! Decompress
call decompress_block(sz_comp,comp,sz_orig,orig,ierr)
if (ierr == 0) then
   ! Continue reading uncompressed data in stream mode
   ! ???
   read(orig) ...
else
   error stop "Error while decompressing."
end if

I suppose I could write the decompressed data to an unformatted file first and then read it from there. The other option is to parse the original data buffer byte by byte using the transfer intrinsic, but I’m not too keen to do this because the original data is big-endian encoded.

Doing all of the I/O in C would also work, but I was hoping I could do the majority of it directly in Fortran.

You could, I suppose, simply access the individual characters in the buffer. Instead of using stream-access you would copy the bytes into the variables directly. Of course, that requires a different implementation than reading from a string, but unformatting stream access is very similar, as there is no decoding required of end-of-lines or the like.

Then, of course, the actrual implementation is not as easy writing down the general idea :slight_smile:

ChatGPT has suggested wrapping a couple of C functions including fmemopen (on POSIX) and fread:

// Function to open a memory buffer as a file and read it
FILE *open_memory_buffer(const char *buffer, size_t size) {
    return fmemopen((void *)buffer, size, "rb");
}

// Function to read from the memory buffer
void read_from_memory_buffer(FILE *stream, void *outbuf, size_t size) {
    fread(outbuf, 1, size, stream);
}

// Function to close the memory buffer
void close_memory_buffer(FILE *stream) {
    fclose(stream);
}

In Fortran these would become:

function open_memory_buffer(buffer,size) bind(c)
    integer(c_size_t), value :: size
    character(kind=c_char) :: buffer(size)
    type(c_ptr) :: open_memory_buffer
end function

subroutine read_from_memory_buffer(stream,outbuf,size) bind(c)
   type(c_ptr), value :: stream
   type(*) :: outbuf
   integer(c_size_t), value :: size
end subroutine

subroutine close_memory_buffer(stream) bind(c)
   type(c_ptr), value :: stream
end subroutine

I thought that maybe I can just copy the same stream idea using a Fortran deferred-length string pointer as the stream. Here is what I’ve come up with (not thoroughly tested yet!):

// sread.c
#include <string.h>
#include <ISO_Fortran_binding.h>

// Read from a string stream
//
// subroutine sread(buffer,size,count,stream,ierr) bind(c)
//    type(*), intent(out) :: buffer
//    integer(c_int), value :: size, count
//    character(kind=c_char), pointer :: stream(:)
//    integer(c_int), intent(out) :: ierr
//
void sread(
        void *restrict buffer,
        int size,
        int count,
        CFI_cdesc_t *restrict stream,
        int *ierr) 
{
    *ierr = 0;
 
    const size_t nbytes = size * count;
    const size_t remaining = stream->elem_len - nbytes;
    if (remaining < 0) {
        *ierr = -1;
        return;
    }

    memcpy(buffer, stream->base_addr, nbytes);

    CFI_CDESC_T(0) tmp;
    int istat;

    // tmp => stream(nbytes+1:)
    istat = CFI_establish(
        (CFI_cdesc_t *) &tmp,
        stream->base_addr + nbytes,
        CFI_attribute_pointer,
        stream->type,
        remaining,
        stream->rank,
        NULL);
    if (istat != CFI_SUCCESS) {
        *ierr = -2;
        return;
    }

    // stream => tmp
    istat = CFI_setpointer(stream, (CFI_cdesc_t *) &tmp, NULL);
    if (istat != CFI_SUCCESS) {
        *ierr = -3;
        return;
    }
}

I’m not sure if the pointer association this way is completely legal. It’s also not clear what is supposed to happen when the stream is consumed; I’m guessing it should be a zero-length string (and not point one element over the last element as the convention for C++ iterators). The following works in GCC however,

! test_stream.f90
program test_stream
use, intrinsic :: iso_c_binding
implicit none
interface
    subroutine sread(buffer,size,count,stream,ierr) bind(c)
        import c_int, c_char
        type(*) :: buffer
        integer(c_int), value :: size, count
        character(len=:,kind=c_char), pointer, intent(inout) :: stream
        integer(c_int), intent(out) :: ierr
    end subroutine
end interface

character(len=12,kind=c_char), target :: data
character(len=:,kind=c_char), pointer :: stream

integer :: ierr
character(len=5) :: p1, p2
integer(c_int16_t) :: pad

data = "Hello, World"
stream => data

! Read 
call sread(p1,5,1,stream,ierr)
call sread(pad,2,1,stream,ierr)
call sread(p2,5,1,stream,ierr)

print *, p1
print *, pad, transfer(', ',pad)
print *, p2
end program

Results of GCC:

$ gcc-14 -Wall -c sread.c 
$ gfortran -Wall test_stream.f90 sread.o
$ ./a.out
 Hello
   8236   8236
 World

Results of Intel compilers:

$ icc -std=c11 -diag-disable=10441  -Wall -c sread.c 
$ ifort test_stream.f90 sread.o
$ ./a.out
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
a.out              0000000104AB0564  for__signal_handl     Unknown  Unknown
libsystem_platfor  00007FF814993DFD  _sigtramp             Unknown  Unknown
a.out              0000000104B118C1  __intel_avx_rep_m     Unknown  Unknown
a.out              0000000104A88FA1  sread                 Unknown  Unknown
a.out              0000000104A88D99  MAIN__                Unknown  Unknown
a.out              0000000104A88C2E  main                  Unknown  Unknown

The runtime failure occurs at the point I call CFI_setpointer. The Intel Fortran documentation on CFI_setpointer implies that it isn’t legal to have a data pointer object as the source. But according to the F2023 interpretation document (section 18.5.5.9, paragraph 2) it is legal:

  • source, shall be a null pointer or the address of a C descriptor for an allocated allocatable object, a data pointer object, or a nonallocatable nonpointer data object that is not an assumed-size array. (emphasis added)

I’m guessing this is an amendment made in the F2023 standard.

From CFI_setpointer:

source: A null pointer or the address of a C descriptor for a nonallocatable nonpointer data object, an allocated allocatable object, or a data pointer object. If source is not a null pointer, the corresponding values of the elem_len, rank, and type members must be the same in the C descriptors with the addresses source and result. (emphasis added)

In the C function, tmp and stream will have different lengths.
I did a quick test changing to stream->elem_len when establishing tmp, and it worked with Intel compilers (2021.10.0 20230609).

1 Like

Hmm, is that allowed? The standard states (J3/24-007, section 18.6, paragraph 1):

A C descriptor shall not be initialized, updated, or copied other than by calling the functions specified in 18.5.5.

If I understand correctly you did:

    // stream => tmp
    stream->elem_len = remaining; // NOT ALLOWED!?
    istat = CFI_setpointer(stream, (CFI_cdesc_t *) &tmp, NULL);

I tried to nullify the pointer first, using:

    // nullify(stream)
    istat = CFI_setpointer(stream, NULL, NULL);
    // stream => tmp
    istat = CFI_setpointer(stream, (CFI_cdesc_t *) &tmp, NULL);

but it didn’t appear to help. I was using ifort 2021.9.0.

No, I did (section 18.5.5.9, paragraph 2-3):

    istat = CFI_establish(
        (CFI_cdesc_t *) &tmp,
        stream->base_addr + nbytes,
        CFI_attribute_pointer,
        stream->type,
        stream->elem_len,  // <--- HERE was  "remaining"  originally
        stream->rank,
        NULL);

Since you do not “resize” your stream length (and I might see why), the only way for it to adhere is to set tmp descriptor length to match stream’s one.

NOTE: I did that just as a quick way to test if that was effectively the issue. Indeed, starting from an offset address, but keeping the same length is not correct. For that, a better solution should be thought.

Doesn’t that push the string contents out-of-bounds? Because the base address has been moved up, the deferred-length string pointer is supposed to become shorter (elem_len is what len(stream) should return in Fortran).

Edit: I should probably check what istat says. I wrote some helper functions a while ago, but didn’t have time to include them here yet:

static const char *cfi_errstrs[12] = {
    "No error detected.\n",
    "The base address member of a C descriptor is a null pointer in a context that requires a non-null pointer value.\n",
    "The base address member of a C descriptor is not a null pointer in a context that requires a null pointer value.\n",
    "The value supplied for the element length member of a C descriptor is not valid.\n",
    "The value supplied for the rank member of a C descriptor is not valid.\n",
    "The value supplied for the type member of a C descriptor is not valid.\n",
    "The value supplied for the attribute member of a C descriptor is not valid.\n",
    "The value supplied for the extent member of a CFI_dim_t structure is not valid.\n",
    "A C descriptor is invalid in some way.\n",
    "Memory allocation failed.\n",
    "A reference is out of bounds.\n",
    "Unrecognized status code.\n"
};

// Returns the description string for an error code.
//
const char* cfiGetErrorString(int stat) {

    switch (stat) {
        case CFI_SUCCESS:                  return cfi_errstrs[0]  ; break; 
        case CFI_ERROR_BASE_ADDR_NULL:     return cfi_errstrs[1]  ; break;
        case CFI_ERROR_BASE_ADDR_NOT_NULL: return cfi_errstrs[2]  ; break;
        case CFI_INVALID_ELEM_LEN:         return cfi_errstrs[3]  ; break;
        case CFI_INVALID_RANK:             return cfi_errstrs[4]  ; break;
        case CFI_INVALID_TYPE:             return cfi_errstrs[5]  ; break;
        case CFI_INVALID_ATTRIBUTE:        return cfi_errstrs[6]  ; break;
        case CFI_INVALID_EXTENT:           return cfi_errstrs[7]  ; break;
        case CFI_INVALID_DESCRIPTOR:       return cfi_errstrs[8]  ; break;
        case CFI_ERROR_MEM_ALLOCATION:     return cfi_errstrs[9]  ; break;
        case CFI_ERROR_OUT_OF_BOUNDS:      return cfi_errstrs[10] ; break;
    }

    return cfi_errstrs[11];
}

#define CHECK_CFI(func)                                                        \
{                                                                              \
    int stat = (func);                                                         \
    if (stat != CFI_SUCCESS) {                                                 \
        fprintf(stderr,"%s:%d: CFI API failed with error: (%d) %s",            \
            __FILE__, __LINE__, stat, cfiGetErrorString(stat));                \
    }                                                                          \
}
1 Like

We wrote the same thing concurrently. It surely is unsafe, and to be avoided.
I think that for it to be safe, you’d need to shrink the stream buffer descriptor length accordingly, and keep the original way in which you were establishing the tmp descriptor. But in this case, you’d loose control over your stream, let’s say in case you wanted to rewind it.

EDIT: though you might think of this approach as a “single-usage” scenario. You open the stream, you process it, and once done, you destroy it.

1 Like

Yes. The “rewind” in my mind consists of just re-targeting the pointer to the beginning:

character(len=:), allocatable, target :: tape
character(len=:), pointer :: stream

tape = load_tape()

stream => tape
do while (len(stream) > 0)
    call stream_read(..., stream, ierr)
    if (ierr < 0) exit
end do
! stream has reached end of tape

! "rewind"
stream => tape
1 Like

Then, you could directly work on the stream descriptor:

void sread(
        void *restrict buffer,
        int size,
        int count,
        CFI_cdesc_t *restrict stream,
        int *ierr) 
{
    *ierr = 0;
 
    const size_t nbytes = size * count;
    const size_t remaining = stream->elem_len - nbytes;
    if (remaining < 0) {
        *ierr = -1;
        return;
    }

    memcpy(buffer, stream->base_addr, nbytes);

    // stream => stream(nbytes+1:)
    int istat = CFI_establish(
        stream,
        stream->base_addr + nbytes,
        CFI_attribute_pointer,
        stream->type,
        remaining,
        stream->rank,
        NULL);
    if (istat != CFI_SUCCESS) {
        *ierr = -2;
        return;
    }
}

Thanks, that looks better. :+1:

It works with GCC, but with ifort it still fails on the third invocation when the string stream is empty:

remaining = 0
sread.c:144: CFI API failed with error: (3) The value supplied for the element length member of a C descriptor is not valid.
 ierr =           -2

I’ll try with a newer version of ifx once I’m at my other computer.

Edit: hmm, it may be like this because of the standard’s requirements for CFI_establish (J3/24-007, section 18.5.5.5, lines 18-20):

elem_len If type is equal to CFI_type_struct, CFI_type_other, or a Fortran character type code, elem_len shall be greater than zero and equal to the storage size in bytes of an element of the object. Otherwise, elem_len will be ignored. (emphasis added)

I think this is weird because the Fortran equivalent is legal:

! cfi_test.f90
interface
    function cfi_len(stream) bind(c)
        use, intrinsic :: iso_c_binding, only: c_char, c_int
        character(len=:,kind=c_char), pointer :: stream
        integer(c_int) :: cfi_len
    end function
end interface
character(len=:), pointer :: stream
allocate(character(len=0) :: stream)
print *, associated(stream), len(stream), cfi_len(stream)
end
// cfi_len.c
#include <ISO_Fortran_binding.h>
int cfi_len(CFI_cdesc_t *stream) {
    return stream->elem_len;
}

Second edit: in section 18.5.3, the standard appears to imply that an object can have zero-size

void * base_addr; If the object is an unallocated allocatable variable or a pointer that is disassociated, the value is a null pointer; otherwise, if the object has zero size, the value is not a null pointer but is otherwise processor-dependent.

So the way things are written know, you can receive a zero-size object as a dummy argument. But you cannot establish one in C. :confounded:

Mh, weird indeed.

Well, it throws error, but the memcpy happens before, so that you might have the last portion of the stream in the buffer, so that the stream read itself has been correctly done.

A workaround could be to disassociate the stream pointer if you are trying to set to a 0-length (in C):

    // stream => stream(nbytes+1:)
    char *newadrr = NULL;
    size_t newlen = 1;
    if (remaining > 0) {
        newadrr = stream->base_addr + nbytes;
        newlen  = remaining;
    }
    int istat = CFI_establish(
        stream,
        (void *)newaddr,
        CFI_attribute_pointer,
        stream->type,
        newlen,
        stream->rank,
        NULL);
    if (istat != CFI_SUCCESS) {
        *ierr = -2;
        return;
    }

and in the Fortran side, instead of checking the length, you check its association:

do while (associated(stream))

The CFI_setpointer version ought to work in F2023. There appears to be a sentence carved out in the standard precisely for this edge case (J3/24-007, section 18.5.5.9, lines 22 - 24):

If source is not a null pointer and the C descriptor with the address result does not describe a deferred length character pointer, the corresponding values of the elem_len member shall be the same in the C descriptors with the addresses source and result.

Inverting the logic, if the C descriptors are deferred-length character pointers, the corresponding values of elem_len are not required to be same - they can be different.

Currently (ifort 2021.9.0), the CFI_setpointer version fails with the same error as CFI_establish does:

$ ./a.out
sread.c:107: CFI API failed with error: (3) The value supplied for the element length member of a C descriptor is not valid.
 ierr =           -3

Thanks for troubleshooting this with me.

From my interpretation, that is because if result is a deferred length, elem_len is inherited (i.e. copied) from source? That’s why they might not be the same. Otherwise, they shall match.

EDIT: in fact, J3/24-007, section 18.5.5.9, paragraph 3 states:

Description. Successful execution of CFI_setpointer updates the base_addr, dim, and possibly elem_len members of the C descriptor with the address result as follows: (emphasis 2 added)

That “possibly” could potentially refer to the case of result being deferred length.

That looks promising. Would be valuable to write some tests to verify the expected F2023 semantics.


For the time being, I think I’ve found a workaround.

The stream read is done in C, because otherwise we cannot use the genericity of type(*). However, the pointer retargetting is done in Fortran:

// sread.c
#include <stdio.h>
#include <string.h>
#include <ISO_Fortran_binding.h>

extern void cfi_setcharpointer(CFI_cdesc_t *result, char *src, size_t len);

void sread(
        void *restrict buffer,
        int size,
        int count,
        CFI_cdesc_t *restrict stream,
        int *ierr) 
{
    *ierr = 0;
 
    const size_t nbytes = size * count;
    const size_t remaining = stream->elem_len - nbytes;
    if (remaining < 0) {
        // Caller requested more memory than is available
        // Exit early to prevent out-of-bounds violation
        *ierr = -1;
        return;
    }

    memcpy(buffer, stream->base_addr, nbytes);
    cfi_setcharpointer(stream, stream->base_addr + nbytes, remaining);
}
! test_stream.f90
subroutine cfi_setcharpointer(result,src,len) bind(c)
    ! Workaround for <F2023 behavior of CFI_setpointer
    use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_ptr, c_f_pointer
    character(len=:,kind=c_char), pointer :: result
    type(c_ptr), value :: src
    integer(c_size_t), value :: len
    block
        character(len=len,kind=c_char), pointer :: tmp
        call c_f_pointer(src, tmp)
        result => tmp
    end block
end subroutine
program test_stream
use, intrinsic :: iso_c_binding
implicit none
interface
    subroutine sread(buffer,size,count,stream,ierr) bind(c)
        import c_int, c_char
        type(*), intent(inout) :: buffer
        integer(c_int), value :: size, count
        character(len=:,kind=c_char), pointer, intent(inout) :: stream
        integer(c_int), intent(out) :: ierr
    end subroutine
end interface

character(len=12,kind=c_char), target :: data
character(len=:,kind=c_char), pointer :: stream

integer :: ierr
character(len=5) :: p1, p2
integer(c_int16_t) :: pad

data = "Hello, World"
stream => data

call sread(p1,5,1,stream,ierr)
call sread(pad,2,1,stream,ierr)
call sread(p2,5,1,stream,ierr)

print *, p1 // transfer(pad,', ') // p2
print *, associated(stream), len(stream)

end program

Result:

$ icc -std=c11 -Wall -diag-disable=10441 -c sread.c
$ ifort -warn all test_stream.f90 sread.o
$ ./a.out
 Hello, World
 T           0
$ gcc-14 -std=c11 -Wall -c sread.c
$ gfortran-14 -Wall -Wno-uninitialized test_stream.f90 sread.o
$ ./a.out
 Hello, World
 T           0
1 Like

I’ve proposed in the past that Fortran I/O be extended to provide native support for both base64 and xdr encoding. This looks like a case were native support at the read/write level would help a lot. I want base64 in order to simplify VTK xml format input and output, without having to resort to stream access and writing everything out as binary. If some compilers allow you to specify on the open statement that you want conversion to/from big endian/little endian I would think something similar combined with some kind of encoding option on a read/write statement would be doable.

2 Likes