I now have some working code that does both parallel (coarray) and asynchronous read/writes to a single file, of different records. It’s not too different from the above code, but I’ll include it at the bottom for completeness.
I obtain the expected output, which is a good sign. As Ron points out however, this is possibly luck. I ran it 20 times and did not see any errors, but a stress test (large files of small records) may yield the behavior Ron is alluding to.
program main
implicit none
integer, parameter :: blocks_per_image = 2**16
integer, parameter :: block_size = 2**10
real, dimension(block_size) :: x, y
integer :: in_circle[*], unit[*] ! an integer but each image has a different local copy
integer :: i, n_circle, n_total, rec_len, io_id
real :: step, xfrom
n_total = blocks_per_image * block_size * num_images()
step = 1./real(num_images())
xfrom = (this_image() - 1) * step
inquire(iolength=rec_len) in_circle, n_total
open(newunit=unit,file='output.txt',form='UNFORMATTED',access='DIRECT',recl=rec_len, asynchronous='yes')
in_circle = 0
do i=1, blocks_per_image
call random_number(x)
call random_number(y)
in_circle = in_circle + count((xfrom + step * x)** 2 + y**2 < 1.)
end do
write(unit,rec=this_image(), asynchronous='yes') in_circle, n_total
sync all
close(unit) ! async operations finish before it closes
! Reset in_circle, n_total to make sure we read values
in_circle = 10
n_total = 10
open(newunit=unit,file='output.txt',form='UNFORMATTED',access='DIRECT', action='READ', recl=rec_len, status='OLD', asynchronous='yes')
read(unit,rec=this_image(), asynchronous='yes', id=io_id) in_circle, n_total
! can in principle do computations here, so long as they don't need in_circle, n_total
wait(unit=unit, id=io_id) ! need to wait before printing this, to let asynchronous read complete. unit specifies fileunit, id specifies which particular IO operation.
write(*,*), this_image(), " reads in_circle and n_total: ", in_circle, n_total
sync all
close(unit)
end program main