@PierU
At the end I can’t really see what may be added in stdlib here (?).
To demonstrate the effect of our toy ‘RAM disk’, I ran the same quantum time-evolution calculation on 16 nodes (4096 cores) of a 768-node cluster with a 20 PB parallel file system, shared by many other users.
With our ‘RAM disk’ enabled and no direct-access files written to disk, the job took 21 minutes, with a CPU utilization of over 90%.
Without the ‘RAM disk’, the time was 87 minutes, with a CPU utilization of about 40%.
As you can see, the shared file system is a severe bottleneck for our type of calculation, with CPUs having to wait for data to be written and read.
This feature of a temporary, volatile file system could be made more general-purpose and part of stdlib.
For example, there could be routines like
```fortran
open_stdlib(50,file='data',form='UNFORMATTED',access='DIRECT',recl=recl)
write_stdlib(50,rec=rec) x,y,z
```
which, by default, would be just wrappers for the usual open and write statements.
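Purely as an illustration, the default (pass-through) case could look something like the sketch below. The module name and the `data` dummy argument are made up here, since a plain subroutine cannot take an I/O list the way the WRITE statement does:

```fortran
! Hypothetical sketch: by default, open_stdlib/write_stdlib simply forward
! to the intrinsic OPEN and WRITE statements.
module stdlib_volatile_io
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
contains

   subroutine open_stdlib(unit, file, form, access, recl)
      integer,          intent(in) :: unit, recl
      character(len=*), intent(in) :: file, form, access
      open(unit, file=file, form=form, access=access, recl=recl)
   end subroutine open_stdlib

   subroutine write_stdlib(unit, rec, data)
      integer,  intent(in) :: unit, rec
      real(dp), intent(in) :: data(:)
      write(unit, rec=rec) data
   end subroutine write_stdlib

end module stdlib_volatile_io
```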
However, you could specify additional options like
```fortran
open_stdlib(50,file='data',form='UNFORMATTED',access='DIRECT',recl=recl, &
            volatile=.true.)
```
which would then keep the data in RAM only. If required, the same data could be written once to the non-volatile file system at the end of the calculation (this is usually what we do).
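Again only as a sketch (all names here are hypothetical), the volatile case could amount to keeping the records in an allocatable array and copying them around, with a single flush to the real file system at the end of the run:

```fortran
! Hypothetical sketch of volatile=.true.: records live in an in-memory table
! instead of on the parallel file system, and are flushed to a real file once
! at the end of the calculation.
module volatile_store_sketch
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   real(dp), allocatable :: records(:,:)   ! (record length, number of records)
contains

   subroutine write_volatile(rec, data)
      integer,  intent(in) :: rec
      real(dp), intent(in) :: data(:)
      ! Fixed-size table for brevity; a real implementation would grow it.
      if (.not. allocated(records)) allocate(records(size(data), 1000))
      records(:, rec) = data                ! 'writing' is just a copy to RAM
   end subroutine write_volatile

   subroutine read_volatile(rec, data)
      integer,  intent(in)  :: rec
      real(dp), intent(out) :: data(:)
      data = records(:, rec)                ! 'reading' is just a copy from RAM
   end subroutine read_volatile

   ! Write everything once to the non-volatile file system at the end.
   subroutine flush_to_disk(file)
      character(len=*), intent(in) :: file
      integer :: unit, irec
      ! recl in bytes here; some compilers count in other units by default.
      open(newunit=unit, file=file, form='UNFORMATTED', access='DIRECT', &
           recl=size(records, 1)*storage_size(1.0_dp)/8)
      do irec = 1, size(records, 2)
         write(unit, rec=irec) records(:, irec)
      end do
      close(unit)
   end subroutine flush_to_disk

end module volatile_store_sketch
```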
On a cluster, one could pass the MPI communicator
```fortran
open_stdlib(50,file='data',form='UNFORMATTED',access='DIRECT',recl=recl, &
            volatile=.true.,MPI_Comm=mpi_comm_world)
```
which would allow data to be read by all nodes. This would require some nifty coding using asynchronous MPI communication.
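I am not claiming this is how it should be implemented, but just to make the idea concrete: one possible approach (sketched below with made-up names) would be for each rank to keep its own records in RAM and expose them through an MPI window, so that any rank in the communicator can fetch a record with one-sided communication:

```fortran
! Rough sketch of the MPI case: each rank keeps its records in RAM and exposes
! them through an MPI window; any rank in the communicator can then read a
! record with a one-sided MPI_Get.  No error handling or proper record
! management here; illustration only.
module volatile_mpi_sketch
   use mpi
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   integer :: win
   real(dp), allocatable :: records(:,:)
contains

   subroutine open_volatile_mpi(reclen, nrec, comm)
      integer, intent(in) :: reclen, nrec, comm
      integer :: ierr
      integer(kind=MPI_ADDRESS_KIND) :: winsize
      allocate(records(reclen, nrec))
      records = 0.0_dp
      winsize = int(reclen, MPI_ADDRESS_KIND) * nrec * 8_MPI_ADDRESS_KIND
      ! Expose the local record table to all ranks in the communicator.
      call MPI_Win_create(records, winsize, 8, MPI_INFO_NULL, comm, win, ierr)
   end subroutine open_volatile_mpi

   ! Read record `rec` stored on rank `owner`, callable from any rank.
   subroutine read_remote(owner, rec, data)
      integer,  intent(in)  :: owner, rec
      real(dp), intent(out) :: data(:)
      integer :: ierr
      integer(kind=MPI_ADDRESS_KIND) :: disp
      disp = int(rec - 1, MPI_ADDRESS_KIND) * size(data)
      call MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win, ierr)
      call MPI_Get(data, size(data), MPI_DOUBLE_PRECISION, owner, disp, &
                   size(data), MPI_DOUBLE_PRECISION, win, ierr)
      call MPI_Win_unlock(owner, win, ierr)
   end subroutine read_remote

end module volatile_mpi_sketch
```

A production version would of course need non-blocking writes, proper synchronisation, and a mapping from records to ranks, but hopefully this shows the kind of thing I have in mind.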
From our HPC point of view, this would be a great feature.