Perhaps using `!$omp simd` could provide some extra control? (It might just be a rabbit hole that never ends.) It depends on whether you still count that as pure Fortran; at least Intel Fortran and gfortran have the `-qopenmp-simd`/`-fopenmp-simd` flags, which don't need linking with the OpenMP runtime. Maybe the new loop transformation constructs `!$omp tile` and `!$omp unroll` could help too, although YMMV due to implementation differences among compilers, not to mention interaction with the optimization passes.
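To make the idea concrete, here is a minimal sketch of what I mean. The module and routine names (`simd_sketch`, `axpy_simd`) are made up for illustration, and whether the directive changes the generated code at all depends on the compiler and optimization level.

```fortran
module simd_sketch
  ! Hypothetical example module; names are illustrative only.
  implicit none
  private
  public :: axpy_simd
contains
  ! Computes y <- y + a*x, explicitly asking the compiler to vectorize the loop.
  subroutine axpy_simd(a, x, y)
    real, intent(in)    :: a
    real, intent(in)    :: x(:)
    real, intent(inout) :: y(:)
    integer :: i
    ! Without an OpenMP (SIMD) flag this line is just a comment,
    ! so the code remains valid standard Fortran.
    !$omp simd
    do i = 1, size(x)
      y(i) = y(i) + a*x(i)
    end do
  end subroutine axpy_simd
end module simd_sketch
```

Compiling with something like `gfortran -O2 -fopenmp-simd` (or `-qopenmp-simd` with Intel Fortran) should activate the directive; leaving the flag out gives you the plain loop back, which makes it easy to compare the two.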
A similar challenge was discussed in the thread: C++ Standard Library dense linear algebra interface - #22 by tyranids (see posts from @tyranids)