Achieving OpenBLAS DGEMM performance with Fortran vs C intrinsics: why is Fortran slower?

ivanpribec · August 22, 2025, 2:03pm

Perhaps using !$omp simd could provide some extra control? (It might just be a rabbit hole that doesn't end.) It depends on whether you still count that as pure Fortran; at least Intel Fortran and gfortran have the -qopenmp-simd and -fopenmp-simd flags, respectively, which enable the SIMD directives without linking against the OpenMP runtime. The newer loop transformation constructs !$omp tile and !$omp unroll might also help, although YMMV due to implementation differences among compilers, not to mention their interaction with the optimization passes.
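To make the suggestion concrete, here is a minimal sketch of what I mean by !$omp simd. It is only a toy dot product, not a DGEMM micro-kernel, and the program name, array size, and values are made up for illustration. Whether the directive changes the generated code at all depends on the compiler and optimization level.

```fortran
program simd_dot
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   integer, parameter :: n = 1000   ! illustrative size, not from the thread
   real(dp) :: x(n), y(n), s
   integer :: i

   ! Fill the inputs with values whose dot product is known exactly.
   do i = 1, n
      x(i) = real(i, dp)
      y(i) = 2.0_dp
   end do

   s = 0.0_dp
   ! Ask the compiler to vectorize the loop; the reduction clause tells it
   ! that the partial sums in s may be accumulated per SIMD lane.
   !$omp simd reduction(+:s)
   do i = 1, n
      s = s + x(i)*y(i)
   end do

   ! Sanity check: sum of 2*i for i = 1..n equals n*(n+1).
   if (abs(s - real(n, dp)*real(n + 1, dp)) > 1.0e-9_dp) error stop "unexpected result"
   print '(a, f0.1)', "dot = ", s
end program simd_dot
```

With gfortran this would be built with something like `gfortran -O2 -fopenmp-simd simd_dot.f90`. Without the flag the directive is just a comment, so the same source still compiles as plain Fortran.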

A similar challenge was discussed in the thread: C++ Standard Library dense linear algebra interface - #22 by tyranids (see posts from @tyranids)

