Hello. In classes/reading texts on linear algebra, I have always been told to use block algorithms. The rationale and everything makes sense to me, but I haven’t quite understood one part of implementation (ie, coding something myself): loading n blocks into memory to fill the cache, and then operating with those blocks.
As an example, here is a part of LAPACK’s dlaswp
, which permutes the rows of a matrix A
according to a permutation matrix stored as vector in IPIV
https://www.netlib.org/lapack/explore-html/d1/d7e/group__laswp_ga5d3ea3e3cb61e32750bf062a2446aa33.html:
n32 = ( n / 32 )*32
IF( n32.NE.0 ) THEN
DO 30 j = 1, n32, 32
ix = ix0
DO 20 i = i1, i2, inc
ip = ipiv( ix )
IF( ip.NE.i ) THEN
DO 10 k = j, j + 31
temp = a( i, k )
a( i, k ) = a( ip, k )
a( ip, k ) = temp
10 CONTINUE
END IF
ix = ix + incx
20 CONTINUE
30 CONTINUE
END IF
If I were to code this algorithm myself, I would just have do loop 30 iterate j
from 1
to n
, and get rid of do loop 10 (ie, one iteration of this look with k=j
), hoping that if I just iterate sequentially one-by-one, the compiler would load in the appropriate amount of columns of a row to fill the cache well (since there is no Fortran command that says “load these rows/these 32 elements of these rows into memory right now”). Is this the type of style block algorithms should be coded in Fortran?