How to deal with symmetric matrixes?

I am working on a problem where I have a symmetric function afnc(x,x'), whose values I need to compute for a rectangular grid and store in a suitable matrix a(i,j).
Currently, I am using a normal square matrix. I evaluate half of the matrix and then I populate the other half using the property a(i,j)=a(j,i), as shown below:

program main
   implicit none
   integer, parameter ::nc = 1000
   real :: x(nc), a(nc, nc)
   integer :: i, j

   ! Make a grid
   do i = 1, nc
      x(i) = 1.*i
   end do

   ! Method 1
   do j = 1, nc
      do i = j, nc
         a(i, j) = afnc(x(i), x(j))
      end do
   end do
   do j = 2, nc
      do i = 1, j - 1
         a(i, j) = a(j, i)
      end do
   end do

contains

   pure real function afnc(x1, x2) result(res)
   !! "Very complex" kernel. Here made simple on purpose. 
      real, intent(in) :: x1, x2
      res = x1*x2
   end function

end program main

From a computational perspective, IMO, this seems already not-too-bad, because there are no redundant function calls and retrieving the values from the matrix is a “direct” operation for all index combinations. On the flip side, I am wasting half of the matrix (although we are only talking about a small ~(1000^2)/2 memory waste).

So, I was curious how I could improve this implementation without actually making things worse. For example, I could imagine storing the info in a vector v(k) (by flattening the “half-array”), but I would then have to wrap it into a function to compute the index k based on the array indexes (i,j), which would involve an if-else construct on the indexes, making it slower to access.

lapack probably has a nice solution to this (?), but as far as I know fpm cannot yet install it automatically, which somehow makes me a bit reluctant to go down that road …

Any thoughts are very welcome!

This is a fairly common situation, so there are a few decades of history to look at to see how to address this. I have written codes that date back to the 1970s that use the following approach.

The array elements are stored in a linear array of length (n*(n+1))/2. The individual elements are accessed as

a(i,j)a((i*(i-1))/2+j) for just the lower triangle elements, i>=j.

This scheme is equivalent to storing the lower triangle elements by rows, or the upper triangle elements by columns. For small arrays, you might consider saving the i computations in an index array

indx(i) = (i*(i-1))/2

in which case the array elements may be referenced as

a(i,j) a(indx(i)+j)

The indx(:) array can also be computed with just additions rather than multiplications. If you have many such symmetric matrices, they can all use the same indx(:) array.

The older eispack and linpack libraries supported this data storage scheme for symmetric matrices. However, there are no level-3 blas routines that support this, and there are no lapack routines based on this kind of packed storage.

This packed storage scheme also works for skew-symmetric matrices. Here one would typically not store the diagonal elements (which are all zero), and the lower and upper triangular elements are related through a sign change. It also applies to complex Hermitian and complex anti-Hermitian array storage.

In modern fortran, you can store such arrays as a jagged array using allocatable components of a derived type. This approach has some advantages and some disadvantages relative to the above packed storage scheme. Another posibility is to use the above packed scheme, and then assign fortran pointers to each lower-triangle row. The downside for this is that the original array must have the target attribute, which inhibits some low-level optimizations that can otherwise be performed.

2 Likes

I think Lapack allows to compute on the upper or lower triangle only, so you can just waste some memory and leave one of the triangles undefined.

3 Likes

Thanks @RonShepard. The solution you mention is what I was referring to at the bottom of my post:

For example, I could imagine storing the info in a vector v(k) (by flattening the “half-array”)

I’ve used that method here (the index mapping is a bit different, because that array is not 1,1-indexed). However, this “matrix-to-vector” method has IMO, two disadvantages:

  • The matrix can longer be called in a “natural” way, because it is not a matrix. For instance, I would not be able to do this.
  • If the calling code wants to access the other half of the matrix, we need to have an if-else to check the indexes and swap them if necessary. So, in practice one gets an additional if per array access.

Somehow, I get the feeling that such methods are only worth when memory management is the limiting factor.

@RonShepard

Thanks for this looking back to 1970’s !
It does trouble me that it is a “row” storage, rather than Fortran’s “column” storage approach, but I guess as it is a symmetric matrix rows and columns are the same!
The pre-calculation of “indx” is a useful tip.

Thank you for the good hints.
In case someone is faced with the same question, here is a brief summary of what I learned in the process (mostly obvious things):

Just for some reference, here is a symmetric matrix times vector, spmxv(), code using fortran array notation.

c(1) = a(1) * b(1)

mm0 = 1
do m = 2, n
   c(m) = dot_product( b(1:m), a(mm0+1:mm0+m) )
   if ( b(m) .ne. zero ) c(1:m-1) = c(1:m-1) + b(m) * a(mm0+1:mm0+m-1)
   mm0 = mm0 + m
enddo

In earlier times, that first line within the loop would have been sdot/ddot, and the second line would have been a saxpy/daxpy call. With fortran array syntax, it automatically adapts to the correct real kind, so no #if conditional compilation or source code conversion is required when porting between machines. There are n**2 memory fetches and 2*n**2 arithmetic operations, so it doesn’t profit to expand the matrix just to do a single spmxv().

The timings I have done in the past have always indicated that it is not worth the effort to do matrix-matrix products with packed matrices. It is better to expand one of the matrices and do a series of spmxv() steps (do concurrent), or to expand both matrices and do a matrix-matrix product (sgemm/dgemm or matmul()). The result is full square, with no index symmetry. There are 2*n**2 memory fetches and 2n**3 arithmetic operations, so expanding the matrices is hidden by all the fp operations, especially for large n.

1 Like