Fortran Standard Library v0.6.0 Released

Dear Fortran Enthusiasts, we’re pleased to announce the release of Fortran Standard Library v0.6.0.
This release introduces major updates concerning the Linear Algebra APIs and the build system.
I would like to highlight the invaluable help from @jeremie.vandenplas, @hkvzjal, and all the stdlib enthusiasts who made this release possible!

Linear Algebra

I’ve been developing high-level APIs that leverage the modernized, any-precision BLAS/LAPACK backends introduced in v0.5.0. I’m eager to receive community support and feedback on them!

Matrix determinant:

d = det(a [, overwrite_a, err])
d = .det.a

It comes with:

  1. a pure interface,
  2. a no-allocation interface (overwrite_a=.true.) with error control (err), and
  3. a pure operator interface.
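
For illustration, here is a minimal usage sketch; it assumes the stdlib_linalg module name and default-real arrays:

program det_example
   use stdlib_linalg, only: det, operator(.det.), linalg_state_type
   implicit none
   real :: a(3,3), d
   type(linalg_state_type) :: err
   call random_number(a)
   d = det(a)                               ! pure interface
   d = .det. a                              ! pure operator interface
   d = det(a, overwrite_a=.true., err=err)  ! no-allocation, with error control
   print *, d
end program det_example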

Linear system solver:

x = solve(a,b) ! pure interface
x = solve(a,b,overwrite_a=.true.,err=err) ! expert interface
call solve_lu(a, b, x [, pivot, overwrite_a, err]) ! pure expert no-allocate interface
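
As an illustration of the no-allocation path, here is a sketch with caller-provided outputs (again assuming the stdlib_linalg module):

program solve_example
   use stdlib_linalg, only: solve_lu, linalg_state_type
   implicit none
   real :: a(3,3), b(3), x(3)
   integer :: pivot(3)
   type(linalg_state_type) :: err
   call random_number(a)
   call random_number(b)
   ! x and pivot are pre-allocated by the caller: no internal allocation
   call solve_lu(a, b, x, pivot=pivot, err=err)
end program solve_example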

Least-squares solver

The least-squares solver handles both the single-RHS (b(:)) and the N-RHS (b(:,:)) cases:

x = lstsq(a,b) ! simple interface
x = lstsq(a,b,cond,overwrite_a,rank,err) ! expert interface
call solve_lstsq(a, b, x [, real_storage, int_storage [, cmpl_storage], cond, singvals, overwrite_a, rank, err]) ! no-allocation option

Optional arguments include:
- cond = optional cutoff for rank evaluation: singular values s(i)<=cond*maxval(s) are considered 0. Default: 0.0
- overwrite_a = option to avoid internal allocation, at the cost of destroying the input matrix a. Default: .false.
- rank = return the rank of A, if requested
- err = return state variable
- real_storage, int_storage, cmpl_storage = provide pre-allocated storage arrays to avoid internal allocation; get the minimum sizes using lstsq_space
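
A minimal usage sketch of both interfaces, assuming the stdlib_linalg module and an overdetermined default-real system:

program lstsq_example
   use stdlib_linalg, only: lstsq, linalg_state_type
   implicit none
   real :: a(10,3), b(10), x(3)
   integer :: rk
   type(linalg_state_type) :: err
   call random_number(a)
   call random_number(b)
   x = lstsq(a, b)                                  ! simple interface
   x = lstsq(a, b, cond=1.0e-6, rank=rk, err=err)   ! expert interface
end program lstsq_example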

Expert error control

All procedures are pure where possible, and if something goes wrong, the code will raise an error stop with the error code and a full description of what happened.

All of them also accept an optional argument:

`type(linalg_state_type), intent(out), optional :: err`

that contains state information. In this case, the code will never stop, so the developer has full control over the algorithm. Typical errors are:

  • LINALG_ERROR if something in the algorithm failed, e.g., it did not converge, the matrix is singular, etc.
  • LINALG_VALUE_ERROR on input issues (wrong matrix sizes, …).
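
A typical pattern looks like the sketch below; it assumes the ok() and print() queries of stdlib’s linalg_state_type:

program state_example
   use stdlib_linalg, only: solve, linalg_state_type
   implicit none
   real :: a(2,2), b(2), x(2)
   type(linalg_state_type) :: err
   a = 0.0; b = 1.0                 ! a singular system, to trigger an error
   x = solve(a, b, err=err)         ! with err present, the code never stops
   if (.not. err%ok()) print *, err%print()
end program state_example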

Build system

@hkvzjal developed a new pre-processing system that preprocesses all fypp files and generates pure-Fortran sources, with full customization of the fypp and cpp macros.

Using this new approach, stdlib can now be directly built using fpm from the root folder, for instance:

python config/fypp_deployment.py
fpm build --profile release

The stdlib-fpm branch can be created with the following option (this is the new approach used by the fpm GitHub CI):

python config/fypp_deployment.py --deploy_stdlib_fpm

Current fypp options supported by the script are:

--njob NJOB          Number of parallel jobs for preprocessing
--maxrank MAXRANK    Set the maximum allowed rank for arrays
--with_qp            Include WITH_QP in the command
--with_xdp           Include WITH_XDP in the command
--lnumbering         Add line numbering in preprocessed files

For more details, refer to the README section “Build with fortran-lang/fpm”.

Up Next

More linear-algebra functions will follow (svd, qr, cholesky, matrix inverse, …), along with the option to automatically detect and link against high-performance BLAS and LAPACK libraries on the local system. If you like NumPy, you will love the Fortran Standard Library!

Full changelog

34 Likes

Excellent work @FedericoPerini. Thank you (and others involved) for your tireless efforts in improving the community tools and libraries. I also have a few suggestions for improving the APIs:

  1. Interface purity: One idea that may be interesting to explore is to keep all (or as many as possible) function interfaces pure, and to reserve the lower-level, faster paths for (potentially pure) subroutine interfaces. An example that follows this design is the getMatDet() generic function interface and the setMatDet() generic subroutine interface we developed some years ago in our library. So, the “no-allocation interface (overwrite_a=.true.) with error control (err)” in stdlib’s det design could be a pure subroutine interface (see the sketch after this list). We have used this style of keeping function interfaces for high-level flexibility and subroutine interfaces for faster but potentially less flexible operation, while keeping both pure, in all our functionality implementations in the ParaMonte library, and it has served the purpose well. Is there a reason for preferring impure function interfaces over pure (potentially faster) subroutine interfaces?

  2. Linear system solvers are, to my understanding, merely glorified matrix multiplication tasks. A large number of BLAS routines (e.g., STRMV, DTRMV, CTRMV, ZTRMV; STRSV, DTRSV, CTRSV, ZTRSV; STRMM, DTRMM, CTRMM, ZTRMM; STRSM, DTRSM, CTRSM, ZTRSM) merely multiply matrices of different classes and packings under various transformations (e.g., transposition or inversion). Linear system solvers can be considered a special matrix multiplication involving an inversion operation on either the first or the second matrix (representing the coefficient matrix). In re-implementing some of the BLAS routines for generic kind/type calculations, we learned that many of the seemingly disparate BLAS routines can be merged under a single generic interface. Given that the current design of BLAS was dictated by the extreme programming limitations of the 1980s, it may make sense to gradually think of transitioning and merging the seemingly disparate BLAS routines under generic interfaces for well-defined tasks. Another such example, besides triangular matrix multiplication, is the collection of SAXPY, DAXPY, CAXPY, ZAXPY; SGEMV, DGEMV, CGEMV, ZGEMV; SSPMV, DSPMV, CHPMV, ZHPMV; SSYMV, DSYMV, CHEMV, ZHEMV; SGEMM, DGEMM, CGEMM, ZGEMM; SSYMM, DSYMM, CSYMM, ZSYMM; CHEMM, ZHEMM (and many more) routines in BLAS, whose functionality can be summarized as matrix multiplication and addition with different packing formats and matrix subsetting rules, under various matrix transformations; we tried to unify them all under a single kind-, type-, rank-, and task-generic interface. These are some random explorations and learnings we had in the past while developing the necessary generic infrastructure for other ML algorithms built on a subset of BLAS routines.
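
To illustrate point 1, here is a minimal sketch of the function/subroutine pairing (hypothetical names; the 2x2 determinant body stands in for a real factorization):

module det_pair_demo
   implicit none
contains
   ! flexible path: pure function, leaves the input untouched
   pure function get_det(a) result(d)
      real, intent(in) :: a(2,2)
      real :: d
      d = a(1,1)*a(2,2) - a(1,2)*a(2,1)
   end function get_det
   ! fast path: pure subroutine; a may be overwritten as workspace
   pure subroutine set_det(d, a)
      real, intent(out) :: d
      real, intent(inout) :: a(2,2)
      d = a(1,1)*a(2,2) - a(1,2)*a(2,1)
   end subroutine set_det
end module det_pair_demo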

If any ideas seem interesting and suitable for the stdlib, I’d be happy to share what we have learned with you and other stdlib contributors to further this noteworthy collective community effort.

2 Likes

Thanks a lot @shahmoradi. I’m sorry, I didn’t know ParaMonte well enough to realize it had such detailed (congrats, awesome!) linear algebra capabilities; they would have been extremely valuable when designing this! Please do feel absolutely free to participate in the API discussions (e.g., I’m currently working on svd).

Indeed, there are several corners that are definitely still subject to improvement, especially on the computational side (see, for example, the solve interface; it’s exactly what you suggest). So far, I would say the design tries to follow both your ideas, although the attempt has also been to:

  • keep the APIs as close as possible to well-established packages; they often just return function values, because they don’t care much about computational performance.
  • allow the user to link against external BLAS/LAPACK libraries. This is the only way for us to ensure the APIs can be used with hardware-tuned implementations. That also means we can’t change these libraries’ interfaces, although we can, for example, strongly type them.
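
For instance, a strongly typed (and pure) explicit interface to dgesv might look like this (module name hypothetical; actual stdlib declarations may differ):

module lapack_typed
   use iso_fortran_env, only: dp => real64
   implicit none
   interface
      ! declaring the interface pure relies on the linked
      ! implementation actually behaving purely
      pure subroutine dgesv(n, nrhs, a, lda, ipiv, b, ldb, info)
         import :: dp
         integer, intent(in) :: n, nrhs, lda, ldb
         integer, intent(out) :: ipiv(n), info
         real(dp), intent(inout) :: a(lda,*), b(ldb,*)
      end subroutine dgesv
   end interface
end module lapack_typed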

Now, unfortunately, several functions in LAPACK have intent(out) arguments, which means they can’t be made pure unless they’re refactored into subroutines. This prevents some of the high-level functions (lstsq) from being pure at all, so more work will be needed on the modernized LAPACK to make sure these cases can be improved.

Absolutely. So far, we’ve only generalized interfaces across real/complex types and kinds, but haven’t considered matrix patterns yet. For example, the GESV backend contains 2 internal implementations (quad precision) and 4 switchable external/internal implementations (depending on whether an external BLAS/LAPACK is being used), see here. The reason is, again, that we want the user to be able to access external, fast libraries when they can. In this area your experience would also be very valuable, so make your voice heard!

1 Like

Sounds great. I look forward to discussing future API designs with you and others!
One reason scripting languages mix impure, in-place options into otherwise pure function interfaces (for example, via the overwrite_a argument in SciPy or NumPy) is that they have no equivalent of a subroutine. Fortran libraries, however, could take advantage of this extra language facility, allowing most procedures to remain pure.

One way of allowing the user to link against external BLAS/LAPACK libraries could be to expose both the original kind-agnostic generic interfaces to the BLAS routines in a dedicated module and higher-level, more generic matrix interfaces, like the ones I mentioned above, in their own dedicated modules. The higher-level interfaces would automatically dispatch to the relevant BLAS implementation when available, as in the sketch below. Given our limited resources, however, we never went beyond reimplementing the BLAS/LAPACK functionalities we needed for our main tasks. A community effort may be able to tackle such a feat. Looking forward to more discussions. If I can contribute anywhere to future developments, please feel free to ping my GitHub ID @shahmoradi.
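
A minimal sketch of this dispatch idea (hypothetical names; the cpp macro stands in for whatever build-time switch is used):

module gemm_dispatch
   use iso_fortran_env, only: dp => real64
   implicit none
contains
   ! generic-looking front end that forwards to a vendor-tuned BLAS
   ! when one is linked, and falls back to the intrinsic otherwise
   subroutine matmul_d(a, b, c)
      real(dp), intent(in),  contiguous :: a(:,:), b(:,:)
      real(dp), intent(out), contiguous :: c(:,:)
#ifdef EXTERNAL_BLAS
      ! C := 1*A*B + 0*C via the external dgemm
      call dgemm('N', 'N', size(a,1), size(b,2), size(a,2), &
                 1.0_dp, a, size(a,1), b, size(b,1), 0.0_dp, c, size(c,1))
#else
      c = matmul(a, b)   ! portable intrinsic fallback
#endif
   end subroutine matmul_d
end module gemm_dispatch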

4 Likes

I agree. Even if function purity is not always possible at the moment because of impure low-level functions, one can expect these low-level functions to be fixed in the future. So it’s important to design potentially pure function interfaces. Things like overwrite_a or error codes should be reserved for the subroutine versions, IMO.

2 Likes

This looks great! Amazing work!

Are there plans to add a matrix product, like matmul or dgemm? I find that matmul works well for small matrices, but it can segfault on large matrices. Of course dgemm works, but its interface isn’t very friendly.
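
For comparison, computing c = a*b for double-precision a(m,k), b(k,n), and c(m,n) looks like:

c = matmul(a, b)                                              ! intrinsic
call dgemm('N', 'N', m, n, k, 1.0d0, a, m, b, k, 0.0d0, c, m) ! raw BLAS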

1 Like

Thanks Jeff - that’s a great idea. Right now the focus is still to add missing functionality, i.e. the matrix decompositions, solvers, eigenvalues, etc., but a great way to push this forward would be to open an issue on the stdlib repo, so we can iterate on the syntax in the meantime (an exact replica of matmul? optional inputs? etc.).

1 Like

I think any function that performs matrix multiplication will have the same issue, as compilers often have to allocate temporary arrays, and some of them use the stack by default. So an alternative to MATMUL would have to be a subroutine.

2 Likes

Even if function purity is not always possible at the moment because of impure low-level functions, one can expect these low-level functions to be fixed in the future.

If this refers to LAPACK: since LAPACK does not provide interfaces, one has to write them manually anyway, which allows adding the pure attribute. But that of course requires checking whether the routines of all providers (OpenBLAS, MKL, …) are indeed pure, at least when invoked normally.

Sorry, I spoke too soon. It turns out that matmul is just fine; I had simply forgotten to set ulimit -s unlimited, as suggested on the Intel forum.

I was working up a minimal example before opening an issue, and I noticed that gfortran can handle matrices with millions of entries; only ifx requires fiddling with ulimit. So there’s probably no need for a stdlib alternative to matmul, at least not for matrices that fit in my laptop’s RAM. I also agree with @PierU that the advantage of the BLAS routine dgemm is that the caller allocates everything, so dgemm does not need any temporary arrays.