How many BLAS libraries have this error?

There is a fairly well-known error in many BLAS libraries. Here is some code that demonstrates it.

program test_blas
   implicit none
   real :: dot, snorm, x(3) = [1.0, 2.0, 3.0], y(3) = [3.0, 0.0, 4.0]
   character(*), parameter :: cfmt='(a,i0,2(a,f0.3))'
   block
     integer, parameter :: wp=kind(1.e0)
     real(wp), external :: sdot, snrm2
     dot = sdot(3,x,1,y,1)
     snorm = snrm2(3,y,1)
     write(*,cfmt) 'wp kind=', wp, ' sdot=', dot, ' snrm2=', snorm
   end block
   block
     integer, parameter :: wp=kind(1.d0)
     real(wp), external :: sdot, snrm2
     dot = sdot(3,x,1,y,1)
     snorm = snrm2(3,y,1)
     write(*,cfmt) 'wp kind=', wp, ' sdot=', dot, ' snrm2=', snorm
   end block
end program test_blas

Here is what you get with the Apple system library, which displays the error:

$ gfortran -framework accelerate test_sblas.f90 && a.out
wp kind=4 sdot=.000 snrm2=.000
wp kind=8 sdot=15.000 snrm2=5.000

Here is the same machine with the Intel MKL library, which does NOT display the error:

$ ifort test_sblas.f90  -L/opt/intel/oneapi/mkl/2022.2.0/lib -Wl,-rpath,/opt/intel/oneapi/mkl/2022.2.0/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread  && a.out
wp kind=4 sdot=15.000 snrm2=5.000
wp kind=8 sdot=.000 snrm2=5.000

The reason for this, I’ve read in many places, is that in the early 1980s the fortran source code for the reference BLAS was converted to C with the f2c utility. C converts all floating point arguments to double precision by default, and f2c apparently had a bug (or feature) that single precision functions were converted accordingly to double precision. Programmers using these converted libraries adjusted their declarations in order to get the right results instead of correcting the library. Thus now, correcting the library code would cause many programs to fail.

So I’m curious now if this can be fixed after some 4 decades? How many BLAS libraries have this error? Is there anything that we fortran programmers can do to help fix this longstanding error. Has this C error now propagated into other C-like languages, such as C++, python, Java, R, and so on?

2 Likes

The Apple accelerate framework was known to be broken for a some time. Numpy even refused to use it as a BLAS implementation due to various outstanding bugs, there are a few tickets on the Numpy bug tracker which might give some insights (see ENH: Check for buggy Accelerate on MacOS · Issue #15647 · numpy/numpy · GitHub for a start).

I think it was fixed when they ported the accelerate framework for the new arm64 architecture, not sure if they mended the one that is shipped for x86_64, though.

1 Like

Have you noticed that the type declarations for the BLAS functions sdot and snrm2 in your second block are wrong, since those functions in the BLAS are of type real? Because of that error, the program behavior is “undefined”, is it not?

I notice that the second line of the output that you reported for Intel Fortran/MKL gave sdot = 0.0, which is not what you probably expected from the program. Before naming a particular BLAS implementation as being responsible for the problem, please change the declaration in the second block to
real, external :: sdot snrm2
and examine the results of making this change.

Recently, I tried to refactor a trivial BLAS/LAPACK support routine (Simplify lsame · Issue #701 · Reference-LAPACK/lapack · GitHub). The response I got was lukewarm at best. Related to that I opened the Discourse thread: Resistance to modernization - #6 by ivanpribec. I imagine that a refactoring which would change calculation results would generate even more resistance than the trivial routine I attempted.

Have you taken a look at BLIS perhaps? The routines bli_?normfv calculates the Frobenius norm. Presumably the authors of BLIS have fixed any BLAS mistakes, but I haven’t checked myself.

An earlier related thread was - Why abandon Fortran for Linear Algebra?. The current maintainers of the Reference LAPACK repository (also includes the Reference BLAS) seem to be working actively on <T>LAPACK - C++ Template Linear Algebra PACKage.

I agree with @mecej4 in that your test program is clearly wrong.

In the test examples, all usage of “x” and “y” are default real, so I am puzzled what your example shows when you are claiming that sdot or snrm2 might be real4 or real8, while the function arguments do not agree with your suggested similar kind error ?

If this was an implementation error, the mixing of real4 or real8 results would have been very evident at the time. This is not an error that could have been missed some 4 decades ago, unlike mixing of different integer kind results which was/is less obvious.

I thought the code was obvious, but the point is that with some BLAS libraries (such as the Apple accelerate library) you must declare the type of sdot() and snrm2() incorrectly in order to get the right numerical result.

In contrast, with a correct library then you get the correct results when they are declared correctly, and the expected incorrect results when they are declared incorrectly.

My original post showed examples of both.

If this was an implementation error, the mixing of real*4 or real*8 results would have been very evident at the time. This is not an error that could have been missed some 4 decades ago, unlike mixing of different integer kind results which was/is less obvious.

I do think this is a common BLAS library error. It is not just the Apple accelerate library. Also, just to be clear, this is not a fortran error. Any library built from the fortran source code will be correct. This is a C language error, one that has propagated now for several decades.

Is there a single instance of a widely distributed BLAS object code library in which calls to BLAS routines with correctly declared and passed arguments produce incorrect results, other than on Apple (from which I have stayed away since the mid 1990s)?

I use BLAS almost every day, without having to ask/think whether the library was built from sources in C, Fortran or assembler, just as I use electric AC supply without having to think about which direction the electrons are flowing. I have used BLAS on MSDOS, Windows, Linux, AT&T SysV on a 3B2, AIX, Solaris-x86, Next,FreeBSD and on a PPC 601 Powermac with MacOS 7.x. I never discovered a BLAS error. Of course, BLAS/LAPACK have had errors that others noticed and reported and got the bugs fixed, but such bugs are very rare, and most of us may never encounter a single error while using BLAS routines.

I do remember that the AT&T f77 and Pcc compilers on the 3B2 provided flags to enable building mixed-language programs with a mix of float and double type arguments.

Sorry, that is unacceptable – if you spread such code replete with deliberate errors to others, are you not punishing non-Apple users of your code for Apple’s misdeeds?

But then, there may be a happy ending to this story. Perhaps you have found the long-sought solution to the “detect and disable Accelerate” problem. On the Numpy Github, I found one post with the following:

… we have not figured out how to disable or even identify that Accelerate is being used. That PR did not achieve its purpose, so was closed. I am a bit sceptical that a warning or note in the documentation will prevent use of Accelerate, we need a more fool-proof way (until the world invents better fools) of blocking it.

One thing we did not try is creating a test that would reliably identify a buggy lapack implementation, run that test at import, and raise an error if it fails. If the test is fast and doesn’t use much memory we may be able to justify it.

Your test code would certainly qualify for “is fast and doesn’t use much memory”.

Again, I wish to stress that until today I had not known about the existence of this problem. As far as I am aware, this problem is specific to certain Apple hardware-software combinations.

1 Like

I attempted to reproduce the BLAS+F2C bug on Windows 11-64, with some success. I could not use RonShepard’s F90 source as input to F2C, and I did not want to cloud the question by using F77 code with incorrect type declarations for BLAS routine arguments or BLAS function return values. Here is my F77 test code:

      program test_blas
      implicit none
      real dot, snorm, sdot, snrm2, x(3), y(3)
      character*20 cfmt
      data x/1.0, 2.0, 3.0/, y /3.0, 0.0, 4.0/
      data cfmt/'(2(a6,f10.3))'/

      dot = sdot(3,x,1,y,1)
      snorm = snrm2(3,y,1)
      write(*,cfmt) ' sdot=', dot, ' snrm2=', snorm

      end

Just now, I downloaded the current F2C sources from Netlib and built f2c.exe and vcf2c.lib using “Microsoft (R) C/C++ Optimizing Compiler Version 19.33.31629 for x64”. The above F77 source, converted using the freshly built f2c.exe, gave this C source:

/* rssp.f -- translated by f2c (version 20200916).
   You must link the resulting object file with libf2c:
	on Microsoft Windows system, link with libf2c.lib;
	on Linux or Unix systems, link with .../path/to/libf2c.a -lm
	or, if you install libf2c.a in a standard place, with -lf2c -lm
	-- in that order, at the end of the command line, as in
		cc *.o -lf2c -lm
	Source for libf2c is in /netlib/f2c/libf2c.zip, e.g.,

		http://www.netlib.org/f2c/libf2c.zip
*/

#include "f2c.h"

/* Table of constant values */

static integer c__3 = 3;
static integer c__1 = 1;

/* Main program */ int MAIN__(void)
{
    /* Initialized data */

    static real x[3] = { 1.f,2.f,3.f };
    static real y[3] = { 3.f,0.f,4.f };
    static char cfmt[20] = "(2(a6,f10.3))       ";

    /* Builtin functions */
    integer s_wsfe(cilist *), do_fio(integer *, char *, ftnlen), e_wsfe(void);

    /* Local variables */
    static real dot;
    extern doublereal sdot_(integer *, real *, integer *, real *, integer *), 
	    snrm2_(integer *, real *, integer *);
    static real snorm;

    /* Fortran I/O blocks */
    static cilist io___6 = { 0, 6, 0, cfmt, 0 };


    dot = sdot_(&c__3, x, &c__1, y, &c__1);
    snorm = snrm2_(&c__3, y, &c__1);
    s_wsfe(&io___6);
    do_fio(&c__1, " sdot=", (ftnlen)6);
    do_fio(&c__1, (char *)&dot, (ftnlen)sizeof(real));
    do_fio(&c__1, " snrm2=", (ftnlen)7);
    do_fio(&c__1, (char *)&snorm, (ftnlen)sizeof(real));
    e_wsfe();
    return 0;
} /* MAIN__ */

/* Main program alias */ int test_blas__ () { MAIN__ (); return 0; }

The bug is brazen – the line with doublereal. With this bug left unfixed, the output from the C code is wrong:

 sdot=      .000 snrm2    25.000

Replacing “doublereal” by “real” in the F2C-generated C file fixes the bug, and the output becomes

 sdot=    15.000 snrm2     5.000

In this reconstructed scenario, then, the culprit is F2C. Given that F2C is obsolete, we should not think of fixing the F2C sources to address this problem. Whoever (Apple, in 2022?) uses/used F2C and a C compiler to build Accelerate needs to do the right thing – recompile BLAS, Lapack, etc., from Fortran sources or fix their F2C-converted C sources. Furthermore, at least in this reconstructed scenario, the conclusion “This is a C language error, one that has propagated now for several decades.” does not apply.

P.S. Line 81 of file main.c of the F2C converter sources contains:

int forcedouble = YES;		/* force real functions to double */

Changing YES to NO on this line and rebuilding f2c.exe, I found that this change removes the problem with the test code, produces the correct type declarations, and the resulting C code when built produces the correct result. Of course, there is no telling what things break with such a change to a very complicated body of code such as that of F2C.

P.P.S. Many of us do not care to read old manuals carefully. I’m guilty as well. Here are a few lines from the the third page of the F2C document:

A function of type integer, logical, or double precision must be declared as a C function that returns the corresponding type. If the -R option is in effect (see Appendix B), the same is true of a function of type real; otherwise, a real function must be declared as a C function that returns doublereal; this hack facilitates our VAX regression testing, as it duplicates the behavior of our local Fortran compiler (f77).

Thus, to cause F2C to produce a C function code whose return variable is float from Fortran 77 source of a function of type real, one should use the -R option of F2C. Not being aware of that flag is easier to excuse than AT&T’s making the default behavior of F2C to be to deliver an unwanted change of the return type of a single precision real function in Fortran to a double precision function in C.

4 Likes

I remember using a BLAS library in the late 1980s or early 1990s on SunOS that had this same problem. I’m pretty sure that it had been converted from the original fortran source code with f2c and then tuned and optimized. I don’t know the original source, maybe netlib? At that time, SunOS defined the standard within the unix world, and all of the other unix vendors mimicked whatever was done on SunOS systems. If you compiled the BLAS with the sun fortran compiler, everything was fine, it was only the f2c version that had the error.

As I said before, this problem is not just on Apple libraries, it has been around for decades. As to why it has not been corrected, I don’t know. I would point out however that if you compile both the BLAS library and your application code with f2c, then you get consistent results. That is, f2c makes the same mistake in both places, and two wrongs make a right (sort of). At least if you don’t look closely, the results look right. So when running the LAPACK validation tests, both the BLAS library and the LAPACK library would have been converted incorrectly with f2c, but the end result would have looked alright.

In fortran codes that use external libraries, you sometimes see conditional compilation directives around the sdot() and snrm2() declarations to account for this common library bug.

#ifdef SBLASBUG
      REAL*8 SDOT
#else
      REAL SDOT
#endif

Regarding BLAS/LAPACK errors, those are a different matter. This is not a BLAS error, nor a LAPACK error, in the fortran source code. This is an error caused by the conversion to C, along with the C convention of converting float arguments to double. This was still at a time when the C language did not define the value of the integer expression i/j when one of the operands was negative (it could either round up or down), and when the C language did not respect the use of () to group subexpressions (it could rearrange things however it wanted). So C had many problems at that time with numerical calculations. It took several decades but most of these C issues have been addressed now, but that was the state of computing in the 1980s.

I think most people use the double precision ddot() and dnrm2() BLAS functions, not the real functions. Maybe that is another reason why this bug has persisted for so long.

1 Like

Surprisingly, here is the result on arm64:

$ uname -m
arm64
$ gfortran -framework accelerate test_blas.f90 && a.out
wp kind=4 sdot=.000 snrm2=.000
wp kind=8 sdot=15.000 snrm2=5.000

So the error is still there.

1 Like

Well, you have established that even on ARM64 the Accelerate library is broken. My shortened version of your test code, which is standard Fortran 77, will also fail to work correctly with Accelerate.

The first block of your code is standard Fortran, with the correct types in the declarations. Apple M-x users should take my example or your test code (with only the first block retained) to Apple and demand that the Accelerate framework be fixed without further delay. A new buyer of an Apple desktop in 2024 would not take a story about good old F2C as adequate excuse in 2023.

Older posts elsewhere on the same theme:

  1. On PasteBin
  2. Ugly hacks to accommodate a bug in Accelerate’s SGEMV
  3. Bug in cblas_gemm, reported in Caffe users group.

The third case is noteworthy since the code is in C, and the called routine is in CBLAS. Unless some parts of Accelerate were built from Fortran sources, Fortran is not even in the picture, but the real/double bug is present.

Based on my interaction with one of the LAPACK/OpenBLAS maintainers in the lsame refactoring GitHub issue, it appears at least some Android developers are relying on F2C for their BLAS/LAPACK.

Unrelated to Android, searching on Stack Overflow I was able to find a few cases which look like they might be related (I didn’t dig deeper):

The first case (on Stack Overflow) is the same as the SNRM2 error discussed above. It occurs when Accelerate is used, and goes away when OpenBLAS is used.

The second case is unrelated; it is specific to Cuda C and Cuda BLAS, and the code fragments given were not sufficient to reproduce the reported bug.

2 Likes

If you read the first case more carefully, it did not use -framework accelerate in the link line, it used -lblas which is the OpenBLAS library installed through Homebrew. It was one of the responses to that first post that showed that the error was also in -framework accelerate.

OpenBLAS is based on GotoBLAS, which is written in C, not fortran. If you look at the source code, the return type is called FLOATRET which is defined as

#if defined(XDOUBLE) || defined(DOUBLE)
#define FLOATRET        FLOAT
#else
#ifdef NEED_F2CCONV
#define FLOATRET        double
#else
#define FLOATRET        float
#endif
#endif

So this convention of returning fortran REAL values as fortran DOUBLE PRECISION values was already part of the problem even in the GotoBLAS era. I have not looked at the OpenBLAS source code yet, but I expect it followed on from the earlier established C conventions.

This is a more widespread problem than just the accelerate library, and as far as I know about it, the errors all trace back one way or another to the f2c conversion and to the way C promotes float arguments to double.

1 Like

I skimmed through some Numpy forum posts and got the impression that on Macs without a separate BLAS installed and configured (through “site.cfg”?) the use of -lblas in a linking command caused Accelerate to be used through a symlink.

What I don’t understand is why the -R flag of f2c was not made the default, given that AT&T’s own F2C documentation states that the problematic conversion of real to double was a “hack” used to facilitate regression testing on a VAX.

I checked the current OpenBLAS source code, and that exact same set of conditional definitions of the return type is still there. Look in common.h.

I have decided to submit a bug report to Apple feedback about this issue, and I want to make sure I have correct source code. Here is my current version:

program sblas
   ! test some single-precision blas results.
   implicit none
   real :: x(2)=[3.,4.], y(2)=[1.,1.]
   complex :: w(2)=[(4.,3.),(3.,4.)], z(2)=[(5.,6.),(7.,8.)]
   real, external :: sdot, sdsdot, snrm2, scnrm2, sasum, scasum
   complex, external :: cdotu, cdotc
   character(*), parameter :: cfmt='(*(g0.4,1x))'
   write(*,cfmt) 'sdot=',   sdot(2,x,1,y,1),       'should be 7.000'
   write(*,cfmt) 'sdsdot=', sdsdot(2,0.0,x,1,y,1), 'should be 7.000'
   write(*,cfmt) 'snrm2=',  snrm2(2,x,1),          'should be 5.000'
   write(*,cfmt) 'scnrm2=', scnrm2(2,w,1),         'should be 7.071'
   write(*,cfmt) 'sasum=',  sasum(2,x,1),          'should be 7.000'
   write(*,cfmt) 'scasum=', scasum(2,w,1),         'should be 14.00'
   write(*,cfmt) 'cdotu=',  cdotu(2,w,1,z,1),      'should be -9.000 91.00'
   write(*,cfmt) 'cdotc=',  cdotc(2,w,1,z,1),      'should be 91.00 5.000'
end program sblas

Here are some correct results using the intel MKL library:

$ ifort -L${MKLROOT}/lib -Wl,-rpath,${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread sblas.f90 && a.out
sdot= 7.000 should be 7.000
sdsdot= 7.000 should be 7.000
snrm2= 5.000 should be 5.000
scnrm2= 7.071 should be 7.071
sasum= 7.000 should be 7.000
scasum= 14.00 should be 14.00
cdotu= -9.000 91.00 should be -9.000 91.00
cdotc= 91.00 5.000 should be 91.00 5.000

Here are some incorrect and some correct results using -framework accelerate:

$ ifort -framework accelerate sblas.f90 && a.out
sdot= .000 should be 7.000
sdsdot= .000 should be 7.000
snrm2= .000 should be 5.000
scnrm2= .000 should be 7.071
sasum= .000 should be 7.000
scasum= .000 should be 14.00
cdotu= -9.000 91.00 should be -9.000 91.00
cdotc= 91.00 5.000 should be 91.00 5.000

Here are some entirely incorrect results with gfortran and -framework accelerate:

$ gfortran -framework accelerate sblas.f90 && a.out
sdot= 0.000 should be 7.000
sdsdot= 0.000 should be 7.000
snrm2= 0.000 should be 5.000
scnrm2= 0.000 should be 7.071
sasum= 0.000 should be 7.000
scasum= 0.000 should be 14.00

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x10af498be
#1  0x10af48a9d
#2  0x7fff207ced7c
#3  0x7fff2105dfc8
#4  0x10af37bc1
#5  0x10af37d7e
Segmentation fault: 11

If I could get some verification from the people here that the fortran code is correct and that the expected results are correct, then I will submit this info to Apple.

The program works correctly on Windows with the current version of Intel Fortran and on Cygwin64 with gfortran 11.3.

Some compilers/runtimes may object to the use of g0.d for character output list items.

1 Like

When I run on macOS Mojave this “gfc -lblas sblas.f90 && ./a.out” I also saw a segfault.
With a custom installation of LAPACK (3.9.0) I’ve to link with the (confusing) “-lrefblas” and then I see:

gfc -lrefblas sblas.f90 && ./a.out
sdot= 7.000 should be 7.000
sdsdot= 7.000 should be 7.000
snrm2= 5.000 should be 5.000
scnrm2= 7.071 should be 7.071
sasum= 7.000 should be 7.000
scasum= 14.00 should be 14.00
cdotu= -9.000 91.00 should be -9.000 91.00
cdotc= 91.00 5.000 should be 91.00 5.000

gfc -lblas test_blas.f90 && ./a.out
wp kind=4 sdot=.000 snrm2=.000
wp kind=8 sdot=15.000 snrm2=5.000

gfc -lrefblas test_blas.f90 && ./a.out
wp kind=4 sdot=15.000 snrm2=5.000
wp kind=8 sdot=.000 snrm2=.000

Darwin 18.7.0