Fixing the big-endian test failures (Issue #1128)

Hi everyone,

I was looking at the recent Debian build failures for stdlib 0.8.1 on big-endian machines reported in Issue #1128, and noticed that test_hash_functions is failing. I dug into the code and found a few intertwined issues.

First, `test_little_endian` in `test_hash_functions.f90` contains an `error stop` if the machine isn't little-endian, which instantly kills the whole test suite on big-endian machines.

But beyond that, there are actual endianness mismatches in the hash implementations themselves:

  1. C reference mismatches: The Fortran tests compare against C-generated reference hashes, but the C reference code handles endianness inconsistently. While `nmhash32` uses `__builtin_bswap32` on big-endian, the C versions of waterhash, pengyhash, and SpookyV2 read input with a raw `memcpy`. Since the Fortran code normalizes to little-endian reads, the C references and the Fortran outputs will always disagree on big-endian architectures.

  2. The 16-bit multiply bug: There's also a subtle bug in the 16-bit multiplications. Both the C and Fortran implementations split 32-bit values into 16-bit halves. In Fortran, `vx16 = transfer(vx32, 0_int16, 2)` splits in memory order: `vx16(1)` holds the lower 16 bits on little-endian but the upper 16 bits on big-endian. Because the code then multiplies `vx16(1)` and `vx16(2)` by fixed constant halves, the multiplication pairs the wrong halves on big-endian systems, effectively running the math backward.

Before I start patching the math (probably by replacing `transfer` with explicit bitwise operations like `shiftr` and `iand`), I wanted to ask about the design philosophy for stdlib_hash.

Should these hashes be strictly endian-agnostic (meaning the same input bytes always produce the exact same hash everywhere)? Or is the goal to prioritize raw speed, meaning it’s fine if hashes differ across architectures?

Let me know what you guys think is the best approach for stdlib, and I’ll start working on the fix!