Help in fixing the bug regarding xdp kind when used in a unique function

Hi everyone,
I’m working on an implementation for a unique function inside stdlib which returns unique values in an array. I implemented it and while testing, I am getting a bug for xdp kinds.
I am giving a working skeleton of the function below where the bug occurs.

program example_unique
  use stdlib_kinds, only: xdp, int8
  use stdlib_constants, only: zero_xdp
  use stdlib_hashmaps, only: chaining_hashmap_type
  use stdlib_hashmap_wrappers, only: key_type
  implicit none

  block
    real(xdp), allocatable :: array(:), result(:)
    integer(int8), allocatable :: bytes(:)

    allocate(array(5), source=zero_xdp)
    array = [5.0_xdp, 4.0_xdp, 3.0_xdp, -1.0_xdp, 3.0_xdp, 5.0_xdp]
    result = unique(array)

    !-----------------------------------------!
    ! print lines to see the byte structure of array
    allocate(bytes(storage_size(array(1))/8))
    bytes = transfer(array(1), bytes)
    print*, "array(1): ", bytes
    bytes = transfer(array(2), bytes)
    print*, "array(2): ", bytes
    bytes = transfer(array(3), bytes)
    print*, "array(3): ", bytes
    bytes = transfer(array(4), bytes)
    print*, "array(4): ", bytes
    !-----------------------------------------!

    print*, "array: ", array
    print*, "result: ", result
    deallocate(array)
  end block
  block
    real(xdp), allocatable :: array(:), result(:)
    integer(int8), allocatable :: bytes(:)

    allocate(array(4), source=zero_xdp)
    array = [4.0_xdp, 4.0_xdp, 4.0_xdp, 4.0_xdp]
    result = unique(array)

    !-----------------------------------------!
    ! print lines to see the byte structure of array
    print*
    bytes = transfer(array(1), bytes)
    print*, "array(1): ", bytes
    bytes = transfer(array(2), bytes)
    print*, "array(2): ", bytes
    bytes = transfer(array(3), bytes)
    print*, "array(3): ", bytes
    bytes = transfer(array(4), bytes)
    print*, "array(4): ", bytes
    !-----------------------------------------!

    print*, "array: ", array
    print*, "result: ", result
    deallocate(array)
  end block

contains

  function unique(A) result(output)
    implicit none
    real(xdp), intent(in) :: A(:)
    real(xdp), allocatable :: output(:)

    real(xdp), allocatable:: temp(:)

    if(size(A) == 0) then
        allocate(output(0))
        return
    end if
    allocate(temp, source=A)
    output = xdp_stable_unique(temp)
  end function unique

  function xdp_stable_unique(temp) result(output)
    real(xdp), intent(in) :: temp(:)
    real(xdp), allocatable :: output(:)

    type(chaining_hashmap_type) :: map
    logical, allocatable :: mask(:)
    logical :: present
    integer :: i
    integer(int8) :: key(storage_size(temp(1))/8)

    call map%init()
    allocate(mask(size(temp)))
    do i = 1, size(temp)
        key = 0
        key = transfer(temp(i), key)
        call map%key_test(key, present)
        if (.not. present) then
            call map%map_entry(key)
            mask(i) = .true.
        else
            mask(i) = .false.
        end if
    end do
    output = pack(temp, mask)
    deallocate(mask)
  end function
end program example_unique

I used gfortran to compile and it gave me this output:

 array(1):     0    0    0    0    0    0    0  -96    1   64    0    0    0    0    0    0
 array(2):     0    0    0    0    0    0    0 -128    1   64    0    0    0    0    0    0
 array(3):     0    0    0    0    0    0    0  -64    0   64    0    0    0    0    0    0
 array(4):     0    0    0    0    0    0    0 -128   -1  -65    0    0    0    0    0    0
 array:    5.00000000000000000000         4.00000000000000000000         3.00000000000000000000        -1.00000000000000000000         3.00000000000000000000         5.00000000000000000000      
 result:    5.00000000000000000000         4.00000000000000000000         3.00000000000000000000        -1.00000000000000000000      

 array(1):     0    0    0    0    0    0    0 -128    1   64    0    0    0    0    0    0
 array(2):     0    0    0    0    0    0    0 -128    1   64  111  104  -58   91    0    0
 array(3):     0    0    0    0    0    0    0 -128    1   64    0    0    0    0    0    0
 array(4):     0    0    0    0    0    0    0 -128    1   64    0    0    0    0    0    0
 array:    4.00000000000000000000         4.00000000000000000000         4.00000000000000000000         4.00000000000000000000      
 result:    4.00000000000000000000         4.00000000000000000000   

As you can see, in the first part, it gave me the output correctly, but in the second part, the byte representation for the second entry inside array is different from all the other entries and because of this different byte structure, unique function gave two values inside the result. Even though I am assigning the entries for array the second time normallly(as array = [4.0_xdp, 4.0_xdp, 4.0_xdp, 4.0_xdp]),I have no clue why the byte structure differs for the second entry.

Two things I noticed:

  1. If I remove the first call to unique in line 14 then the bug disappears and correct byte structure comes.
  2. If I remove the second call to unique in line 39 then the bug becomes worse and two entries will have different byte structures.

Can anyone give me some insights about why this is happening and what can we do in this situation ? Any help would be greatly appreciated.

Don’t know if this helps but this is my version of a unique routine I wrote to mimic MATLAB’s unique routine. Note it depends on the parent array being sorted so I use a quicksort routine if the array is not already sorted (as specified by user). It replicates the results for the MATLAB function examples as defined in the MATLAB set operations online documentation. Note I compare floats to a tolerance that the user can change. Also, just curious as to why you are using hashmaps. This appears to be overkill to me.

Code for REAL64

  Subroutine unique_r64(a, ua, sorted, user_tol)

    Real(REAL64),              Intent(INOUT) :: a(:)
    Real(REAL64), ALLOCATABLE, Intent(OUT)   :: ua(:)
    Logical,      OPTIONAL,    Intent(IN)    :: sorted
    Real(REAL64), OPTIONAL,    Intent(IN)    :: user_tol

    Real(REAL64) :: tol
    Integer      :: i, na
    Logical      :: sort_array

    Integer(INT8), ALLOCATABLE :: mask(:)

    na         = SIZE(a)
    sort_array = .TRUE.

    tol        = 1.0E-12_REAL64
    If (PRESENT(user_tol)) tol = user_tol

    ALLOCATE(mask(na))
    mask = 0_INT8

    If (PRESENT(sorted)) Then
      If (sorted) sort_array = .FALSE.
    End If

    If (sort_array) Then
      Call sort(a)
    End If

    Do i=1,na-1
      If (ABS(a(i+1)-a(i)) <= tol) mask(i) = 1_INT8
    End Do

    ua = PACK(a, MASK=[mask == 0_INT8])

  End Subroutine unique_r64

I ran into this the other month when hashing the raw bytes of real(xdp) and not comparing the values. xdp = selected_real_kind(18) maps to extended precision with padding, two equal xdp values can have different padding bytes, so transfer(temp(i), key) doesn’t really work for a unique function. For futurexdp is defined at stdlib_kinds.fypp:19 and how raw int8 keys are used in stdlib_hashmap_wrappers.f90:255 and stdlib_hashmaps.f90:941. You should just be able to hash xdp by numeric value and not by transfer bytes.