LFortran now supports all intrinsic functions

If the community is interested I can move the implementations of sum and dot_product in fast_math to stdlib. From the tests in the library, this reduces the choices to two: fastest (chunked), accurate (chunked kahan). Even the kahan one can be faster than the intrinsic (according to the test I’ve done with gfortran 13; with intel is slower)

2 Likes

I’ve opened a draft PR in stdlib following the discussion in this thread. For the moment I just included fsum/fsum_kahan and fprod/fprod_kahan. I poked a little bit the fast_math library before starting the PR, see some updated results.

In order to avoid hijacking this thread, I propose to follow up the discussion directly on the PR. Or a new thread about instrinsics in stdlib would be more appropriate? @certik I feel this is also related to some of your comments here Building a Fortran compiler in pure Fortran (Mega thread) - #11 by certik

3 Likes

@hkvzjal excellent, thanks!

Regarding my comment that you linked, we since found it hard to maintain functions like “sum” in Fortran, because it has the optional “dim” argument, and if it is compile-time, you want to inline the loops with minimal runtime overhead. Also it has to work for many types and kinds and ranks. So we currently implemented such functions inside the compiler itself, effectively instantiating them for a particular type/kind/rank and a compile-time (if available) value of “dim”. See the above blog post for more details.

Being able to maintain them in Fortran would be a great use case for the generics subgroup. :slight_smile:

2 Likes

Thanks for your feedback @certik!

So, If I were to push the current PR to handle ND arrays and the issue with dim, it could be done in stdlib thanks to fypp, here a draft that I’m starting:

header module interface for fsum
    interface fsum 
        #:for rk, rt, rs in RC_KINDS_TYPES
        pure module function fsum_1d_${rs}$(a) result(s)
            ${rt}$, intent(in) :: a(:)
            ${rt}$ :: s
        end function
        pure module function fsum_1d_${rs}$_mask(a,mask) result(s)
            ${rt}$, intent(in) :: a(:)
            logical, intent(in) :: mask(:)
            ${rt}$ :: s
        end function
        #:for rank in RANKS
        pure module function fsum_${rank}$d_${rs}$( x, mask ) result( s )
            ${rt}$, intent(in) :: x${ranksuffix(rank)}$
            logical, intent(in), optional :: mask${ranksuffix(rank)}$
            ${rt}$ :: s
        end function
        pure module function fsum_${rank}$d_dim_${rs}$( x , dim, mask ) result( s )
            ${rt}$, intent(in) :: x${ranksuffix(rank)}$
            integer, intent(in):: dim
            logical, intent(in), optional :: mask${ranksuffix(rank)}$
            ${rt}$ :: s${reduced_shape('x', rank, 'dim')}$
        end function
        #:endfor
        #:endfor
    end interface
    public :: fsum
submodule implementation details
#:for rk, rt, rs in RC_KINDS_TYPES
pure module function fsum_1d_${rs}$(a) result(s)
    ${rt}$, intent(in) :: a(:)
    ${rt}$ :: s
    ${rt}$ :: abatch(chunk)
    integer :: i, dr, rr
    ! -----------------------------
    dr = size(a)/chunk
    rr = size(a) - dr*chunk

    abatch = zero_${rs}$
    do i = 1, dr
      abatch(1:chunk) = abatch(1:chunk) + a(chunk*i-chunk+1:chunk*i)
    end do
    abatch(1:rr) = abatch(1:rr) + a(size(a)-rr+1:size(a))

    s = zero_${rs}$
    do i = 1, chunk/2
      s = s + abatch(i)+abatch(chunk/2+i)
    end do
end function

pure module function fsum_1d_${rs}$_mask(a,mask) result(s)
    ${rt}$, intent(in) :: a(:)
    logical, intent(in) :: mask(:)
    ${rt}$ :: s
    ${rt}$ :: abatch(chunk)
    integer :: i, dr, rr
    ! -----------------------------
    dr = size(a)/chunk
    rr = size(a) - dr*chunk

    abatch = zero_${rs}$
    do i = 1, dr
      abatch(1:chunk) = abatch(1:chunk) + merge( zero_${rs}$ , a(chunk*i-chunk+1:chunk*i) , mask(chunk*i-chunk+1:chunk*i) )
    end do
    abatch(1:rr) = abatch(1:rr) + merge( zero_${rs}$ , a(size(a)-rr+1:size(a)) , mask(size(a)-rr+1:size(a)) )
    
    s = zero_${rs}$
    do i = 1, chunk/2
        s = s + abatch(i)+abatch(chunk/2+i)
    end do
end function

!============ N-D =================
#:for rank in RANKS
pure module function fsum_${rank}$d_${rs}$( x , mask ) result( s )
    ${rt}$, intent(in) :: x${ranksuffix(rank)}$
    logical, intent(in), optional :: mask${ranksuffix(rank)}$
    ${rt}$ :: s
    if(.not.present(mask)) then
        s = sum_recast(x,size(x))
    else
        s = sum_recast_mask(x,mask,size(x))
    end if
contains
    pure ${rt}$ function sum_recast(b,n)
      integer, intent(in) :: n
      ${rt}$, intent(in) :: b(n)
      sum_recast = fsum(b)
    end function
    pure ${rt}$ function sum_recast_mask(b,m,n)
      integer, intent(in) :: n
      ${rt}$, intent(in) :: b(n)
      logical, intent(in) :: m(n)
      sum_recast_mask = fsum(b,m)
    end function
end function

pure module function fsum_${rank}$d_dim_${rs}$( x , dim, mask ) result( s )
    ${rt}$, intent(in) :: x${ranksuffix(rank)}$
    integer, intent(in):: dim
    logical, intent(in), optional :: mask${ranksuffix(rank)}$
    ${rt}$ :: s${reduced_shape('x', rank, 'dim')}$
    integer :: j 

    if(.not.present(mask)) then
        if(dim<${rank}$)then
            do j = 1, size(x,dim=${rank}$)
                #:if rank == 2
                s${select_subarray(rank-1, [(rank-1, 'j')])}$ = fsum( x${select_subarray(rank, [(rank, 'j')])}$ )
                #:else
                s${select_subarray(rank-1, [(rank-1, 'j')])}$ = fsum( x${select_subarray(rank, [(rank, 'j')])}$, dim=dim )
                #:endif
            end do
        else
            do j = 1, size(x,dim=1)
                #:if rank == 2
                s${select_subarray(rank-1, [(1, 'j')])}$ = fsum( x${select_subarray(rank, [(1, 'j')])}$ )
                #:else
                s${select_subarray(rank-1, [(1, 'j')])}$ = fsum( x${select_subarray(rank, [(1, 'j')])}$, dim=${rank-1}$ )
                #:endif
            end do
        end if
    else 
        if(dim<${rank}$)then
            do j = 1, size(x,dim=${rank}$)
                #:if rank == 2
                s${select_subarray(rank-1, [(rank-1, 'j')])}$ = fsum( x${select_subarray(rank, [(rank, 'j')])}$, mask=mask${select_subarray(rank, [(rank, 'j')])}$ )
                #:else
                s${select_subarray(rank-1, [(rank-1, 'j')])}$ = fsum( x${select_subarray(rank, [(rank, 'j')])}$, dim=dim, mask=mask${select_subarray(rank, [(rank, 'j')])}$ )
                #:endif
            end do
        else
            do j = 1, size(x,dim=1)
                #:if rank == 2
                s${select_subarray(rank-1, [(1, 'j')])}$ = fsum( x${select_subarray(rank, [(1, 'j')])}$, mask=mask${select_subarray(rank, [(1, 'j')])}$ )
                #:else
                s${select_subarray(rank-1, [(1, 'j')])}$ = fsum( x${select_subarray(rank, [(1, 'j')])}$, dim=${rank-1}$, mask=mask${select_subarray(rank, [(1, 'j')])}$ )
                #:endif
            end do
        end if
    end if

end function
#:endfor
#:endfor

Should I try to push it here? the fsum implementation has proven very interesting for handling contigous 1D arrays, I have no idea how it might behave performance-wise for ND arrays when requesting to sum over dim>1.

EDIT: in the implementation details, it can handle dim and mask for all ranks

3 Likes

Yes, send a PR, and we’ll go from there.

1 Like