LFortran now supports all intrinsic functions

hkvzjal · November 26, 2024, 6:12am

If the community is interested I can move the implementations of sum and dot_product in fast_math to stdlib. From the tests in the library, this reduces the choices to two: fastest (chunked), accurate (chunked kahan). Even the kahan one can be faster than the intrinsic (according to the test I’ve done with gfortran 13; with intel is slower)

hkvzjal · January 4, 2025, 6:54pm

I’ve opened a draft PR in stdlib following the discussion in this thread. For the moment I just included fsum/fsum_kahan and fprod/fprod_kahan. I poked a little bit the fast_math library before starting the PR, see some updated results.

In order to avoid hijacking this thread, I propose to follow up the discussion directly on the PR. Or a new thread about instrinsics in stdlib would be more appropriate? @certik I feel this is also related to some of your comments here Building a Fortran compiler in pure Fortran (Mega thread) - #11 by certik

certik · January 4, 2025, 8:39pm

@hkvzjal excellent, thanks!

Regarding my comment that you linked, we since found it hard to maintain functions like “sum” in Fortran, because it has the optional “dim” argument, and if it is compile-time, you want to inline the loops with minimal runtime overhead. Also it has to work for many types and kinds and ranks. So we currently implemented such functions inside the compiler itself, effectively instantiating them for a particular type/kind/rank and a compile-time (if available) value of “dim”. See the above blog post for more details.

Being able to maintain them in Fortran would be a great use case for the generics subgroup.

hkvzjal · January 5, 2025, 1:43pm

Thanks for your feedback @certik!

So, If I were to push the current PR to handle ND arrays and the issue with dim, it could be done in stdlib thanks to fypp, here a draft that I’m starting:

header module interface for fsum

    interface fsum 
        #:for rk, rt, rs in RC_KINDS_TYPES
        pure module function fsum_1d_${rs}$(a) result(s)
            ${rt}$, intent(in) :: a(:)
            ${rt}$ :: s
        end function
        pure module function fsum_1d_${rs}$_mask(a,mask) result(s)
            ${rt}$, intent(in) :: a(:)
            logical, intent(in) :: mask(:)
            ${rt}$ :: s
        end function
        #:for rank in RANKS
        pure module function fsum_${rank}$d_${rs}$( x, mask ) result( s )
            ${rt}$, intent(in) :: x${ranksuffix(rank)}$
            logical, intent(in), optional :: mask${ranksuffix(rank)}$
            ${rt}$ :: s
        end function
        pure module function fsum_${rank}$d_dim_${rs}$( x , dim, mask ) result( s )
            ${rt}$, intent(in) :: x${ranksuffix(rank)}$
            integer, intent(in):: dim
            logical, intent(in), optional :: mask${ranksuffix(rank)}$
            ${rt}$ :: s${reduced_shape('x', rank, 'dim')}$
        end function
        #:endfor
        #:endfor
    end interface
    public :: fsum

submodule implementation details

#:for rk, rt, rs in RC_KINDS_TYPES
pure module function fsum_1d_${rs}$(a) result(s)
    ${rt}$, intent(in) :: a(:)
    ${rt}$ :: s
    ${rt}$ :: abatch(chunk)
    integer :: i, dr, rr
    ! -----------------------------
    dr = size(a)/chunk
    rr = size(a) - dr*chunk

    abatch = zero_${rs}$
    do i = 1, dr
      abatch(1:chunk) = abatch(1:chunk) + a(chunk*i-chunk+1:chunk*i)
    end do
    abatch(1:rr) = abatch(1:rr) + a(size(a)-rr+1:size(a))

    s = zero_${rs}$
    do i = 1, chunk/2
      s = s + abatch(i)+abatch(chunk/2+i)
    end do
end function

pure module function fsum_1d_${rs}$_mask(a,mask) result(s)
    ${rt}$, intent(in) :: a(:)
    logical, intent(in) :: mask(:)
    ${rt}$ :: s
    ${rt}$ :: abatch(chunk)
    integer :: i, dr, rr
    ! -----------------------------
    dr = size(a)/chunk
    rr = size(a) - dr*chunk

    abatch = zero_${rs}$
    do i = 1, dr
      abatch(1:chunk) = abatch(1:chunk) + merge( zero_${rs}$ , a(chunk*i-chunk+1:chunk*i) , mask(chunk*i-chunk+1:chunk*i) )
    end do
    abatch(1:rr) = abatch(1:rr) + merge( zero_${rs}$ , a(size(a)-rr+1:size(a)) , mask(size(a)-rr+1:size(a)) )
    
    s = zero_${rs}$
    do i = 1, chunk/2
        s = s + abatch(i)+abatch(chunk/2+i)
    end do
end function

!============ N-D =================
#:for rank in RANKS
pure module function fsum_${rank}$d_${rs}$( x , mask ) result( s )
    ${rt}$, intent(in) :: x${ranksuffix(rank)}$
    logical, intent(in), optional :: mask${ranksuffix(rank)}$
    ${rt}$ :: s
    if(.not.present(mask)) then
        s = sum_recast(x,size(x))
    else
        s = sum_recast_mask(x,mask,size(x))
    end if
contains
    pure ${rt}$ function sum_recast(b,n)
      integer, intent(in) :: n
      ${rt}$, intent(in) :: b(n)
      sum_recast = fsum(b)
    end function
    pure ${rt}$ function sum_recast_mask(b,m,n)
      integer, intent(in) :: n
      ${rt}$, intent(in) :: b(n)
      logical, intent(in) :: m(n)
      sum_recast_mask = fsum(b,m)
    end function
end function

pure module function fsum_${rank}$d_dim_${rs}$( x , dim, mask ) result( s )
    ${rt}$, intent(in) :: x${ranksuffix(rank)}$
    integer, intent(in):: dim
    logical, intent(in), optional :: mask${ranksuffix(rank)}$
    ${rt}$ :: s${reduced_shape('x', rank, 'dim')}$
    integer :: j 

    if(.not.present(mask)) then
        if(dim<${rank}$)then
            do j = 1, size(x,dim=${rank}$)
                #:if rank == 2
                s${select_subarray(rank-1, [(rank-1, 'j')])}$ = fsum( x${select_subarray(rank, [(rank, 'j')])}$ )
                #:else
                s${select_subarray(rank-1, [(rank-1, 'j')])}$ = fsum( x${select_subarray(rank, [(rank, 'j')])}$, dim=dim )
                #:endif
            end do
        else
            do j = 1, size(x,dim=1)
                #:if rank == 2
                s${select_subarray(rank-1, [(1, 'j')])}$ = fsum( x${select_subarray(rank, [(1, 'j')])}$ )
                #:else
                s${select_subarray(rank-1, [(1, 'j')])}$ = fsum( x${select_subarray(rank, [(1, 'j')])}$, dim=${rank-1}$ )
                #:endif
            end do
        end if
    else 
        if(dim<${rank}$)then
            do j = 1, size(x,dim=${rank}$)
                #:if rank == 2
                s${select_subarray(rank-1, [(rank-1, 'j')])}$ = fsum( x${select_subarray(rank, [(rank, 'j')])}$, mask=mask${select_subarray(rank, [(rank, 'j')])}$ )
                #:else
                s${select_subarray(rank-1, [(rank-1, 'j')])}$ = fsum( x${select_subarray(rank, [(rank, 'j')])}$, dim=dim, mask=mask${select_subarray(rank, [(rank, 'j')])}$ )
                #:endif
            end do
        else
            do j = 1, size(x,dim=1)
                #:if rank == 2
                s${select_subarray(rank-1, [(1, 'j')])}$ = fsum( x${select_subarray(rank, [(1, 'j')])}$, mask=mask${select_subarray(rank, [(1, 'j')])}$ )
                #:else
                s${select_subarray(rank-1, [(1, 'j')])}$ = fsum( x${select_subarray(rank, [(1, 'j')])}$, dim=${rank-1}$, mask=mask${select_subarray(rank, [(1, 'j')])}$ )
                #:endif
            end do
        end if
    end if

end function
#:endfor
#:endfor

Should I try to push it here? the fsum implementation has proven very interesting for handling contigous 1D arrays, I have no idea how it might behave performance-wise for ND arrays when requesting to sum over dim>1.

EDIT: in the implementation details, it can handle dim and mask for all ranks

certik · January 5, 2025, 9:27pm

Yes, send a PR, and we’ll go from there.

Topic		Replies	Views
Weekly Blog Posts for "Supporting Arrays and Allocatables in LFortran" GSoC-2021	20	2068	August 16, 2021
Applications of reduce intrinsic?	1	562	January 20, 2022
Array intrinsics performances/accuracy	20	877	May 20, 2023
Fortran Standard Library v0.6.0 Released Announcements	9	877	May 15, 2024
How to contribute to LFortran's runtime library	28	1081	November 25, 2022

LFortran now supports all intrinsic functions

Related topics