Testing LogSumExp (or RealSoftMax)

The advantage of the loop based version is that no temp arrays need to be allocated, and that is probably what is happening in the array expression. If I were concerned about performance over a variety of compilers, I would use the max_val() intrinsic in step 1, and the loop version for step 2 (this would eliminate any temp array allocations), and I would use the rsigma version of the expression (just in case a compiler might not do that automatically).

The other issue with the timings looks like the constant expression problem, which is something that is common for benchmark codes. Getting useful benchmark results often requires tricking the compiler somehow so that it does not apply what it thinks is an obvious optimization. Sometimes whole loops are eliminated, sometimes (as in this case), the loop is not eliminated but the function call within it is and is replaced with a single evaluation and multiple references to the stored constant.

1 Like

This version does everything in one loop instead of two, and it is quite faster on my Mac (removed pure thanks to @ivanpribec’s suggestion):

 Now testing speed...
 Vbar_arr  =    6.4458675156397689     
 Vbar_loop =    6.4458675156397689     
 Vbar_loop =    6.4458675156397689     

time (fun_logsum_arr)   :        2.52017300 seconds
time (fun_logsum_loops) :        2.49555300 seconds
time (fun_logsum_optim) :        2.06673700 seconds
function fun_logsum_optim2(V, sigma, n) result(LogSum)
    ! One-pass, numerically-stable log-sum-exp
    implicit none
    integer,          intent(in) :: n
    real(dp),         intent(in) :: V(n)
    real(dp),         intent(in) :: sigma
    real(dp)                      :: LogSum

    real(dp) :: rsigma, max_val, sum_exp, factor, Vrel
    integer  :: i

    rsigma  = 1.0_dp/sigma
    
    max_val = -huge(1.0_dp)
    sum_exp = 0.0_dp
    do i = 1, n
        Vrel = V(i) - max_val
        if (Vrel>0) then
            factor  = exp(-Vrel*rsigma) ! between old max_val and new max_val==V(i)
            sum_exp = sum_exp * factor + 1 ! rescale to new max_val; then, new term is exp(0)
            max_val = V(i)
        else
            sum_exp = sum_exp + exp(Vrel*rsigma)
        end if
    end do

    LogSum = max_val + sigma*log(sum_exp)
end function fun_logsum_optim2

maybe worth checking on other platforms of course. I guess the speed depends on the maxloc of V(i), that in the example, is at n (V being sorted), so, the loop basically always experiences Vrel>0

1 Like

I’m having all kinds of problems today trying to copy code from my browser window into my text editor window (e.g. missing blocks of code, extra characters). Is this because of some change here in this discussion group?

1 Like

I think this is a good idea. As a matter of esthetics, I would do the loops like this:

      max_val = V(1)
      sum_exp = 1.0_dp
      do i = 2, n

(I think this is equivalent) but I think the rest of it looks good. :slight_smile:

2 Likes