Faster rebinning subroutine

My code is limited in speed by the following rebinning subroutine. Code is also here on Github: futils/futils.f90 at main · Nicholaswogan/futils · GitHub . Suppose you have values defined on some bins. This subroutine can rebin to a new set of bins.

I’m wondering if someone knows of an implementation that is faster? or if they can spot optimizations that I could make to the current version? Thanks!

  subroutine rebin(old_bins, old_vals, new_bins, new_vals, ierr)
    real(dp), intent(in) :: old_bins(:)
    real(dp), intent(in) :: old_vals(:)
    real(dp), intent(in) :: new_bins(:)
    real(dp), intent(out) :: new_vals(:)
    integer, optional, intent(out) :: ierr
    
    integer :: i, j, l, n_old, n_new
    real(dp) :: b1, b2
    
    n_old = size(old_vals)
    n_new = size(new_vals)
    
    ! option to check inputs.
    if (present(ierr)) then
      ierr = 0
      
      ! check shape
      if (n_old+1 /= size(old_bins)) then
        ierr = -1
        return
      endif
      
      if (n_new+1 /= size(new_bins)) then
        ierr = -2
        return
      endif
      
      ! check that new bins overlap with the old bins
      if (new_bins(1) < old_bins(1) .and. new_bins(2) < old_bins(1)) then
        ierr = -3
        return
      endif
      
      if (new_bins(n_new) > old_bins(n_old+1) .and. new_bins(n_new+1) > old_bins(n_old+1)) then
        ierr = -4
        return
      endif
      
      ! check that bins are all increasing
      do i = 1,n_old
        if (old_bins(i+1) <= old_bins(i)) then
          ierr = -5
          return
        endif
      enddo
      
      do i = 1,n_new
        if (new_bins(i+1) <= new_bins(i)) then
          ierr = -6
          return
        endif
      enddo
      
    endif
    
    new_vals = 0.0_dp
    l = 1
    
    do i = 1,n_new
      b1 = new_bins(i+1) - new_bins(i)
      do j = l,n_old
        ! four different cases
        
        ! ______       (old bin)
        !     ________ (new bin)
        if (old_bins(j) <= new_bins(i) .and. &
            old_bins(j+1) > new_bins(i) .and. old_bins(j+1) < new_bins(i+1)) then
          
          b2 = old_bins(j+1) - new_bins(i)
          new_vals(i) = new_vals(i) + (b2/b1)*old_vals(j)
          
        !    ____    (old bin)
        !  ________  (new bin)
        elseif (old_bins(j) >= new_bins(i) .and. old_bins(j+1) <= new_bins(i+1)) then
          
          b2 = old_bins(j+1) - old_bins(j)
          new_vals(i) = new_vals(i) + (b2/b1)*old_vals(j)
          
        !        ____    (old bin)
        !  ________      (new bin)
        elseif (old_bins(j) >= new_bins(i) .and. &
                old_bins(j) < new_bins(i+1) .and. old_bins(j+1) > new_bins(i+1)) then
        
          b2 = new_bins(i+1) - old_bins(j)
          new_vals(i) = new_vals(i) + (b2/b1)*old_vals(j)
        
        ! ________    (old bin)
        !   ____      (new bin) 
        elseif (old_bins(j) < new_bins(i) .and. old_bins(j+1) > new_bins(i+1)) then
          
          new_vals(i) = new_vals(i) + old_vals(j)
          
        !         _____   (old bin)
        !   ____          (new bin) 
        elseif (old_bins(j) > new_bins(i+1)) then
          ! time to move onto the next new_bin
          l = j - 1
          exit
        else
          ! nothing
        endif

      enddo
    enddo
    
  end subroutine

1 Like

@ivanpribec, this re-binning routine is actually a bottleneck now in my code. It takes more time than sorting. I have a little benchmark here: GitHub - Nicholaswogan/futils

Build with cmake, then run ./tests/test_futils. I get

Time to rebin (s): 2.4625E-07

Yesterday I was able to optimize it a little bit. But not enough to make a ton of difference.

In practice, the function is for down-binning data defined on bins (i.e. piecewise functions). An example is spectra of the infrared radiation emitted by Earth to space. I can calculate the emission of infrared photons within many tiny wavelength bins (R = 1000 case in figure below), and then I can re-bin these results to a lower resolution (orange line is rebinning of R = 1000 case).

Wondering if you might know of any optimizations. Or if someone has implemented this routine in a more efficient way.

1 Like

I was just able to optimize rebin a little bit more, by re-ordering some conditional statements so that less comparisons were done. Now I get

Time to rebin (s): 2.3175E-07

Is the calculation possibly related to a moving average for smoothing (with a given window function) and so FFT could be used for convolution calculation?

1 Like

An alternative strategy is to:

  • Compute the cumulative counts <= each old bin boundary (like a cumulative distribution).
  • Interpolate from that to the new bin boundaries, to get a new cumulative distribution.
  • Numerically difference the latter to get the desired result.

I don’t know if it would be faster – it seems like each step would involve simple logic – but there would be some temporaries.

1 Like

@septc That is a really good thought. But I think in this case is possibly different than smoothing. I’ll think about it some more though.

@gareth That is an interesting idea. I’m going to try it!

I was able to speed up my implementation a bit more, but just reducing the number of conditionals: Time to rebin (s): 1.7800E-07

1 Like

You could try with just the do j = 1,n_old loop and keep a counter for the current new bin. Each old bin either fits entirely into the current new bin or there is a remainder to distributed over (perhaps several) following new bins. I think it would be sufficient to carry forward the density (value/width) and upper limit of the undistributed bin.

I don’t have time for a more detailed response right now. Perhaps later if you need more detail.

1 Like

How about this one?
I’m using temporary variables.

Time to rebin (s), Original: 9.5300E-07
Time to rebin (s), Temp Var: 7.5267E-07 ←

  subroutine rebin(old_bins, old_vals, new_bins, new_vals, ierr)
    real(dp), intent(in) :: old_bins(:)
    real(dp), intent(in) :: old_vals(:)
    real(dp), intent(in) :: new_bins(:)
    real(dp), intent(out) :: new_vals(:)
    integer, optional, intent(out) :: ierr
    
    integer :: i, j, l, n_old, n_new
    real(dp) :: b2, b1_inv, v_new
    
    real(dp) :: b_old0, b_old1
    real(dp) :: b_new0, b_new1
    real(dp) :: v_old

    n_old = size(old_vals)
    n_new = size(new_vals)
    
    ! option to check inputs.
    if (present(ierr)) then
      ierr = 0
      
      ! check shape
      if (n_old+1 /= size(old_bins)) then
        ierr = -1
        return
      endif
      
      if (n_new+1 /= size(new_bins)) then
        ierr = -2
        return
      endif
      
      ! check that new bins overlap with the old bins
      if (new_bins(1) < old_bins(1) .and. new_bins(2) < old_bins(1)) then
        ierr = -3
        return
      endif
      
      if (new_bins(n_new) > old_bins(n_old+1) .and. new_bins(n_new+1) > old_bins(n_old+1)) then
        ierr = -4
        return
      endif
      
      ! check that bins are all increasing
      do i = 1,n_old
        if (old_bins(i+1) <= old_bins(i)) then
          ierr = -5
          return
        endif
      enddo
      
      do i = 1,n_new
        if (new_bins(i+1) <= new_bins(i)) then
          ierr = -6
          return
        endif
      enddo
      
    endif
    
    new_vals = 0.0_dp
    l = 1
    
    do i = 1,n_new
      b_new0 = new_bins(i)
      b_new1 = new_bins(i+1)
            
      b1_inv = 1d0/(b_new1 - b_new0)
      v_new = new_vals(i)
      do j = l,n_old

        b_old0 = old_bins(j)
        b_old1 = old_bins(j+1)
        v_old = old_vals(j)
        ! Several different cases
        
        !    ____    (old bin)
        !  ________  (new bin)
        if (b_old0 > b_new0 .and. b_old1 < b_new1) then
          
          b2 = b_old1 - b_old0
          v_new = v_new + (b2*b1_inv)*v_old
        
        ! ______       (old bin)
        !     ________ (new bin)
        elseif (b_old0 <= b_new0 .and. &
            b_old1 > b_new0 .and. b_old1 < b_new1) then
          
          b2 = b_old1 - b_new0
          v_new = v_new + (b2*b1_inv)*v_old
          
        !        ____    (old bin)
        !  ________      (new bin)
        elseif (b_old0 >= b_new0 .and. &
                b_old0 < b_new1 .and. b_old1 > b_new1) then
        
          b2 = b_new1 - b_old0
          v_new = v_new + (b2*b1_inv)*v_old
          
          ! need to move to the next new_bin
          l = j
          exit
        
        ! ________    (old bin)
        !   ____      (new bin) 
        elseif (b_old0 <= b_new0 .and. b_old1 >= b_new1) then
          
          v_new = v_new + v_old
          
          ! need to move to the next new_bin
          l = j
          exit
            
        !  There are two remaining cases:
        !         _____   (old bin)
        !   ____          (new bin) 
        ! 
        ! and
        !  ____           (old bin)
        !        _____    (new bin) 
        ! 
        ! These cases can occur on either end of re-binning.
        ! However this is OK. The part of old_bin that does not
        ! overlap with new_bin on the ends, will just be ignored.
          
        endif
      enddo
      new_vals(i) = v_new
    enddo
    
  end subroutine
3 Likes

Welcome, @NaoMatch , and thanks for your machine learning project
GitHub - NaoMatch/FortLearner.

3 Likes

@NaoMatch Thanks so much! This is faster on my computer too.

Why though?? Why is using temporary variables faster? Why doesn’t the compiler make this optimization for me? I suppose it uses slightly more stack memory, which might not be wanted in some special circumstances.

This is the bottleneck.

You have a j-loop within an i-loop. For each pair of values new_bins(i) and new_bins(i+1), you are searching linearly through the (probably large) set of values of old_bins(j) until you find the bin that matches the specified condition. Your bins are probably in monotonically increasing order. If so, using binary search would change the complexity of the search procedure from O(N) to O(lg N), where N is the number of bins.

Programming tricks such as using temporary variables, pointers, etc., can help speed up a program, but they rarely suffice to compensate for a poor choice of algorithm (such as linear search instead of binary search).

The nested loop is not as bad as it looks at first glance. The number of times the stuff in the inner
loop is evaluated (all the conditionals) is n_old. This is because the inner loop is being broken many times.

A binary search does not help either. In all my use cases, i know in advance that no “searching” is necessary, and that the inner loop will be relevant on j == 1

I’m pretty sure we are reaching maximal speed, for the operation I need to do. There is a possibility to make an approximation that is faster as suggested by @septc .

I think we need to help the compiler as much as the compiler helps us.

To use temporary variables is to help the compiler optimize.

The compiler is very smart, but not always smart. I think we should use simple temporary variables (ie: v_new) because they do not require much human effort. (Access to arrays should be kept as low as possible.)

Of course the smartest way is to improve the algorithm.