 # Parallel sum function

I have a subroutine as below:

``````subroutine lm(n, x, y, res)
integer(8) :: n
real(8) :: x(n), y(n), res
real(8) :: sum_x, sum_y, sum_xx, sum_xy
integer(8) :: i

sum_x = 0d0
sum_y = 0d0
sum_xx = 0d0
sum_xy = 0d0

!\$omp parallel do reduction(+:sum_x)
do i = 1, n
sum_x = sum_x + x(i)
end do
!\$omp end parallel do

!\$omp parallel do reduction(+:sum_y)
do i = 1, n
sum_y = sum_y + y(i)
end do
!\$omp end parallel do

!\$omp parallel do reduction(+:sum_xx)
do i = 1, n
sum_xx = sum_xx + x(i)*x(i)
end do
!\$omp end parallel do

!\$omp parallel do reduction(+:sum_xy)
do i = 1, n
sum_xy = sum_xy + x(i)*y(i)
end do
!\$omp end parallel do

res = (n*sum_xy - sum_x*sum_y)/(n*sum_xx - sum_x*sum_x)
end subroutine
``````

If a parallel sum function can be used, the subroutine will be much less verbose. Is there a plan of adding a parallel sum function to the language or std? C++ is already doing this, see:
https://en.cppreference.com/w/cpp/experimental/reduce

1 Like

There is already such a function in the standard:
https://gcc.gnu.org/onlinedocs/gfortran/CO_005fSUM.html

Using images and `co_sum()` you can do the same thing than with OpenMP. I have starded playing with that:
https://github.com/vmagnin/exploring_coarrays/blob/main/pi_monte_carlo_co_sum.f90

1 Like
``````res = (n*sum(x*y) - sum(x)*sum(y))/(n*sum(x**2) - sum(x)**2)
``````

Edit: If you still want to do it the verbose way, you might consider merging the loops.

``````  !\$omp parallel do reduction(+:sum_x,sum_y,sum_xx,sum_xy)
do i = 1, n
sum_x = sum_x + x(i)
sum_y = sum_y + y(i)
sum_xx = sum_xx + x(i)**2
sum_xy = sum_xy + x(i)*y(i)
end do
!\$omp end parallel do
``````

I expect this would also reduce the overhead of launching the thread group. Before adding parallel instructions, it’s a good idea to focus on single-core performance. Then you have a better picture if you’re actually profiting from the OMP pragmas or just wasting additional CPU cycles.

Edit 2: a MWE can be found in the folded region.

Click to open
``````module lm_mod

! These are the types used in the dotCall64 R package
use, intrinsic :: iso_c_binding, only: &
dp => c_double, &
int64 => c_int64_t

implicit none

contains

subroutine lm(n, x, y, res)
integer(int64), intent(in) :: n
real(dp), intent(in) :: x(n), y(n)
real(dp), intent(out) :: res
real(dp) :: sum_x, sum_y, sum_xx, sum_xy
integer(int64) :: i

#if PARALLEL
sum_x = 0
sum_y = 0
sum_xx = 0
sum_xy = 0
!\$omp parallel do reduction(+:sum_x,sum_y,sum_xx,sum_xy) shared(x,y)
do i = 1, n
sum_x = sum_x + x(i)
sum_y = sum_y + y(i)
sum_xx = sum_xx + x(i)**2
sum_xy = sum_xy + x(i)*y(i)
end do
!\$omp end parallel do
#else
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(x**2)
sum_xy = sum(x*y)
#endif
res = (n*sum_xy - sum_x*sum_y)/(n*sum_xx - sum_x*sum_x)

end subroutine

end module

program test_lm

use lm_mod
implicit none

real(dp), allocatable :: x(:), y(:)
real(dp) :: dx, res

character(len=8) :: argv
integer(int64) :: n, i
integer(int64) :: rate, t1, t2

call system_clock(count_rate=rate)

call get_command_argument(1,argv)

allocate(x(n),y(n))

dx = 1.0_dp/real(n-1,dp)
x = [((i-1)*dx,i=1,n)]

call random_number(y)

y = 0.5_dp*x + 0.05_dp*y

call system_clock(t1)
call lm(n,x,y,res)
write(*,*) "res = ", res
call system_clock(t2)

write(*,*) "Time(s) = ", real(t2-t1,dp)/real(rate,dp)

end program
``````
3 Likes

Can co_sum be used in shared libs?

See that very recent post:

Note that the term Coarrays can be used to talk about coarrays themselves, but is also often used as a more generic term to talk about all the parallel features introduced by Fortran 2008/2018, like the collective subroutines.

1 Like