I have some good news and some bad news. First the good news. I removed the exclude_* functions in my code and replaced them with a simple array lookup to determine when the inner loops can be skipped. This helps the times a little bit.
i^5 = 61917364224
133 110 84 27 144
real 0m0.025s
user 0m0.022s
sys 0m0.002s
These are the Apple M2 times, and they are about 10% faster than the previous version with the function calls which had a select case block and some logical expressions. I folded all of those expressions into a table, and now I just do a table lookup to do the branches.
Those branch exits are based on @mecej4’s idea of using modulo 11 comparisons. For reasons that I don’t fully understand, the residuals mod(i**5,p) are sparse for some prime numbers p but not for others. For p=11, you can start with mod(i**10,11). Fermat’s little theorem says that is equal to 0 if i is a multiple of 11, and it is equal to 1 otherwise. That then gives mod(i**5,11)**2=1 and mod(i**5,11)=±1. So out of the 11 possible values for the residual, only the values {-1,0,1} occur. The -1 value is congruent to 10, so the values that occur in the fortran code are {0,1,10}.
I looked at some other prime numbers p, like 13, 17, 23, and so on. None of those are sparse, so they can’t be used to screen out the inner loops. But p=31 is sparse, with residuals {0,1,5,6,25,26,30}. I think that is because mod(i**30,31)==1, and that is the same as mod(i**5)**6, so there are six other modulo roots that come into play. That’s 7 nonzero values out of 31 possible. I thought if screening for mod 11 residuals worked so well, why not try mod 31 residuals. So I added those to the code, checked to make sure that they do indeed result in additional inner loop exclusions, and did the timings. What I see is that this extra test slows down the code.
i^5 = 61917364224
133 110 84 27 144
real 0m0.027s
user 0m0.025s
sys 0m0.002s
This simple array lookup and conditional branch somehow adds 10% to the run times, basically undoing the small improvement I made by doing the table lookup. I do not understand why that happens, but those are the numbers. Here is the binary search version of the code with the mod 31 test. You can comment out that mod 31 test statement to see if it has the same effect on your hardware+compiler combinations.
program sum_conjecture
! sum_conjecture.f90
!
! 27^5 + 84^5 + 110^5 + 133^5 = 144^5
!
use, intrinsic :: iso_fortran_env, only: ik => int64, logical_kinds
implicit none
integer, parameter :: nmax = 6208 ! mmax integer range to search.
integer, parameter :: findmax = 1 ! max number of solutions to find.
integer, parameter :: lk = selected_logical_kind(1) ! f2023 and later.
!integer, parameter :: lk = 1 ! logical kind with the smallest storage_size() value.
integer(ik) :: i5, sk, sl, sm, diff
integer(ik) :: n5(nmax)
integer :: i, j, k, l, m, mapT11, mapT31, nbits, mmin, mmax, nfound
logical(lk), parameter :: t=.true., f=.false.
integer, parameter :: map_11(0:10) = [1,2,0,0,0,0,0,0,0,0,3] ! map T11={0,1,10} to a contiguous domain.
logical(lk), parameter :: exclude_k11(0:10,1:3) = reshape([ &
& f,f,f,t,t,t,t,t,t,f,f, & ! T11 = 0
& f,f,f,t,t,t,t,t,t,t,f, & ! T11 = 1
& f,f,t,t,t,t,t,t,t,f,f],& ! T11 = 10
& shape(exclude_k11) )
logical(lk), parameter :: exclude_l11(0:10,1:3) = reshape([ &
& f,f,t,t,t,t,t,t,t,t,f, & ! T11 = 0
& f,f,f,t,t,t,t,t,t,t,t, & ! T11 = 1
& f,t,t,t,t,t,t,t,t,f,f],& ! T11 = 10
& shape(exclude_l11) )
integer, parameter :: map_31(0:30) = &
& [1,2,0,0,0,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,6,0,0,0,7] ! map T11={0,1,5,6,25,26,30} to a contiguous domain.
logical(lk), parameter :: exclude_l31(0:30,1:7) = reshape([ &
& f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,f, & ! T31 = 0
& f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t, & ! T31 = 1
& f,t,t,t,f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f, & ! T31 = 5
& f,f,t,t,t,f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t, & ! T31 = 6
& f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,t,f,f,f,t,t,t,f, & ! T31 = 25
& f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,t,f,f,f,t,t,t, & ! T31 = 26
& f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,f,f],& ! T31 = 30
& shape(exclude_l31) )
character(*), parameter :: cfmt = '(*(g0,1x))'
intrinsic :: int, mod, bit_size, leadz, ibset
nfound = 0
outer: do i = 1, nmax
i5 = int(i,ik)**5
n5(i) = i5
mapT11 = map_11(mod(i5,11_ik)) ! target residual; T11 = 0, 1, or 10.
mapT31 = map_31(mod(i5,31_ik)) ! target residual; T31 = 0, 1, 5, 6, 25, 26, or 30.
do j = 1, i
do k = 1, j
sk = n5(j) + n5(k)
if ( sk > i5 ) exit
if ( exclude_k11( mod(sk,11_ik), mapT11 ) ) cycle
do l = 1, k
sl = sk + n5(l)
diff = i5 - sl ! target m**5 value.
if ( diff < 0 ) exit
if ( exclude_l11( mod(sl,11_ik), mapT11 ) ) cycle ! <--- comment this out.
if ( exclude_l31( mod(sl,31_ik), mapT31 ) ) cycle ! <--- comment this out.
if ( n5(l) < diff ) cycle ! no possible solution in (1...l).
! set up for binary search for m**5 == diff.
nbits = bit_size(diff) - leadz(diff)
mmin = ibset(0,((nbits-1)/5)) ! lower bound to m.
mmax = min( l, ibset(0,((nbits+4)/5))) ! upper bound to m.
!mmin = 1; mmax = l
if ( n5(mmin) == diff ) then ! check for shortcuts.
mmax = mmin
elseif ( n5(mmax) == diff ) then
mmin = mmax
endif
do
m = (mmin + mmax) / 2
if ( n5(m) > diff ) then
mmax = m
elseif ( n5(m) < diff ) then
if ( m == mmin ) exit ! no solution.
mmin = m
else ! solution found.
print cfmt, "i^5 = ", i5
print cfmt, j, k, l, m, i
nfound = nfound + 1
if ( nfound >= findmax ) then
exit outer
else
exit ! continue searching.
endif
endif
enddo
enddo
enddo
enddo
enddo outer
end program sum_conjecture
Any ideas why that slowdown occurs would be greatly appreciated. It doesn’t make sense to me.
Just to see what happens, I commented out both of the l loop mod checks, and here are the timings.
i^5 = 61917364224
133 110 84 27 144
real 0m0.016s
user 0m0.014s
sys 0m0.002s
Those are the best times I’ve seen so far, and I don’t understand why. I went further and commented out also the k loop mod tests, and the code slows down. At least that makes sense. But I don’t understand why branching around the inner loop (which is the binary search in this version) slows things down. Anyway, that is the bad news.
The next thing to consider is using the mod 31 values in a k loop test. I’m not sure if that is worth the effort because almost all of the generated values are allowed. at that level.
I also added the line
integer, parameter :: findmax = 1 ! max number of solutions to find.
to this version to allow searching for multiple solutions. You can set it to a small value, like 5 or 10, to see how the code works for longer searches. Don’t set it too high though or you will have to wait overnight.