There was an off-by-one error in the last version of the code that I posted. In the process of fixing that, I put in some counters to verify that my average statistics were valid, and I concluded that they aren’t. Basically, my assumptions about even distributions of modulo values for the various sk and sl partial sums were incorrect. So I decided to leave in the counters and remove the approximate statistics evaluations. The timings are unaffected, so that seems like a good tradeoff. I also did some other code refactoring, such as moving some of the internal functions and subroutines from the main program into the module. Here are the Apple M2 timings with gfortran for findmax=1
1: 133^5 + 110^5 + 84^5 + 27^5 = 144^5 = 61917364224
modulo test: kkntt= 30840 kkntf= 15357 ratio= 4.980E-1
modulo test: lkntt= 99991 lkntf= 11481 ratio= 1.148E-1
real 0m0.008s
user 0m0.005s
sys 0m0.002s
and here are the findmax=3 timings
1: 133^5 + 110^5 + 84^5 + 27^5 = 144^5 = 61917364224
2: 266^5 + 220^5 + 168^5 + 54^5 = 288^5 = 1981355655168
3: 399^5 + 330^5 + 252^5 + 81^5 = 432^5 = 15045919506432
modulo test: kkntt= 782569 kkntf= 401860 ratio= 5.135E-1
modulo test: lkntt= 8071752 lkntf= 952863 ratio= 1.180E-1
real 0m0.062s
user 0m0.060s
sys 0m0.002s
That kkntf ratio of 0.51 basically says that 49% of the k loop values are skipped due to the modulo tests, and the lkntf ratio of 0.11 says that about 89% of the l loop values are skipped. The next challenge is to find good modulo values that reduce the kkntf and lkntf values and their associated ratios. Here is the version of the code that produced those timings.
module allow_mod
use, intrinsic :: iso_fortran_env, only: ik => int64
implicit none
private
public :: ik, allow_t, fill_allow, print_allow, set_allow, set_i5modN, mod_test_k, mod_test_l
interface print_allow ! allow both scalar and array arguments.
module procedure :: print_allow, print_allow_1
end interface print_allow
integer, parameter :: lk = selected_logical_kind(1) ! smallest logical value.
type level_t
integer :: nval ! number of allowed values.
integer, allocatable :: val(:) ! (1:nval) allowed values.
integer, allocatable :: map(:) ! (0:N-1) map from mod(m,N) to contiguous index, ix.
end type level_t
type allow_t
integer :: N ! modulus value.
integer :: i5modNx ! mapped mod(i**5,N) values
type(level_t) :: i ! target i info.
type(level_t), allocatable :: l(:) ! (1:nval) loop level l info.
type(level_t), allocatable :: k(:) ! (1:nval) loop level k info.
end type allow_t
character(*), parameter :: cfmt = '(*(g0,1x))'
contains
subroutine fill_allow( N, allow )
! allocate and fill in the allow derived type based on the modulus N.
! the goal is to limit the k and l loop indices when possible. this is
! achieved by using modulo arithmetic to limit the possibile
! matches of j**5 + k**5 + l**5 + m**5 with the target i**5.
implicit none
integer, intent(in) :: N ! modulus value with range up to about 1000.
type(allow_t), intent(out) :: allow
integer :: i, k, i5x, lx, nval
integer :: i5mod(1:N), val(1:N) ! temp work arrays.
logical(lk) :: mask(0:N-1), kmask(0:N-1)
allow%N = N
allow%i5modNx = -1 ! this is filled in later.
! compute mask(:) for the possible mod(i**5,N) values.
mask(:) = .false.
do i = 1, N
k = mod(int(i,ik)**5,N)
i5mod(i) = k
mask(k) = .true.
enddo
allocate( allow%i%map(0:N-1) )
call set_val_map( mask, allow%i%map, val, nval )
allow%i%nval = nval
allow%i%val = val(1:nval)
! compute the kmask(:) array.
! the .true. entries are the only sk values that can be generated.
kmask(:) = .false.
do i5x = 1, allow%i%nval
call update_mask( N, allow%i%val(i5x), allow%i%val, kmask )
enddo
! allocate and fill in the level l and level k info.
allocate( allow%l(1:nval), allow%k(1:nval) )
do i5x = 1, allow%i%nval
! fill in the level l info for this i5x.
mask(:) = .false.
call update_mask( N, allow%i%val(i5x), allow%i%val, mask )
allocate( allow%l(i5x)%map(0:N-1) )
call set_val_map( mask, allow%l(i5x)%map, val, nval )
allow%l(i5x)%nval = nval
allow%l(i5x)%val = val(1:nval)
! fill in the level k info for this i5x.
mask(:) = .false.
do lx = 1, allow%l(i5x)%nval
call update_mask( N, allow%l(i5x)%val(lx), allow%i%val(1:allow%i%nval), mask )
enddo
mask = mask .and. kmask ! mask out any unreachable values.
allocate( allow%k(i5x)%map(0:N-1) )
call set_val_map( mask, allow%k(i5x)%map, val, nval )
allow%k(i5x)%nval = nval
allow%k(i5x)%val = val(1:nval)
enddo
end subroutine fill_allow
subroutine print_allow( nlist, allow )
! print the contents of allow with annotations.
implicit none
integer, intent(in) :: nlist ! output unit
type(allow_t), intent(in) :: allow
character(*), parameter :: fmtg = '(*(g0.3,1x))'
integer :: i5x
write(nlist,fmtg)
write(nlist,fmtg) 'Allowed index values for modulus N=', allow%N
write(nlist,fmtg)
write(nlist,fmtg) 'Loop level m, nval=', allow%i%nval, &
& 'nonzero ratio=', real(allow%i%nval)/allow%N
write(nlist,fmtg) 'Allowed smx values, val(:)=', allow%i%val
write(nlist,fmtg)
write(nlist,fmtg) 'Loop level l info:'
do i5x = 1, allow%i%nval
write(nlist,fmtg) 'For i5x=', allow%i%val(i5x), 'nval=', allow%l(i5x)%nval, &
& 'nonzero ratio=', real(allow%l(i5x)%nval)/allow%N
write(nlist,fmtg) 'Allowed slx values, val(:)=', allow%l(i5x)%val
enddo
write(nlist,fmtg)
write(nlist,fmtg) 'Loop level k info:'
do i5x = 1, allow%i%nval
write(nlist,fmtg) 'For i5x=', allow%i%val(i5x), 'nval=', allow%k(i5x)%nval, &
& 'nonzero ratio=', real(allow%k(i5x)%nval)/allow%N
write(nlist,fmtg) 'Allowed skx values, val(:)=', allow%k(i5x)%val
enddo
return
end subroutine print_allow
subroutine print_allow_1( nlist, allow )
! print the contents of the 1-d array allow(:) and overall summary info.
implicit none
integer, intent(in) :: nlist
type(allow_t), intent(in) :: allow(:)
integer :: i
do i = 1, size(allow)
call print_allow( nlist, allow(i) )
enddo
return
end subroutine print_allow_1
subroutine set_val_map( mask, map, val, nval )
! search mask(0:) for .true. entries, set map(0:) and save the indices in val(:).
! the arrays map(0:) and val(:) are inverse mappings.
implicit none
logical(lk), intent(in) :: mask(0:) ! (0:N-1).
integer, intent(out) :: map(0:) ! (0:N-1)
integer, intent(out) :: val(:) ! allow(1:nval) are the allowed values.
integer, intent(out) :: nval ! number of allowed values.
integer :: i
map(:) = 0
nval = 0
do i = 0, ubound(mask,dim=1)
if ( mask(i) ) then
nval = nval + 1
val(nval) = i
map(i) = nval
endif
enddo
return
end subroutine set_val_map
subroutine update_mask( N, m, val, mask )
! update mask(:) using m and val(:).
! this is the modulo N ring addition of m with the additive inverse of the elements of val(:).
! mask(k)==.true. if k is in the output set.
implicit none
integer, intent(in) :: N ! the current modulus.
integer, intent(in) :: m ! the target value.
integer, intent(in) :: val(:) ! set of input values.
logical(lk), intent(inout) :: mask(0:) ! updated ring value indexes.
integer :: i
do i = 1, size(val)
mask(modulo( m - val(i), N )) = .true.
enddo
return
end subroutine update_mask
subroutine set_allow( modvals, allow )
! allocate and fill the allow derived type array.
implicit none
integer, intent(in) :: modvals(:)
type(allow_t), intent(out), allocatable :: allow(:)
integer :: i
allocate( allow(size(modvals)) )
do i = 1, size(modvals)
call fill_allow( modvals(i), allow(i) )
enddo
return
end subroutine set_allow
subroutine set_i5modN( i5, allow )
! set all of the mapped i5modN values.
implicit none
integer(ik), intent(in) :: i5
type(allow_t), intent(inout) :: allow(:)
integer :: i, N
do i = 1, size(allow)
N = allow(i)%N
allow(i)%i5modNx = allow(i)%i%map( mod(i5,int(N,ik)) )
enddo
return
end subroutine set_i5modN
pure logical function mod_test_k( sk, allow )
! perform modulo tests on the k loop index.
implicit none
integer(ik), intent(in) :: sk
type(allow_t), intent(in) :: allow(:)
integer :: i, i5modNx, N
! loop in order.
mod_test_k = .false.
do i = 1, size(allow)
N = allow(i)%N
i5modNx = allow(i)%i5modNx
mod_test_k = allow(i)%k(i5modNx)%map(int(mod(sk,int(N,ik)))) .eq. 0
if ( mod_test_k ) exit ! exit ASAP.
enddo
return
end function mod_test_k
pure logical function mod_test_l( sl, allow )
! perform modulo tests on the l loop index.
implicit none
integer(ik), intent(in) :: sl
type(allow_t), intent(in) :: allow(:)
integer :: i, i5modNx, N
! loop in order.
mod_test_l = .false.
do i = 1, size(allow)
N = allow(i)%N
i5modNx = allow(i)%i5modNx
mod_test_l = allow(i)%l(i5modNx)%map(int(mod(sl,int(N,ik)))) .eq. 0
if ( mod_test_l ) exit ! exit ASAP.
enddo
return
end function mod_test_l
end module allow_mod
program euler
! 133^5 + 110^5 + 84^5 + 27^5 = 144^5 = 61917364224
use, intrinsic :: iso_fortran_env, only: nlist => output_unit
use allow_mod, only: ik, allow_t, fill_allow, print_allow, set_allow, set_i5modN, mod_test_k, mod_test_l
implicit none
integer, parameter :: imax = 6208 ! integer range to search i=1:imax.
integer, parameter :: findmax = 3 ! max number of solutions to find.
logical, parameter :: printallow = .false. ! print the allow derived type info.
integer :: i,j, jmin, k, kmin, l, lmin, m, mmin, nfound
integer(ik) :: i5, sj, sk, sl, sm, diff
integer(ik) :: kkntf, kkntt, lkntf, lkntt ! modulo test stats.
integer(ik) :: n5(imax)
character(*), parameter :: cfmt = '(5(g0,1x),es0.3)'
character(*), parameter :: cfmtf = '(i0,": ",3(i0,"^5 + "),i0,"^5 = ",i0,"^5 = ",i0)'
type(allow_t), allocatable :: allow(:)
call set_allow( [775,275], allow ) ! some good values are 11, 31, 100, 275, 775.
if ( printallow) call print_allow( nlist, allow )
lkntf = 0; lkntt=0; kkntf=0; kkntt=0 ! loop count stats.
nfound = 0
outer: do i = 1, imax
i5 = int(i,ik)**5
n5(i) = i5
call set_i5modN( i5, allow )
jmin = index_min( i5/4 )
do j = jmin, i
sj = n5(j)
kmin = index_min( (i5-sj)/3 )
do k = kmin, j
sk = sj + n5(k)
diff = i5 - sk
if ( diff <= 0 ) exit
kkntt = kkntt + 1 ! before mod tests.
if ( mod_test_k( sk, allow ) ) cycle
kkntf = kkntf + 1 ! after mod tests.
lmin = index_min( diff/2 )
do l = lmin, k
sl = sk + n5(l)
diff = i5 - sl ! for a solution m**5 == diff is required.
if ( diff <= 0 ) exit
if ( n5(l) < diff ) cycle ! no solution in m=1:l.
lkntt = lkntt + 1 ! before mod tests
if ( mod_test_l( sl, allow ) ) cycle
lkntf = lkntf + 1 ! after mod tests
mmin = index_min( diff )
mloop: do m = mmin, l
sm = sl + n5(m)
diff = i5 - sm
select case (diff)
case (:-1) ! i5 < sm
exit mloop
case (0) ! i5 == sm
nfound = nfound + 1
write(nlist,cfmtf) nfound, j, k, l, m, i, i5
if ( nfound >= findmax ) then
exit outer
else
exit mloop ! exit m loop and continue searching.
endif
case default !(1:) sm < i5
cycle mloop
end select
enddo mloop
enddo
enddo
enddo
enddo outer
write(nlist,cfmt) 'modulo test: kkntt=', kkntt, 'kkntf=', kkntf, 'ratio=', real(kkntf)/kkntt
write(nlist,cfmt) 'modulo test: lkntt=', lkntt, 'lkntf=', lkntf, 'ratio=', real(lkntf)/lkntt
contains
pure function index_min( v ) result( r )
! return a value r such that r**5 <= v.
! it need not be the largest value, but the closer to v the better.
implicit none
integer :: r
integer(ik), intent(in) :: v
r = int( real(v)**0.2 )
if ( int(r,ik)**5 > v ) r = r - 1 ! possible floating point error.
r = max( 1, r ) ! enforce lower bound of 1.
return
end function index_min
end program euler
It is also possible now with this version of the code to search the full space i=1:6208 for solutions. That is the largest value for which i**5 does not overflow with INT64 arithmetic. Set findmax to a value larger than 43 to search the full space. The earlier versions of the code would have taken many hours to do that. Here is the timing output for this version.
1: 133^5 + 110^5 + 84^5 + 27^5 = 144^5 = 61917364224
2: 266^5 + 220^5 + 168^5 + 54^5 = 288^5 = 1981355655168
3: 399^5 + 330^5 + 252^5 + 81^5 = 432^5 = 15045919506432
4: 532^5 + 440^5 + 336^5 + 108^5 = 576^5 = 63403380965376
5: 665^5 + 550^5 + 420^5 + 135^5 = 720^5 = 193491763200000
6: 798^5 + 660^5 + 504^5 + 162^5 = 864^5 = 481469424205824
7: 931^5 + 770^5 + 588^5 + 189^5 = 1008^5 = 1040645140512768
8: 1064^5 + 880^5 + 672^5 + 216^5 = 1152^5 = 2028908190892032
9: 1197^5 + 990^5 + 756^5 + 243^5 = 1296^5 = 3656158440062976
10: 1330^5 + 1100^5 + 840^5 + 270^5 = 1440^5 = 6191736422400000
11: 1463^5 + 1210^5 + 924^5 + 297^5 = 1584^5 = 9971853425639424
12: 1596^5 + 1320^5 + 1008^5 + 324^5 = 1728^5 = 15407021574586368
13: 1729^5 + 1430^5 + 1092^5 + 351^5 = 1872^5 = 22989483914821632
14: 1862^5 + 1540^5 + 1176^5 + 378^5 = 2016^5 = 33300644496408576
15: 1995^5 + 1650^5 + 1260^5 + 405^5 = 2160^5 = 47018498457600000
16: 2128^5 + 1760^5 + 1344^5 + 432^5 = 2304^5 = 64925062108545024
17: 2261^5 + 1870^5 + 1428^5 + 459^5 = 2448^5 = 87913803014995968
18: 2394^5 + 1980^5 + 1512^5 + 486^5 = 2592^5 = 116997070082015232
19: 2527^5 + 2090^5 + 1596^5 + 513^5 = 2736^5 = 153313523637682176
20: 2660^5 + 2200^5 + 1680^5 + 540^5 = 2880^5 = 198135565516800000
21: 2793^5 + 2310^5 + 1764^5 + 567^5 = 3024^5 = 252876769144602624
22: 2926^5 + 2420^5 + 1848^5 + 594^5 = 3168^5 = 319099309620461568
23: 3059^5 + 2530^5 + 1932^5 + 621^5 = 3312^5 = 398521393801592832
24: 3192^5 + 2640^5 + 2016^5 + 648^5 = 3456^5 = 493024690386763776
25: 3325^5 + 2750^5 + 2100^5 + 675^5 = 3600^5 = 604661760000000000
26: 3458^5 + 2860^5 + 2184^5 + 702^5 = 3744^5 = 735663485274292224
27: 3591^5 + 2970^5 + 2268^5 + 729^5 = 3888^5 = 888446500935303168
28: 3724^5 + 3080^5 + 2352^5 + 756^5 = 4032^5 = 1065620623885074432
29: 3857^5 + 3190^5 + 2436^5 + 783^5 = 4176^5 = 1269996283285733376
30: 3990^5 + 3300^5 + 2520^5 + 810^5 = 4320^5 = 1504591950643200000
31: 4123^5 + 3410^5 + 2604^5 + 837^5 = 4464^5 = 1772641569890893824
32: 4256^5 + 3520^5 + 2688^5 + 864^5 = 4608^5 = 2077601987473440768
33: 4389^5 + 3630^5 + 2772^5 + 891^5 = 4752^5 = 2423160382430380032
34: 4522^5 + 3740^5 + 2856^5 + 918^5 = 4896^5 = 2813241696479870976
35: 4655^5 + 3850^5 + 2940^5 + 945^5 = 5040^5 = 3252016064102400000
36: 4788^5 + 3960^5 + 3024^5 + 972^5 = 5184^5 = 3743906242624487424
37: 4921^5 + 4070^5 + 3108^5 + 999^5 = 5328^5 = 4293595042302394368
38: 5054^5 + 4180^5 + 3192^5 + 1026^5 = 5472^5 = 4906032756405829632
39: 5187^5 + 4290^5 + 3276^5 + 1053^5 = 5616^5 = 5586444591301656576
40: 5320^5 + 4400^5 + 3360^5 + 1080^5 = 5760^5 = 6340338096537600000
41: 5453^5 + 4510^5 + 3444^5 + 1107^5 = 5904^5 = 7173510594925953024
42: 5586^5 + 4620^5 + 3528^5 + 1134^5 = 6048^5 = 8092056612627283968
43: 5719^5 + 4730^5 + 3612^5 + 1161^5 = 6192^5 = 9102375309234143232
modulo test: kkntt= 2251762772 kkntf= 1168337215 ratio= 5.189E-1
modulo test: lkntt= 341425893798 lkntf= 40830638283 ratio= 1.196E-1
real 22m26.829s
user 22m21.480s
sys 0m5.278s
EDIT: I have written a draft of a short paper that summarizes the contents of this discussion thread and contains three version of the fortran code, a straightforward version and two optimized versions. My intention is to post the paper and the associated fortran codes in the Tutorial section here in the discourse.
There is also the possibility of posting this to the RosettaCode site at some time in the future. The fortran codes that are already there are, let’s say, problematic.
My first goal is to ensure that everyone is acknowledged appropriately for their contributions to this thread. I think the best way would be to just include everyone in the author list of the paper. My intention is for this to be as inclusive as possible. Here is my current list:
@mecej4, @themos, @ivanpribec, @rwmsu,
@MarDie, @JohnCampbell, @wspector, @certik,
@ivanpribec,
If anyone else wants their name added, just notify me here in this thread or with a DMS. If someone wants their name removed, that will also be done, of course. There is also the possibility of adding names to an acknowledgement within the paper rather than as an author.
After I get the correct list of names, I will post the paper, which is in PDF format written originally in latex. I don’t think there is an easy way to edit a shared document here in this discourse group, so I am hoping that editing of the paper itself can be done just with DMS. We’ll see how that goes, and if other options, such as Overleaf, are better, then we can do that instead.