Euler Conjecture and CDC 6600

We could even do away with the innermost loop by using FINDLOC

!
   ! 27^5 + 84^5 + 110^5 + 133^5 = 144^5
  program euler
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   integer(int64) :: i,j, k, l, m,n5(145)
   character(*), parameter :: cfmt = '(*(g0,1x))'
   outer: do i = 1, 145
      n5(i) = i**5
      do j = 1, i; do k = 1, j; do l = 1, k
         m=findloc(n5(j)+n5(k)+n5(l)+n5(1:l),value=n5(i),dim=1)
         if ( m > 0 ) exit outer
         end do; end do; end do
   end do outer
     print cfmt, "i^5 = ", n5(i)
     print cfmt, j, k, l, m, i
  end program euler
1 Like

The findloc() intrinsic was also suggested in some earlier posts. The problem is that findloc() must search the entire list when the search value is not found. In this situation, the list is ordered, so it is possible to exit early once the search value is exceeded.

However this looks like the shortest program in terms of lines of code. :slight_smile:

This was a good idea, and I was wrong about it only being applicable to the innermost loop. Here is how I was wrong.

Given an i**5 value, for a given j the largest value that can be generated is when k==j, l==j, m==j. This gives the largest value as 4*j**5. So

jmin = max ( int( real(i5 / 4)**0.2 ), 1 )

does give a useful lower bound for the j loop. Then going further,

kmin = max ( int( real(i5-sj / 3)**0.2 ), 1 )

gives a useful lower bound for the k loop. Then going further

lmin = max ( int( real(i5-sk / 2)**0.2 ), 1 )

gives a lower bound for the l loop.

Applying all these together with my latest posted code (with all the mod tests) then gives

$ gfortran -O3 sum_conjecture3.F90
$ time a.out
i5 = 61917364224
133 110 84 27 144

real    0m0.019s
user    0m0.005s
sys     0m0.003s

These are Apple M1 times. These are the first timings that are consistently below 10 ms. I’ll try these tests on some of the other versions to see how well they work.

Ok, I took out all of the mod tests, and just left the index range tests (including jmin, kmin, lmin, and mmin), and here are those times (Apple M2, gfortran -O3).

i^5 =  61917364224
133 110 84 27 144

real    0m0.010s
user    0m0.008s
sys     0m0.002s

Here is that code.

program euler
   ! 27^5 + 84^5 + 110^5 + 133^5 = 144^5
   use, intrinsic :: iso_fortran_env, only: ik => int64
   implicit none
   integer, parameter :: nmax = 6208
   integer, parameter :: findmax = 1
   integer :: i,j, jmin, k, kmin, l, lmin, m, mmin, nfound
   integer(ik) :: i5, sj, sk, sl, sm, diff
   integer(ik) :: n5(nmax)
   character(*), parameter :: cfmt = '(*(g0,1x))'

   nfound = 0
   outer: do i = 1, nmax
      i5 = int(i,ik)**5
      n5(i) = i5
      jmin = index_min( i5/4 )
      do j = jmin, i
         sj = n5(j)
         kmin = index_min( (i5-sj)/3 )
         do k = kmin, j
            sk = sj + n5(k)
            diff = i5 - sk
            if ( diff < 0 ) exit
            lmin = index_min( diff/2 )
            do l = lmin, k
               sl = sk + n5(l)
               diff = i5 - sl
               if ( diff < 0 ) exit
               if ( (sl + n5(l)) < i5 ) cycle   ! no solution in m=1:l.

               mmin = index_min( diff )
               do m = mmin, l
                  sm = sl + n5(m)
                  diff = i5 - sm
                  if ( diff > 0 ) then
                     cycle
                  elseif ( diff < 0 ) then
                     exit
                  else   ! sm == i5
                     nfound = nfound + 1
                     print cfmt, "i^5 = ", i5
                     print cfmt, j, k, l, m, i
                     if ( nfound >= findmax ) then
                        exit outer
                     else
                        exit
                     endif
                  endif
                  
               enddo
            enddo
         enddo
      enddo
   enddo outer
contains

   pure function index_min( v ) result(r)
      ! return a value r such that r**5 <= v.
      ! it need not be the largest value, but the closer to v the better.
      implicit none
      integer                 :: r
      integer(ik), intent(in) :: v

      r = max( 1, int( real(v)**0.2 ) )

      return
   end function index_min
      
end program euler

Even these times are now below 10 ms user time, although the elapsed times are usually a little over. If I set findmax to 3, here are the timings.

i^5 =  61917364224
133 110 84 27 144
i^5 =  1981355655168
266 220 168 54 288
i^5 =  15045919506432
399 330 252 81 432

real    0m0.117s
user    0m0.115s
sys     0m0.002s

I will probably switch timing results to findmax=3 in the future because the findmax=1 values are so small now.

The next thing to try is to put back some of the mod tests to see how effective they really are now.

EDIT: Here is an update on the saga of optimizing the Euler search algorithm. I managed to generalize the modulo testing for the l and k loops, so now I have a general version that allows an arbitrary number of modulo tests. I combined that with the index_min() function I posted before, and this is the fastest version I have so far. The findmax=1 times (Apple M2) are

$ gfortran -O3 --warn-all euler10.F90
$ time a.out
i^5 =  61917364224
133 110 84 27 144

real    0m0.008s
user    0m0.005s
sys     0m0.002s

These times are now pretty consistently in the single-digit millisecond range. The findmax=3 times are

i^5 =  61917364224
133 110 84 27 144
i^5 =  1981355655168
266 220 168 54 288
i^5 =  15045919506432
399 330 252 81 432

real    0m0.067s
user    0m0.064s
sys     0m0.002s

These timings are almost 2x faster than the previous version. Here is this version of the code.

module allow_mod

   implicit none

   private
   public :: allow_t, fill_allow, print_allow

   integer, parameter :: lk = selected_logical_kind(1)  ! smallest logical value.
   
   type level_t
      integer :: nval                      ! number of allowed values.
      integer, allocatable :: val(:)       ! (1:nval) allowed values.
      integer, allocatable :: map(:)       ! (0:N-1)  map from mod(m,N) to contiguous index, ix.
   end type level_t

   type allow_t
      integer                    :: N             ! modulus value.
      integer                    :: i5modNx = -1  ! mapped mod(i**5,N) values
      type(level_t)              :: i             ! target i info.
      type(level_t), allocatable :: l(:)          ! (1:ix)
      type(level_t), allocatable :: k(:)          ! (1:ix)
   end type allow_t

   character(*), parameter :: cfmt = '(*(g0,1x))'

contains

   subroutine fill_allow( N, allow )

      ! allocate and fill in the allow derived type based on the modulus N.

      ! the goal is to limit the k and l loop indices when possible. this is
      ! achieved by using modulo arithmetic to limit the possibilities of
      ! matching j**5 + k**5 + l**5 + m**5 with the target i**5.

      implicit none
      integer, intent(in)          :: N      ! modulus value.
      type(allow_t), intent(inout) :: allow

      integer :: i, k, i5x, lx, nval
      integer :: i5mod(1:N), val(1:N)          ! temp work arrays.
      logical(lk) :: mask(0:N-1), kmask(0:N-1)

      allow%N = N

      ! compute mask(:) for the possible mod(i**5,N) values.
      
      mask(:) = .false.
      do i = 1, N
         k = mod(i*i,N)    ! i**2
         k = mod(k*k,N)    ! i**4
         k = mod(i*k,N)    ! i**5
         i5mod(i) = k
         mask(k)  = .true.
      enddo

      allocate( allow%i%map(0:N-1) )
      call set_val_map( mask, allow%i%map, val, nval )
      allow%i%nval = nval
      allow%i%val = val(1:nval)

      ! compute the kmask(:) array.
      ! the .true. entries are the only sk values that can be generated.

      kmask(:) = .false.
      do i5x = 1, allow%i%nval
         call update_mask( N, allow%i%val(i5x), allow%i%val, kmask )
      enddo

      ! allocate and fill in the level l and level k info.

      allocate( allow%l(1:nval), allow%k(1:nval) )

      do i5x = 1, allow%i%nval
         ! fill in the level l info for this i5x.
         mask(:) = .false.
         call update_mask( N, allow%i%val(i5x), allow%i%val, mask )
         allocate( allow%l(i5x)%map(0:N-1) )
         call set_val_map( mask, allow%l(i5x)%map, val, nval )
         allow%l(i5x)%nval = nval
         allow%l(i5x)%val = val(1:nval)

         ! fill in the level k info for this i5x.
         mask(:) = .false.
         do lx = 1, allow%l(i5x)%nval
            call update_mask( N, allow%l(i5x)%val(lx), allow%i%val(1:allow%i%nval), mask )
         enddo
         mask = mask .and. kmask  ! mask out any unreachable values.
         allocate( allow%k(i5x)%map(0:N-1) )
         call set_val_map( mask, allow%k(i5x)%map, val, nval )
         allow%k(i5x)%nval = nval
         allow%k(i5x)%val  = val(1:nval)
      enddo

   end subroutine fill_allow

   subroutine print_allow( nlist, allow )
      ! print the contents of allow with annotations.
      implicit none
      integer, intent(in)       :: nlist  ! output unit
      type(allow_t), intent(in) :: allow

      character(*), parameter :: fmtg = '(*(g0.3,1x))'

      integer :: i5x

      write(nlist,fmtg)
      write(nlist,fmtg) 'Allowed index values for modulus N=', allow%N

      write(nlist,fmtg)
      write(nlist,fmtg) 'Loop level m, nval=', allow%i%nval, &
           & 'nonzero ratio=', real(allow%i%nval)/allow%N
      write(nlist,fmtg) 'Allowed sm values, val(:)=', allow%i%val

      write(nlist,fmtg)
      write(nlist,fmtg) 'Loop level l info:'
      do i5x = 1, allow%i%nval
         write(nlist,fmtg) 'For i5x=', allow%i%val(i5x), 'nval=', allow%l(i5x)%nval, &
              & 'nonzero ratio=', real(allow%l(i5x)%nval)/allow%N
         write(nlist,fmtg) 'Allowed sl values, val(:)=', allow%l(i5x)%val
      enddo

      write(nlist,fmtg)
      write(nlist,fmtg) 'Loop level k info:'
      do i5x = 1, allow%i%nval
         write(nlist,fmtg) 'For i5x=', allow%i%val(i5x), 'nval=', allow%k(i5x)%nval, &
              & 'nonzero ratio=', real(allow%k(i5x)%nval)/allow%N
         write(nlist,fmtg) 'Allowed sk values, val(:)=', allow%k(i5x)%val
      enddo

      return
   end subroutine print_allow

   subroutine set_val_map( mask, map, val, nval )
      ! search mask(0:) for .true. entries, set map(0:) and save the indices in val(:).
      ! the arrays map(0:) and val(:) are inverse mappings.
      implicit none
      logical(lk), intent(in)  :: mask(0:) ! (0:N-1).
      integer, intent(out)     :: map(0:)  ! (0:N-1)
      integer, intent(out)     :: val(:)   ! allow(1:nval) are the allowed values.
      integer, intent(out)     :: nval     ! number of allowed values.

      integer :: i

      map(:) = 0
      nval   = 0
      do i = 0, ubound(mask,dim=1)
         if ( mask(i) ) then
            nval      = nval + 1
            val(nval) = i
            map(i)    = nval
         endif
      enddo

      return
   end subroutine set_val_map

   subroutine update_mask( N, m, allow, mask )
      ! update mask(:) using m and allow(:).
      implicit none
      integer, intent(in)        :: N          ! the current modulus.
      integer, intent(in)        :: m          ! the target value.
      integer, intent(in)        :: allow(:)   ! input allowed values.
      logical(lk), intent(inout) :: mask(0:)   ! updated allowed values.

      integer :: i, k

      do i = 1, size(allow)
         k = modulo( m - allow(i), N )
         mask(k) = .true.
      enddo

      return
   end subroutine update_mask

end module allow_mod

program euler
   ! 27^5 + 84^5 + 110^5 + 133^5 = 144^5
   use, intrinsic :: iso_fortran_env, only: ik => int64
   use allow_mod, only: allow_t, fill_allow, print_allow

   implicit none

   integer, parameter :: nmax = 6208
   integer, parameter :: findmax = 1
   integer :: i,j, jmin, k, kmin, l, lmin, m, mmin, nfound
   integer(ik) :: i5, sj, sk, sl, sm, diff
   integer(ik) :: n5(nmax)
   character(*), parameter :: cfmt = '(*(g0,1x))'

   type(allow_t), allocatable :: allow(:)
   
   call set_allow( [275, 31, 11], allow )    ! some good values are 11, 31, 100, 275.

   nfound = 0
   outer: do i = 1, nmax
      i5 = int(i,ik)**5
      n5(i) = i5
      call set_i5modN( i5, allow )
      jmin = index_min( i5/4 )
      do j = jmin, i
         sj = n5(j)
         kmin = index_min( (i5-sj)/3 )
         do k = kmin, j
            sk = sj + n5(k)
            diff = i5 - sk
            if ( diff < 0 ) exit
            if ( mod_test_k( sk, allow ) ) cycle
            lmin = index_min( diff/2 )
            do l = lmin, k
               sl = sk + n5(l)
               diff = i5 - sl    ! for a solution m**5 == diff is required.
               if ( diff < 0 ) exit
               if ( (sl + n5(l)) < i5 ) cycle   ! no solution in m=1:l.
               if ( mod_test_l( sl, allow ) ) cycle

               mmin = index_min( diff )
               do m = mmin, l
                  sm = sl + n5(m)
                  diff = i5 - sm
                  if ( diff > 0 ) then
                     cycle
                  elseif ( diff < 0 ) then
                     exit
                  else   ! sm == i5
                     nfound = nfound + 1
                     print cfmt, "i^5 = ", i5
                     print cfmt, j, k, l, m, i
                     if ( nfound >= findmax ) then
                        exit outer
                     else
                        exit  ! exit m loop and continue searching.
                     endif
                  endif
                  
               enddo
            enddo
         enddo
      enddo
   enddo outer

contains

   pure function index_min( v ) result( r )
      ! return a value r such that r**5 <= v.
      ! it need not be the largest value, but the closer to v the better.
      implicit none
      integer                 :: r
      integer(ik), intent(in) :: v

      r = int( real(v)**0.2 )
      if ( int(r,ik)**5 > v ) r = r - 1   ! allow for possible floating point error.
      r = max( 1, r )

      return
   end function index_min

   subroutine set_allow( modvals, allow )
      use, intrinsic :: iso_fortran_env, only: nlist => output_unit
      implicit none
      integer, intent(in)                     :: modvals(:)
      type(allow_t), intent(out), allocatable :: allow(:)

      integer :: i

      allocate( allow(size(modvals)) )
      
      do i = 1, size(modvals)
         call fill_allow( modvals(i), allow(i) )
         if ( .false. ) call print_allow( nlist, allow(i) )
      enddo

      return
   end subroutine set_allow

   subroutine set_i5modN( i5, allow )
      ! set all of the mapped i5modN values.
      implicit none
      integer(ik), intent(in)      :: i5
      type(allow_t), intent(inout) :: allow(:)

      integer :: i, N

      do i = 1, size(allow)
         N = allow(i)%N
         allow(i)%i5modNx = allow(i)%i%map( mod(i5,N) )
      enddo

      return
   end subroutine set_i5modN

   pure logical function mod_test_k( sk, allow )
      ! perform modulo tests on the k loop index.
      implicit none
      integer(ik), intent(in)   :: sk
      type(allow_t), intent(in) :: allow(:)

      integer :: i, i5modNx, N

      ! loop in order.

      mod_test_k = .false.
      do i = 1, size(allow)
         N          = allow(i)%N
         i5modNx    = allow(i)%i5modNx
         mod_test_k = allow(i)%k(i5modNx)%map(int(mod(sk,N))) .eq. 0
         if ( mod_test_k ) exit   ! exit ASAP.
      enddo

      return
   end function mod_test_k
      
   pure logical function mod_test_l( sl, allow )
      ! perform modulo tests on the l loop index.
      implicit none
      integer(ik), intent(in)   :: sl
      type(allow_t), intent(in) :: allow(:)

      integer :: i, i5modNx, N

      ! loop in order.

      mod_test_l = .false.
      do i = 1, size(allow)
         N          = allow(i)%N
         i5modNx    = allow(i)%i5modNx
         mod_test_l = allow(i)%l(i5modNx)%map(int(mod(sl,N))) .eq. 0
         if ( mod_test_l ) exit   ! exit ASAP.
      enddo

      return
   end function mod_test_l

end program euler

This is much longer than the original code with no loop tests of any kind, but it now runs something like 1000x faster than that simple version.

The line

call set_allow( [275, 31, 11], allow )

now determines which modulo tests are done. I’m still trying to understand which combinations are the most useful. The above timings were done with those values. If I add a fourth test with a value of 100, the code slows down a little, showing that the index value test overhead starts to cost more than just executing the redundant loops. Also moving back to one or two tests, slows down the code a little. But these are just empirical measurements, I don’t fully understand yet how those tests interact with each other.

Just to give credit where credit is due, it was @themos who suggested the hyper-triangular loop index restrictions, @JohnCampbell who suggested the minimum loop value idea, and It was @mecej4 who suggested the use of modulo relations. This version combines all of those ideas to get these remarkable timings.

I should also mention, as a fortran language feature, that this seems like an ideal situation to use parameterized derived types. This version of the code uses allocatable arrays. I might experiment a little with PDTs, but I don’t know how many compilers could actually compile such a version.

EDIT2: I made two changes to the last version of the code in this post. First, I changed the index_min() function to account for floating point error. This is explained in detail in a subsequent post. The other change is that I posted the original code with a typo. A 275 value was typed in as 175, and I didn’t notice it until later even after I discussed that value in the text. This does have a small affect on some of the timings. For example, with

call set_allow( [275, 31], allow )

I get these Apple M2 timings for the first soluton:

i^5 =  61917364224
133 110 84 27 144
real    0m0.008s
user    0m0.005s
sys     0m0.002s

and these timings for three solutions:

i^5 =  61917364224
133 110 84 27 144
i^5 =  1981355655168
266 220 168 54 288
i^5 =  15045919506432
399 330 252 81 432

real    0m0.061s
user    0m0.059s
sys     0m0.002s

These are slightly better than the previous times with that version of the code where I did modulo tests with three values.

Thanks for your continuing enhancements to the Euler Conjecture Fortran code.
I find that just
call set_allow( [30], allow )
suffices to give a considerable speedup.

Today, I looked up RosettaCode after a gap of many months and, unsurprisingly, now there is an article on the Euler Conjecture, with codes in many languages. Some of the C++ versions use calculations modulo 30, and one uses a hash table, both to speed up the calculations. There are two Fortran codes there, but they lag behind the C++ codes in the algorithms that they use.
The following Mathematica code is also given at RosettaCode, and it is hard for other “languages” to compete with its straightforwardness and simplicity:

           a^5 + b^5 + c^5 + d^5 == n^5 && a > 0 && b > 0 && c > 0 && 
            d > 0, {a, b, c, d, n}, Integers][[1, All,-1]]]

Out[3]= {27, 84, 110, 133, 144} ```

In the problem description paragraph, it is stated that " In 1966, Leon J. Lander and Thomas R. Parkin used a brute-force search on a [CDC 6600](https://en.wikipedia.org/wiki/CDC_6600) computer restricting numbers to those less than 250". I do not know if there is a still-available publication in which that restriction in the calculation  is stated.
Does someone know if the idea of "hashing" -- to retrieve stored information quickly -- was known in Euler's time? Was the idea known to Lander and Parkin

Here’s a fairly fast algorithm: Look for a number that appears in both { j⁵+k⁵+l⁵ } and { i⁵-n⁵ } where 0 < j <= k <= l and 0 < n < i

If we only consider l and i under 250, then the sets would contain less than 3 million integers each.

Strength reduction can be used to replace all the multiplications with additions: https://en.wikipedia.org/wiki/Strength_reduction

1 Like

That is an interesting article. There is a wide range of performance displayed, as would be expected for compiled and interpreted languages. Of course, the underlying hardware plays a role in the timings too, but I didn’t see any timings there that were in the single-digit millisecond range for the location of one solution like we have here with our fortran-based algorithm. Of course, that fast fortran code is much more complicated than the original code with just nested do loops.

The way that code is written, most everything within the i loop is independent. It should be possible to make a parallel coarray version based on that decomposition.

One other comment I wanted to make is about the index_min() function. I did some further testing, and that function is not correct as written for all values of the argument v. One simple fix is to change it to

r = int( real(v)**0.2 )
if ( int(r,ik)**5 > v ) r = r - 1
r = max( 1, r )

This seems strange to me, since that first int() function should always round down when necessary. I finally figured out that it is the exponent 0.2 itself that causes the problem. Consider this program

program printerr
   use, intrinsic :: iso_fortran_env, only: sp=>real32, dp=>real64
   implicit none
   write(*,'(a,b35)') '0.2_sp=', 0.2_sp
   write(*,'(a,b64)') '0.2_dp=', 0.2_dp
end program printerr

0.2_sp=     111110010011001100110011001101
0.2_dp=  11111111001001100110011001100110011001100110011001100110011010

For that 0.2 value the binary digits ‘0011’ are repeated, so the floating point values will be rounded. In the output above, I shifted the sp value over so that the leading binary digits line up with the dp value. Remember also that both the sp and dp values have a hidden high-order ‘1’ bit. Whether the value is too large or too small depends on where in that ‘0011’ sequence the roundoff occurs. As you can see, the sp value is indeed rounded up in the last bit. So what happens is that values of v that are just a little smaller than an i**5 value are raised to a power that is just a little too big. That results in some incorrect values being computed, and all of them are too large by 1.

You might think that a simple solution would just be to use the dp value, but if you look at the dp bit sequence, you can see that it also is rounded up to be larger than the exact value. Consequently, the 0.2_dp value also produces incorrect results, but of course for much larger values of v.

Another possible solution might be to manually change that last bit so that it is purposely too small. That eliminates the problem with large values, but then it will err for v values that are just a little bigger than an i**5 value and the computed values are too small. That is alright in this case, since the next loop value will be what should have been the correct one. So this is not an aesthetically pleasing fix either.

I would say that the above correction works in a practical sense, but it looks like a hack, not exactly a legitimate algorithm. I experimented a little with integer-only algorithms to compute the integer fifth root, and none of them perform as well as the floating point conversion algorithm. They aren’t slow by much, typically less than a factor of 2, but it is a little unsettling nonetheless.

EDIT: I changed the last version of my earlier posted code to include this correction. It does not change the timings. I did not go back and change the earlier posts that also had this error.

1 Like

Please look at the compact C contribution (by Alexander Maximov) at RosettaCode, and consider whether the comments regarding “x^5==x modulo 30 for any x” can also be beneficial in your codes.

I was thinking this cannot be right, but it is!

x^5-x = (x-1)*x*(x+1)*(x^2+1)

So it’s a product of three consecutive integers, so it must be divisible by 2 and 3. One must consider all remainders for divisibility by 5. So it’s divisible by 2*3*5=30.

1 Like

I did explore a little the mod 30 relation. I summarized it in this earlier post. Euler Conjecture and CDC 6600 - #60 by RonShepard. I then moved on to the more general permutation array cases, which allow for larger N, and finally settled on using N=145. That allows the inner m loop to move in increments of 145. However, my latest versions of the code use instead the index_min() function for all of the loops, including the m loop, so the inner m loop executes in a single pass anyway (or two passes, if the index_min() function returns a value that is a little too small). I did not explore the possibility of using the mod 30 relation (or the more general permutation array relation) on the other loops, just the inner loop. Maybe that would be useful?

BTW, the C/C++ codes that use the mod 30 relation have timings that range from 83ms up to 235 ms to find the first solution. Hardware matters to the timings, of course, but the index_min() approach still seems to be the most productive.

Regarding mod tests in general, there are two different ways that these are useful. In the mod 30 case, or in the more general permutation array cases with larger N, the advantage is that they allow the inner loop to be incremented by values of N rather than by 1. But that is not how my last code is using the mod tests. Here you look for cases where mod(i**5,N) is sparse, not dense. For the N=275 case, there are only 15 possible nonzero values. That is 15 out of 275. Then you can use that information to limit the outer l and k loop values based on what is the target mod(i**5,N) value. I managed to generalize those tests, and that is the code that is in the allow_mod module. This allows the outer loops to use multiple N values in the mod tests, and if any of those tests fail, then the loop value can be skipped. That code is not sophisticated, I didn’t use any exotic ring algebra or anything like that. The suggested values in the code (11,31,100,275) are just a few of the values that result in small sparsity ratios.

1 Like

I had previously noted that mod(i**5,N)=i for N={2,3,5,6,10,15,30}, but I did not know why. This relation explains why. The first three terms mean it is divisible by 2 and 3, and Fermat’s little theorem says that it is divisible by 5, and that list of numbers are all the multiples involving just those three prime numbers. It doesn’t hold for 31 and 32, and it cannot hold for any larger values of N because mod(2**5,N)=32 in those cases.

2 Likes

There was an off-by-one error in the last version of the code that I posted. In the process of fixing that, I put in some counters to verify that my average statistics were valid, and I concluded that they aren’t. Basically, my assumptions about even distributions of modulo values for the various sk and sl partial sums were incorrect. So I decided to leave in the counters and remove the approximate statistics evaluations. The timings are unaffected, so that seems like a good tradeoff. I also did some other code refactoring, such as moving some of the internal functions and subroutines from the main program into the module. Here are the Apple M2 timings with gfortran for findmax=1

1: 133^5 + 110^5 + 84^5 + 27^5 = 144^5 = 61917364224
modulo test: kkntt= 30840 kkntf= 15357 ratio= 4.980E-1
modulo test: lkntt= 99991 lkntf= 11481 ratio= 1.148E-1

real    0m0.008s
user    0m0.005s
sys     0m0.002s

and here are the findmax=3 timings

1: 133^5 + 110^5 + 84^5 + 27^5 = 144^5 = 61917364224
2: 266^5 + 220^5 + 168^5 + 54^5 = 288^5 = 1981355655168
3: 399^5 + 330^5 + 252^5 + 81^5 = 432^5 = 15045919506432
modulo test: kkntt= 782569 kkntf= 401860 ratio= 5.135E-1
modulo test: lkntt= 8071752 lkntf= 952863 ratio= 1.180E-1

real    0m0.062s
user    0m0.060s
sys     0m0.002s

That kkntf ratio of 0.51 basically says that 49% of the k loop values are skipped due to the modulo tests, and the lkntf ratio of 0.11 says that about 89% of the l loop values are skipped. The next challenge is to find good modulo values that reduce the kkntf and lkntf values and their associated ratios. Here is the version of the code that produced those timings.

module allow_mod
   use, intrinsic :: iso_fortran_env, only: ik => int64
   implicit none

   private
   public :: ik, allow_t, fill_allow, print_allow, set_allow, set_i5modN, mod_test_k, mod_test_l

   interface print_allow  ! allow both scalar and array arguments.
      module procedure :: print_allow, print_allow_1
   end interface print_allow

   integer, parameter :: lk = selected_logical_kind(1)  ! smallest logical value.

   type level_t
      integer :: nval                      ! number of allowed values.
      integer, allocatable :: val(:)       ! (1:nval) allowed values.
      integer, allocatable :: map(:)       ! (0:N-1)  map from mod(m,N) to contiguous index, ix.
   end type level_t

   type allow_t
      integer                    :: N        ! modulus value.
      integer                    :: i5modNx  ! mapped mod(i**5,N) values
      type(level_t)              :: i        ! target i info.
      type(level_t), allocatable :: l(:)     ! (1:nval) loop level l info.
      type(level_t), allocatable :: k(:)     ! (1:nval) loop level k info.
   end type allow_t

   character(*), parameter :: cfmt = '(*(g0,1x))'

contains

   subroutine fill_allow( N, allow )

      ! allocate and fill in the allow derived type based on the modulus N.

      ! the goal is to limit the k and l loop indices when possible. this is
      ! achieved by using modulo arithmetic to limit the possibile
      ! matches of j**5 + k**5 + l**5 + m**5 with the target i**5.

      implicit none
      integer, intent(in)        :: N      ! modulus value with range up to about 1000.
      type(allow_t), intent(out) :: allow

      integer :: i, k, i5x, lx, nval
      integer :: i5mod(1:N), val(1:N)          ! temp work arrays.
      logical(lk) :: mask(0:N-1), kmask(0:N-1)

      allow%N       = N
      allow%i5modNx = -1  ! this is filled in later.

      ! compute mask(:) for the possible mod(i**5,N) values.
      
      mask(:) = .false.
      do i = 1, N
         k        = mod(int(i,ik)**5,N)
         i5mod(i) = k
         mask(k)  = .true.
      enddo

      allocate( allow%i%map(0:N-1) )
      call set_val_map( mask, allow%i%map, val, nval )
      allow%i%nval = nval
      allow%i%val  = val(1:nval)

      ! compute the kmask(:) array.
      ! the .true. entries are the only sk values that can be generated.

      kmask(:) = .false.
      do i5x = 1, allow%i%nval
         call update_mask( N, allow%i%val(i5x), allow%i%val, kmask )
      enddo

      ! allocate and fill in the level l and level k info.

      allocate( allow%l(1:nval), allow%k(1:nval) )

      do i5x = 1, allow%i%nval
         ! fill in the level l info for this i5x.
         mask(:) = .false.
         call update_mask( N, allow%i%val(i5x), allow%i%val, mask )
         allocate( allow%l(i5x)%map(0:N-1) )
         call set_val_map( mask, allow%l(i5x)%map, val, nval )
         allow%l(i5x)%nval = nval
         allow%l(i5x)%val  = val(1:nval)

         ! fill in the level k info for this i5x.
         mask(:) = .false.
         do lx = 1, allow%l(i5x)%nval
            call update_mask( N, allow%l(i5x)%val(lx), allow%i%val(1:allow%i%nval), mask )
         enddo
         mask = mask .and. kmask  ! mask out any unreachable values.
         allocate( allow%k(i5x)%map(0:N-1) )
         call set_val_map( mask, allow%k(i5x)%map, val, nval )
         allow%k(i5x)%nval = nval
         allow%k(i5x)%val  = val(1:nval)
      enddo

   end subroutine fill_allow

   subroutine print_allow( nlist, allow )
      ! print the contents of allow with annotations.
      implicit none
      integer, intent(in)       :: nlist  ! output unit
      type(allow_t), intent(in) :: allow

      character(*), parameter :: fmtg = '(*(g0.3,1x))'

      integer :: i5x

      write(nlist,fmtg)
      write(nlist,fmtg) 'Allowed index values for modulus N=', allow%N

      write(nlist,fmtg)
      write(nlist,fmtg) 'Loop level m, nval=', allow%i%nval, &
           & 'nonzero ratio=', real(allow%i%nval)/allow%N
      write(nlist,fmtg) 'Allowed smx values, val(:)=', allow%i%val

      write(nlist,fmtg)
      write(nlist,fmtg) 'Loop level l info:'
      do i5x = 1, allow%i%nval
         write(nlist,fmtg) 'For i5x=', allow%i%val(i5x), 'nval=', allow%l(i5x)%nval, &
              & 'nonzero ratio=', real(allow%l(i5x)%nval)/allow%N
         write(nlist,fmtg) 'Allowed slx values, val(:)=', allow%l(i5x)%val
      enddo

      write(nlist,fmtg)
      write(nlist,fmtg) 'Loop level k info:'
      do i5x = 1, allow%i%nval
         write(nlist,fmtg) 'For i5x=', allow%i%val(i5x), 'nval=', allow%k(i5x)%nval, &
              & 'nonzero ratio=', real(allow%k(i5x)%nval)/allow%N
         write(nlist,fmtg) 'Allowed skx values, val(:)=', allow%k(i5x)%val
      enddo

      return
   end subroutine print_allow

   subroutine print_allow_1( nlist, allow )
      ! print the contents of the 1-d array allow(:) and overall summary info.
      implicit none
      integer, intent(in)       :: nlist
      type(allow_t), intent(in) :: allow(:)

      integer :: i

      do i = 1, size(allow)
         call print_allow( nlist, allow(i) )
      enddo

      return
   end subroutine print_allow_1

   subroutine set_val_map( mask, map, val, nval )
      ! search mask(0:) for .true. entries, set map(0:) and save the indices in val(:).
      ! the arrays map(0:) and val(:) are inverse mappings.
      implicit none
      logical(lk), intent(in)  :: mask(0:) ! (0:N-1).
      integer, intent(out)     :: map(0:)  ! (0:N-1)
      integer, intent(out)     :: val(:)   ! allow(1:nval) are the allowed values.
      integer, intent(out)     :: nval     ! number of allowed values.

      integer :: i

      map(:) = 0
      nval   = 0
      do i = 0, ubound(mask,dim=1)
         if ( mask(i) ) then
            nval      = nval + 1
            val(nval) = i
            map(i)    = nval
         endif
      enddo

      return
   end subroutine set_val_map

   subroutine update_mask( N, m, val, mask )
      ! update mask(:) using m and val(:).
      ! this is the modulo N ring addition of m with the additive inverse of the elements of val(:).
      ! mask(k)==.true. if k is in the output set.
      implicit none
      integer, intent(in)        :: N        ! the current modulus.
      integer, intent(in)        :: m        ! the target value.
      integer, intent(in)        :: val(:)   ! set of input values.
      logical(lk), intent(inout) :: mask(0:) ! updated ring value indexes.

      integer :: i

      do i = 1, size(val)
         mask(modulo( m - val(i), N )) = .true.
      enddo

      return
   end subroutine update_mask

   subroutine set_allow( modvals, allow )
      ! allocate and fill the allow derived type array.
      implicit none
      integer, intent(in)                     :: modvals(:)
      type(allow_t), intent(out), allocatable :: allow(:)

      integer :: i

      allocate( allow(size(modvals)) )

      do i = 1, size(modvals)
         call fill_allow( modvals(i), allow(i) )
      enddo

      return
   end subroutine set_allow

   subroutine set_i5modN( i5, allow )
      ! set all of the mapped i5modN values.
      implicit none
      integer(ik), intent(in)      :: i5
      type(allow_t), intent(inout) :: allow(:)

      integer :: i, N

      do i = 1, size(allow)
         N = allow(i)%N
         allow(i)%i5modNx = allow(i)%i%map( mod(i5,int(N,ik)) )
      enddo

      return
   end subroutine set_i5modN

   pure logical function mod_test_k( sk, allow )
      ! perform modulo tests on the k loop index.
      implicit none
      integer(ik), intent(in)   :: sk
      type(allow_t), intent(in) :: allow(:)

      integer :: i, i5modNx, N

      ! loop in order.

      mod_test_k = .false.
      do i = 1, size(allow)
         N          = allow(i)%N
         i5modNx    = allow(i)%i5modNx
         mod_test_k = allow(i)%k(i5modNx)%map(int(mod(sk,int(N,ik)))) .eq. 0
         if ( mod_test_k ) exit   ! exit ASAP.
      enddo

      return
   end function mod_test_k

   pure logical function mod_test_l( sl, allow )
      ! perform modulo tests on the l loop index.
      implicit none
      integer(ik), intent(in)   :: sl
      type(allow_t), intent(in) :: allow(:)

      integer :: i, i5modNx, N

      ! loop in order.

      mod_test_l = .false.
      do i = 1, size(allow)
         N          = allow(i)%N
         i5modNx    = allow(i)%i5modNx
         mod_test_l = allow(i)%l(i5modNx)%map(int(mod(sl,int(N,ik)))) .eq. 0
         if ( mod_test_l ) exit   ! exit ASAP.
      enddo

      return
   end function mod_test_l

end module allow_mod

program euler
   ! 133^5 + 110^5 + 84^5 + 27^5 = 144^5 = 61917364224
   use, intrinsic :: iso_fortran_env, only: nlist => output_unit
   use allow_mod, only: ik, allow_t, fill_allow, print_allow, set_allow, set_i5modN, mod_test_k, mod_test_l

   implicit none

   integer, parameter :: imax = 6208          ! integer range to search i=1:imax.
   integer, parameter :: findmax = 3          ! max number of solutions to find.
   logical, parameter :: printallow = .false. ! print the allow derived type info.

   integer :: i,j, jmin, k, kmin, l, lmin, m, mmin, nfound
   integer(ik) :: i5, sj, sk, sl, sm, diff
   integer(ik) :: kkntf, kkntt, lkntf, lkntt  ! modulo test stats.
   integer(ik) :: n5(imax)
   character(*), parameter :: cfmt = '(5(g0,1x),es0.3)'
   character(*), parameter :: cfmtf = '(i0,": ",3(i0,"^5 + "),i0,"^5 = ",i0,"^5 = ",i0)'

   type(allow_t), allocatable :: allow(:)

   call set_allow( [775,275], allow )    ! some good values are 11, 31, 100, 275, 775.
   if ( printallow) call print_allow( nlist, allow )

   lkntf = 0; lkntt=0; kkntf=0; kkntt=0  ! loop count stats.

   nfound = 0
   outer: do i = 1, imax
      i5 = int(i,ik)**5
      n5(i) = i5
      call set_i5modN( i5, allow )
      jmin = index_min( i5/4 )
      do j = jmin, i
         sj = n5(j)
         kmin = index_min( (i5-sj)/3 )
         do k = kmin, j
            sk = sj + n5(k)
            diff = i5 - sk
            if ( diff <= 0 ) exit
            kkntt = kkntt + 1    ! before mod tests.
            if ( mod_test_k( sk, allow ) ) cycle
            kkntf = kkntf + 1    ! after mod tests.
            lmin = index_min( diff/2 )
            do l = lmin, k
               sl = sk + n5(l)
               diff = i5 - sl    ! for a solution m**5 == diff is required.
               if ( diff <= 0 ) exit
               if ( n5(l) < diff ) cycle   ! no solution in m=1:l.
               lkntt = lkntt + 1    ! before mod tests
               if ( mod_test_l( sl, allow ) ) cycle
               lkntf = lkntf + 1    ! after mod tests

               mmin = index_min( diff )
               mloop: do m = mmin, l
                  sm = sl + n5(m)
                  diff = i5 - sm
                  select case (diff)
                  case (:-1)   ! i5 < sm
                     exit mloop
                  case (0)     ! i5 == sm
                     nfound = nfound + 1
                     write(nlist,cfmtf) nfound, j, k, l, m, i, i5
                     if ( nfound >= findmax ) then
                        exit outer
                     else
                        exit mloop  ! exit m loop and continue searching.
                     endif
                  case default !(1:) sm < i5
                     cycle mloop
                  end select
               enddo mloop

            enddo
         enddo
      enddo
   enddo outer
   write(nlist,cfmt) 'modulo test: kkntt=', kkntt, 'kkntf=', kkntf, 'ratio=', real(kkntf)/kkntt
   write(nlist,cfmt) 'modulo test: lkntt=', lkntt, 'lkntf=', lkntf, 'ratio=', real(lkntf)/lkntt

contains

   pure function index_min( v ) result( r )
      ! return a value r such that r**5 <= v.
      ! it need not be the largest value, but the closer to v the better.
      implicit none
      integer                 :: r
      integer(ik), intent(in) :: v

      r = int( real(v)**0.2 )
      if ( int(r,ik)**5 > v ) r = r - 1   ! possible floating point error.
      r = max( 1, r )                     ! enforce lower bound of 1.

      return
   end function index_min

end program euler

It is also possible now with this version of the code to search the full space i=1:6208 for solutions. That is the largest value for which i**5 does not overflow with INT64 arithmetic. Set findmax to a value larger than 43 to search the full space. The earlier versions of the code would have taken many hours to do that. Here is the timing output for this version.

1: 133^5 + 110^5 + 84^5 + 27^5 = 144^5 = 61917364224
2: 266^5 + 220^5 + 168^5 + 54^5 = 288^5 = 1981355655168
3: 399^5 + 330^5 + 252^5 + 81^5 = 432^5 = 15045919506432
4: 532^5 + 440^5 + 336^5 + 108^5 = 576^5 = 63403380965376
5: 665^5 + 550^5 + 420^5 + 135^5 = 720^5 = 193491763200000
6: 798^5 + 660^5 + 504^5 + 162^5 = 864^5 = 481469424205824
7: 931^5 + 770^5 + 588^5 + 189^5 = 1008^5 = 1040645140512768
8: 1064^5 + 880^5 + 672^5 + 216^5 = 1152^5 = 2028908190892032
9: 1197^5 + 990^5 + 756^5 + 243^5 = 1296^5 = 3656158440062976
10: 1330^5 + 1100^5 + 840^5 + 270^5 = 1440^5 = 6191736422400000
11: 1463^5 + 1210^5 + 924^5 + 297^5 = 1584^5 = 9971853425639424
12: 1596^5 + 1320^5 + 1008^5 + 324^5 = 1728^5 = 15407021574586368
13: 1729^5 + 1430^5 + 1092^5 + 351^5 = 1872^5 = 22989483914821632
14: 1862^5 + 1540^5 + 1176^5 + 378^5 = 2016^5 = 33300644496408576
15: 1995^5 + 1650^5 + 1260^5 + 405^5 = 2160^5 = 47018498457600000
16: 2128^5 + 1760^5 + 1344^5 + 432^5 = 2304^5 = 64925062108545024
17: 2261^5 + 1870^5 + 1428^5 + 459^5 = 2448^5 = 87913803014995968
18: 2394^5 + 1980^5 + 1512^5 + 486^5 = 2592^5 = 116997070082015232
19: 2527^5 + 2090^5 + 1596^5 + 513^5 = 2736^5 = 153313523637682176
20: 2660^5 + 2200^5 + 1680^5 + 540^5 = 2880^5 = 198135565516800000
21: 2793^5 + 2310^5 + 1764^5 + 567^5 = 3024^5 = 252876769144602624
22: 2926^5 + 2420^5 + 1848^5 + 594^5 = 3168^5 = 319099309620461568
23: 3059^5 + 2530^5 + 1932^5 + 621^5 = 3312^5 = 398521393801592832
24: 3192^5 + 2640^5 + 2016^5 + 648^5 = 3456^5 = 493024690386763776
25: 3325^5 + 2750^5 + 2100^5 + 675^5 = 3600^5 = 604661760000000000
26: 3458^5 + 2860^5 + 2184^5 + 702^5 = 3744^5 = 735663485274292224
27: 3591^5 + 2970^5 + 2268^5 + 729^5 = 3888^5 = 888446500935303168
28: 3724^5 + 3080^5 + 2352^5 + 756^5 = 4032^5 = 1065620623885074432
29: 3857^5 + 3190^5 + 2436^5 + 783^5 = 4176^5 = 1269996283285733376
30: 3990^5 + 3300^5 + 2520^5 + 810^5 = 4320^5 = 1504591950643200000
31: 4123^5 + 3410^5 + 2604^5 + 837^5 = 4464^5 = 1772641569890893824
32: 4256^5 + 3520^5 + 2688^5 + 864^5 = 4608^5 = 2077601987473440768
33: 4389^5 + 3630^5 + 2772^5 + 891^5 = 4752^5 = 2423160382430380032
34: 4522^5 + 3740^5 + 2856^5 + 918^5 = 4896^5 = 2813241696479870976
35: 4655^5 + 3850^5 + 2940^5 + 945^5 = 5040^5 = 3252016064102400000
36: 4788^5 + 3960^5 + 3024^5 + 972^5 = 5184^5 = 3743906242624487424
37: 4921^5 + 4070^5 + 3108^5 + 999^5 = 5328^5 = 4293595042302394368
38: 5054^5 + 4180^5 + 3192^5 + 1026^5 = 5472^5 = 4906032756405829632
39: 5187^5 + 4290^5 + 3276^5 + 1053^5 = 5616^5 = 5586444591301656576
40: 5320^5 + 4400^5 + 3360^5 + 1080^5 = 5760^5 = 6340338096537600000
41: 5453^5 + 4510^5 + 3444^5 + 1107^5 = 5904^5 = 7173510594925953024
42: 5586^5 + 4620^5 + 3528^5 + 1134^5 = 6048^5 = 8092056612627283968
43: 5719^5 + 4730^5 + 3612^5 + 1161^5 = 6192^5 = 9102375309234143232
modulo test: kkntt= 2251762772 kkntf= 1168337215 ratio= 5.189E-1
modulo test: lkntt= 341425893798 lkntf= 40830638283 ratio= 1.196E-1

real    22m26.829s
user    22m21.480s
sys     0m5.278s

EDIT: I have written a draft of a short paper that summarizes the contents of this discussion thread and contains three version of the fortran code, a straightforward version and two optimized versions. My intention is to post the paper and the associated fortran codes in the Tutorial section here in the discourse.

There is also the possibility of posting this to the RosettaCode site at some time in the future. The fortran codes that are already there are, let’s say, problematic.

My first goal is to ensure that everyone is acknowledged appropriately for their contributions to this thread. I think the best way would be to just include everyone in the author list of the paper. My intention is for this to be as inclusive as possible. Here is my current list:

@mecej4, @themos, @ivanpribec, @rwmsu,
@MarDie, @JohnCampbell, @wspector, @certik,
@ivanpribec,

If anyone else wants their name added, just notify me here in this thread or with a DMS. If someone wants their name removed, that will also be done, of course. There is also the possibility of adding names to an acknowledgement within the paper rather than as an author.

After I get the correct list of names, I will post the paper, which is in PDF format written originally in latex. I don’t think there is an easy way to edit a shared document here in this discourse group, so I am hoping that editing of the paper itself can be done just with DMS. We’ll see how that goes, and if other options, such as Overleaf, are better, then we can do that instead.

2 Likes