Euler Conjecture and CDC 6600

The 7600s were typically “front ended” by a lesser CDC machine. So it is possible you submitted jobs to the 7600 from the 6600. Fortran-wise, I’m not sure if a variant of RUN was ever used on the 7600. FTN, and later FTN5, certainly supported the 7600. The instruction set was almost identical to the 6000 series machines. Though there were some differences in how I/O was done behind the scenes, as the 7600 had a very different I/O system.

At the U of Arizona, they had a CDC 6400, and later a Cyber-175, linked to a DEC-10 (dual KI-10, later dual KL-10) system. The -175 basically had a very fast 7600-like CPU with the I/O system (peripheral processors, and peripherals) of the lower CDC machines. But I really liked the interactive environment that TOPS-10 provided. So my dream system was a fast CDC machine running TOPS-10. Finally got it years later when Cray implemented Unicos on the CRAY-2 and X-MP systems.

Just the basic code runs on my machine in ~8.5 seconds (gfortran -O3). Replacing the innermost loop with a binary search reduces it to ~2.3 seconds. So a nice speedup.

I initially tried replacing the inner loop with the findloc intrinsic. Runs slower - ~10.5 seconds.

Interestingly, my binary search routine fails with lfortran - depending how it is coded. Using an if/elseif/else/endif for the three cases works. But using select case doesn’t…

1 Like

The flang compiler worked alright with the code using int64, but I had some problems with int128, so I had to make those minor changes as workarounds there too.

If you put the binary search in a subroutine (as it probably should be aesthetically) then it slows down. That is why I put it right inside the innermost loop, to avoid the subroutine call overhead. I think the code is to the point where there isn’t much left to squeeze out. I think it would take an actual algorithm change to improve it significantly.

If you can, post your code at GitHub · Where software is built and we’ll fix LFortran to compile it.

It is a bit disappointing to recall that our “new” 1974 CDC 6600 was from 1966.
The thought of available memory at that time to do finite element analysis is now difficult to comprehend.
While the then Pr1me 300 with 2-byte integers was very limiting, it did provide the ability to produce meaningful analyses. I don’t recall how or when integer*4 was available, but I think they were software emulated.
Recalling that then the 60-bit integers on the 6600 were huge in comparison, but still limiting for the Euler Conjecture.

It compiles fine. Just produces wrong results. Submitted as PR 8897.

For those interested, here is one version of the code I’ve been playing with. (Yes, it could be cleaned up a bit.):

program euler3
      use iso_fortran_env
      implicit none

      INTEGER(int64) I,J,K,L,M,P5(145)
      integer(int64) :: temp
      integer :: loc

      DO, I=1,145
        P5(I)=I*I*I*I*I
      ! print ('(i5, i20, z20)'), i, p5(i), p5(i)
      end do

i_loop: DO, I = 1, 145
        if (mod (i, 10) == 0) print *, 'I iteration:', i
        DO, J = 1, I
          DO, K = 1, I
            DO, L = 1, I
              temp = -(P5(J) + P5(K) + P5(L) - p5(i))
#if 1
! Try various search functions
              ! loc = findloc (p5, temp, dim=1)
              loc = binsearch (p5, temp)
              if (loc /= 0) then
                m = loc
                exit i_loop
              end if
#else
! Inline search loop
              DO, M = 1, I
                if (temp == p5(m)) exit i_loop
              end do
#endif
            end do
          end do
        end do
      end do i_loop
      if (i <= 145) then
        PRINT 777
        PRINT 999, J,K,L,M,I
      end if

  777 FORMAT('EULER CONJECTURE DISPROVED.')
  999 FORMAT(5I10)

contains

  pure integer function binsearch (a, val) result (res)
    integer(int64), intent(in) :: a(0:), val
!   Based on https://grokipedia.com/page/Binary_search_algorithm

    integer :: low, high, mid

    low = 0
    high = size (a) - 1
    do while (low <= high)
      mid = (low + high)/2
! Compare select case vs ifs
#if 1
! With lfortran this fails, but works if the 0 case is specified
! and the (:-1) case is left for default
      select case (a(mid) - val)
      case (:-1)
        low = mid + 1
      case (1:)
        high = mid - 1
      case default
        res = mid + 1
        return
      end select
#else
      if (val == a(mid)) then
        res = mid + 1
        return
      else if (val < a(mid)) then
        high = mid - 1
      else
        low = mid + 1
      end if
#endif
    end do
    res = 0

  end function

END

If I change the SELECT CASE to use 64-bit integers, the lfortran generated code works:

      select case (a(mid) - val)
      case (:-1_int64)
        low = mid + 1
      case (1_int64:)
        high = mid - 1
      case default
        res = mid + 1
        return
      end select
1 Like

I have some good news and some bad news. First the good news. I removed the exclude_* functions in my code and replaced them with a simple array lookup to determine when the inner loops can be skipped. This helps the times a little bit.

i^5 =  61917364224
133 110 84 27 144

real    0m0.025s
user    0m0.022s
sys     0m0.002s

These are the Apple M2 times, and they are about 10% faster than the previous version with the function calls which had a select case block and some logical expressions. I folded all of those expressions into a table, and now I just do a table lookup to do the branches.

Those branch exits are based on @mecej4’s idea of using modulo 11 comparisons. For reasons that I don’t fully understand, the residuals mod(i**5,p) are sparse for some prime numbers p but not for others. For p=11, you can start with mod(i**10,11). Fermat’s little theorem says that is equal to 0 if i is a multiple of 11, and it is equal to 1 otherwise. That then gives mod(i**5,11)**2=1 and mod(i**5,11)=±1. So out of the 11 possible values for the residual, only the values {-1,0,1} occur. The -1 value is congruent to 10, so the values that occur in the fortran code are {0,1,10}.

I looked at some other prime numbers p, like 13, 17, 23, and so on. None of those are sparse, so they can’t be used to screen out the inner loops. But p=31 is sparse, with residuals {0,1,5,6,25,26,30}. I think that is because mod(i**30,31)==1, and that is the same as mod(i**5)**6, so there are six other modulo roots that come into play. That’s 7 nonzero values out of 31 possible. I thought if screening for mod 11 residuals worked so well, why not try mod 31 residuals. So I added those to the code, checked to make sure that they do indeed result in additional inner loop exclusions, and did the timings. What I see is that this extra test slows down the code.

i^5 =  61917364224
133 110 84 27 144

real    0m0.027s
user    0m0.025s
sys     0m0.002s

This simple array lookup and conditional branch somehow adds 10% to the run times, basically undoing the small improvement I made by doing the table lookup. I do not understand why that happens, but those are the numbers. Here is the binary search version of the code with the mod 31 test. You can comment out that mod 31 test statement to see if it has the same effect on your hardware+compiler combinations.

program sum_conjecture
   ! sum_conjecture.f90
   !
   ! 27^5 + 84^5 + 110^5 + 133^5 = 144^5
   !
   use, intrinsic :: iso_fortran_env, only: ik => int64, logical_kinds
   implicit none
   integer, parameter :: nmax = 6208    ! mmax integer range to search.
   integer, parameter :: findmax = 1    ! max number of solutions to find.
   integer, parameter :: lk = selected_logical_kind(1)  ! f2023 and later.
   !integer, parameter :: lk = 1  ! logical kind with the smallest storage_size() value.
   integer(ik) :: i5, sk, sl, sm, diff
   integer(ik) :: n5(nmax)
   integer :: i, j, k, l, m, mapT11, mapT31, nbits, mmin, mmax, nfound

   logical(lk), parameter :: t=.true., f=.false.
   integer, parameter :: map_11(0:10) = [1,2,0,0,0,0,0,0,0,0,3]  ! map T11={0,1,10} to a contiguous domain.
   logical(lk), parameter :: exclude_k11(0:10,1:3) = reshape([ &
        & f,f,f,t,t,t,t,t,t,f,f, & ! T11 = 0
        & f,f,f,t,t,t,t,t,t,t,f, & ! T11 = 1
        & f,f,t,t,t,t,t,t,t,f,f],& ! T11 = 10
        & shape(exclude_k11) )
   logical(lk), parameter :: exclude_l11(0:10,1:3) = reshape([ &
        & f,f,t,t,t,t,t,t,t,t,f, & ! T11 = 0
        & f,f,f,t,t,t,t,t,t,t,t, & ! T11 = 1
        & f,t,t,t,t,t,t,t,t,f,f],& ! T11 = 10
        & shape(exclude_l11) )
   integer, parameter :: map_31(0:30) = &
        & [1,2,0,0,0,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,6,0,0,0,7] ! map T11={0,1,5,6,25,26,30} to a contiguous domain.
   logical(lk), parameter :: exclude_l31(0:30,1:7) = reshape([ &
        & f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,f, & ! T31 = 0
        & f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t, & ! T31 = 1
        & f,t,t,t,f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f, & ! T31 = 5
        & f,f,t,t,t,f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t, & ! T31 = 6
        & f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,t,f,f,f,t,t,t,f, & ! T31 = 25
        & f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,t,f,f,f,t,t,t, & ! T31 = 26
        & f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,f,f],& ! T31 = 30
        & shape(exclude_l31) )

   character(*), parameter :: cfmt = '(*(g0,1x))'
   intrinsic :: int, mod, bit_size, leadz, ibset

   nfound = 0
   outer: do i = 1, nmax
      i5 = int(i,ik)**5
      n5(i) = i5
      mapT11 = map_11(mod(i5,11_ik))   ! target residual; T11 = 0, 1, or 10.
      mapT31 = map_31(mod(i5,31_ik))   ! target residual; T31 = 0, 1, 5, 6, 25, 26, or 30.
      do j = 1, i
         do k = 1, j
            sk = n5(j) + n5(k)
            if ( sk > i5 ) exit
            if ( exclude_k11( mod(sk,11_ik), mapT11 ) ) cycle
            do l = 1, k
               sl = sk + n5(l)
               diff = i5 - sl       ! target m**5 value.
               if ( diff < 0 ) exit
               if ( exclude_l11( mod(sl,11_ik), mapT11 ) ) cycle  ! <--- comment this out.
               if ( exclude_l31( mod(sl,31_ik), mapT31 ) ) cycle  ! <--- comment this out.
               if ( n5(l) < diff ) cycle  ! no possible solution  in (1...l).

               ! set up for binary search for m**5 == diff.
               nbits = bit_size(diff) - leadz(diff) 
               mmin = ibset(0,((nbits-1)/5))            ! lower bound to m.
               mmax = min( l, ibset(0,((nbits+4)/5)))   ! upper bound to m.
               !mmin = 1; mmax = l
               if ( n5(mmin) == diff ) then  ! check for shortcuts.
                  mmax = mmin
               elseif ( n5(mmax) == diff ) then
                  mmin = mmax
               endif
               do
                  m = (mmin + mmax) / 2
                  if ( n5(m) > diff ) then
                     mmax = m
                  elseif ( n5(m) < diff ) then
                     if ( m == mmin ) exit  ! no solution.
                     mmin = m
                  else  ! solution found.
                     print cfmt, "i^5 = ", i5
                     print cfmt, j, k, l, m, i
                     nfound = nfound + 1
                     if ( nfound >= findmax ) then
                        exit outer
                     else
                        exit   ! continue searching.
                     endif
                  endif
               enddo

            enddo
         enddo
      enddo
   enddo outer

end program sum_conjecture

Any ideas why that slowdown occurs would be greatly appreciated. It doesn’t make sense to me.

Just to see what happens, I commented out both of the l loop mod checks, and here are the timings.

i^5 =  61917364224
133 110 84 27 144

real    0m0.016s
user    0m0.014s
sys     0m0.002s

Those are the best times I’ve seen so far, and I don’t understand why. I went further and commented out also the k loop mod tests, and the code slows down. At least that makes sense. But I don’t understand why branching around the inner loop (which is the binary search in this version) slows things down. Anyway, that is the bad news.

The next thing to consider is using the mod 31 values in a k loop test. I’m not sure if that is worth the effort because almost all of the generated values are allowed. at that level.

I also added the line

integer, parameter :: findmax = 1    ! max number of solutions to find.

to this version to allow searching for multiple solutions. You can set it to a small value, like 5 or 10, to see how the code works for longer searches. Don’t set it too high though or you will have to wait overnight.

1 Like

You have probably thought of another relation that may be useful, but just in case:

For any integer i, i and i**5 are equal modulo 10.

Here is a link to some scanned CDC manuals. Index of /pdf/cdc/Tom_Hunter_Scans

On the Fortran compiler front a number of compilers were available. There were the ones from CDC plus a number of other 3rd party products.

If I remember correctly the University of Waterloo provided a compiler.

There were also additional add ons for the CDC Fortran compiler. In the UK one was called mantrap (done by people at Manchester) and the Leicester Trace package - LTP . There was an option at compile time to enable mantrap/ltp. In the event of a program abort this package would use a map file to analyse memory and provide a subroutine and function call trace back printing out the values of all scalar variables and subsets of the array variables. There were special bit patterns for unitialised variables and the trace back enabled most users to resolve their programming issues themselves (I worked on the help desk at Imperial College Computer Centre at the time).

Looking at my Compass (the CDC assembler) manuals integers were 48 bits.

Ian Chivers

Between Al Kossow (who is responsible for bitsavers), Tom Hunter, and a number of others of us on the “controlfreaks” email list, over the years we’ve managed to preserve quite a bit of CDC documentation and software. We can run 60-bit operating systems from the very first Chippewa Operating System to the final version of NOS 2.8. Various SCOPE, KRONOS, NOS/BE, and highly modified variants as well.

Waterloo did WATFOR and WATFIVE on IBM systems. The equivalent in the CDC world were MNF and M77 from the U of Minnesota. Also secondarily RUNT from U of Washington.

We had MANTRAP (Manchester Trace Package) installed in the U of Nevada KRONOS system - where I worked for a while. (UNEV had CDC systems at both the Reno and Las Vegas campuses.) CDC later made some sort of deal with Manchester to obtain MANTRAP and integrated it into their standard compiler offering on NOS and NOS/BE.

Actually, it has been over 55 years since I studied things like modulo rings, least common multiples, greatest common divisors, and so on. I remember some of it (I don’t know why) but much of it has evaporated by now.

I did notice this. Also, there are many cases where mod(i,n) and mod(i**5,n) are just permutations of each other. For n=10, that permutation happens to be the identity permutation. I don’t really know why that happens.

I was focusing on just the sparsity of the mod(i**5,n) values, because I thought that might be most useful for skipping over the inner do loops in that little Euler program. Here is a program I used to look at that sparsity.

program modn5
   use, intrinsic :: iso_fortran_env, only: ik => int64
   implicit none
   integer :: nmax
   character(*), parameter :: cfmt = '(*(g0,1x))'
   write(*,cfmt) 'enter nmax:'
   read(*,*) nmax
   write(*,cfmt) 'nmax=', nmax
   block
     integer(ik) :: i5
     integer :: n, i, k, kx, knt(0:nmax-1), i5mod(1:nmax), nonzero
     real :: ratio
     do n = 1, nmax
        knt(:n-1) = 0
        do i = 1, n
           i5 = int(i,ik)**5
           k = mod(i5,n)      ! brute force way.
           kx = mod(i*i,n)    ! safe way with no overflow.
           kx = mod(kx*kx,n)
           kx = mod(i*kx,n)
           if ( k .ne. kx) write(*,cfmt) 'kx error for i=', i, k, kx
           i5mod(i) = k
           knt(k) = knt(k) + 1
        enddo
        nonzero = count( knt(:n-1) .ne. 0 )
        ratio = real(nonzero) / n
        if ( ratio < 0.3 ) then
           write(*,cfmt) 'n=', n, 'nonzero=', nonzero, 'ratio=', ratio
           write(*,cfmt) 'i5mod(:n-1)=', i5mod(:n-1)
           write(*,cfmt) 'knt(:n-1)=', knt(:n-1)
        endif
     enddo
   end block
end program modn5

The mod(i**5,n) values look interesting for n={25,41,50,100} too. I was hoping to see a way to automatically use that information to compute several mod tests. That insight has so far eluded me. I did the n=11 and n=31 cases by hand in that last code I posted.

As a new FORTRAN user in the early 70’s, it was very much an unknown how different the available compilers were. My first course was based on McCracken’s book and the Watfor compiler on an IBM (it had “better” diagnostics). My second year was using CDC with a random chosen FORTRAN compiler. From memory CDC provided two compilers but the other was described as less friendly. The one I used still had horrendus error messages, so you could imagine how bad the other was ? Debugging code took days/weeks.

We were given no advice on the existence of a Fortran standard, but use McCracken.
From there it was trial and error as to what the compiler would accept and produce.

Given how small the available memory was, it’s importance was not emphasised (for uni projects). I think we could ask for 200K words on the CDC for overnight, but 100K during the day.

My experience was somewhat different. In the beginning, the CDC 6000 had three Fortran compilers, RUN, FUN and FTN. RUN compiled your code fast and the runs were equally short if the program did some simple calculations. FUN took longer to compile but the subsequent runs were faster than with RUN. FTN was the slowest compiler, but the code that it generated was optimized and ran fast. A couple of years later, MNF became available on a slower 6400 that provided Teletype input and output, whereas the 6600 provided punched card input and line printer output.

Debugging on the 6600, once your code compiled and ran, was fairly simple. The two friendliest numbers that I got acquainted with were O3777777777777777777 (infinity) and O1777777777777777777 (undefined). If you saw one of these numbers in X6 or X7 in the register dump, the traceback dump that followed helped you to find the source line with the error. After a few weeks of experience, it was fairly easy to fix the source deck and run again. Of course, errors in the algorithm implementation or a poorly chosen algorithm would take longer to rectify.

This site has entertaining anecdotes from CDC 6000 and Cyber computer installations:

There are several cases where this identity permutation happens, for n={2,3,5,6,10,15,30}. It does not happen for n=31 or 32, and it cannot happen for any larger value because mod(2**5,n)==32 in all those cases. I was thinking how to use that information, and it occurred to me that if you know ahead of time what mod(m**5,n) is for the target value, then you could start the m loop index at that value and increment by n each step. You know ahead of time that nothing in between can possibly match. So if you take n=30, the largest value, you could go through the m loop 30 times faster than normal.

Then it occurred to me that you don’t have to settle for just the identity permutation cases. You can take any n value for any of the other permutation cases. That little modn5.f90 code I posted earlier shows that many values of n could be used, even values over 1000. Those values would let you go through the m loop n times faster. If n is larger than l, then it allows you to test that single value and then exit the loop. Maybe this is cheating a little, but we know the smallest solution has i=144, so if we pick a value larger than 144, we are guaranteed to find that solution with a single pass through the m loop every time. This eliminates the binary search problem entirely. It just so happens that n=145, only one value beyond 144, results in a permutation mapping between mod(i,n) and mod(i**5,n). So the only problem left is to compute that permutation array ahead of time, and then proceed with the brute force search over j, k, and l as before. When I do that, I get these timings.

$ gfortran -O3 sum_conjecture3.F90
$ time a.out
i5 = 61917364224
133 110 84 27 144

real    0m0.016s
user    0m0.014s
sys     0m0.001s

These are Apple M2 times. Those are basically as good as any previous results, but the algorithm is aesthetically cleaner than, say the binary search versions. I also experimented a little with some of the other values, and they all result in good timings, so the general idea is sound. I also set findmax to values larger than 1, and the code continues to perform well for the larger solutions too. Here is the code.

program sum_conjecture
   ! sum_conjecture.f90
   !
   ! 27^5 + 84^5 + 110^5 + 133^5 = 144^5
   !
   use, intrinsic :: iso_fortran_env, only: ik => int64
   implicit none
   integer, parameter :: nmax = 6208    ! mmax integer range to search.
   integer, parameter :: findmax = 1    ! max number of solutions to find.
   integer, parameter :: lk = selected_logical_kind(1)  ! f2023 and later.
   !integer, parameter :: lk = 1  ! logical kind with the smallest storage_size() value.
   integer(ik) :: i5, sk, sl, diff
   integer(ik) :: n5(nmax)

   ! nperm is not arbitrary, but there are many values that work.
   ! 10, 30, 42, 65, 87, 105, 130, 145, 249, 510, 709, 995 are some acceptable values.
   integer, parameter :: nperm = 145
   integer :: perm(0:nperm-1)  ! integer m permutation array.

   integer :: i, j, k, l, m, mapT11, mapT31, mmin, nfound

   logical(lk), parameter :: t=.true., f=.false.
   integer, parameter :: map_11(0:10) = [1,2,0,0,0,0,0,0,0,0,3]  ! map T11={0,1,10} to a contiguous domain.
   logical(lk), parameter :: exclude_k11(0:10,1:3) = reshape([ &
        & f,f,f,t,t,t,t,t,t,f,f, & ! T11 = 0
        & f,f,f,t,t,t,t,t,t,t,f, & ! T11 = 1
        & f,f,t,t,t,t,t,t,t,f,f],& ! T11 = 10
        & shape(exclude_k11) )
   logical(lk), parameter :: exclude_l11(0:10,1:3) = reshape([ &
        & f,f,t,t,t,t,t,t,t,t,f, & ! T11 = 0
        & f,f,f,t,t,t,t,t,t,t,t, & ! T11 = 1
        & f,t,t,t,t,t,t,t,t,f,f],& ! T11 = 10
        & shape(exclude_l11) )
   integer, parameter :: map_31(0:30) = &
        & [1,2,0,0,0,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,6,0,0,0,7] ! map T11={0,1,5,6,25,26,30} to a contiguous domain.
   logical(lk), parameter :: exclude_l31(0:30,1:7) = reshape([ &
        & f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,f, & ! T31 = 0
        & f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t, & ! T31 = 1
        & f,t,t,t,f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f, & ! T31 = 5
        & f,f,t,t,t,f,f,f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t, & ! T31 = 6
        & f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,t,f,f,f,t,t,t,f, & ! T31 = 25
        & f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,t,f,f,f,t,t,t, & ! T31 = 26
        & f,t,t,t,f,f,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,t,f,f,t,t,t,f,f],& ! T31 = 30
        & shape(exclude_l31) )

   character(*), parameter :: cfmt = '(*(g0,1x))'
   intrinsic :: int, mod

   ! set up the perm(:) permutation array.
   perm = 0
   do m = 1, nperm
      k = mod(m*m,nperm)  ! m*2
      k = mod(k*k,nperm)  ! m**4
      k = mod(m*k,nperm)  ! m**5
      perm(k) = m         ! perm(k) gives the lowest index m to match a given m**5 value.
   enddo
   !if ( any(perm==0) ) then
   !   print cfmt, 'invalid nperm results in perm(k)==0, nperm=', nperm
   !   error stop 1
   !endif
   
   nfound = 0
   outer: do i = 1, nmax
      i5 = int(i,ik)**5
      n5(i) = i5
      mapT11 = map_11(mod(i5,11_ik))   ! target modulus; T11 = 0, 1, or 10.
      mapT31 = map_31(mod(i5,31_ik))   ! target modulus; T31 = 0, 1, 5, 6, 25, 26, or 30.
      do j = 1, i
         do k = 1, j
            sk = n5(j) + n5(k)
            if ( sk > i5 ) exit
            if ( exclude_k11( mod(sk,11_ik), mapT11 ) ) cycle
            do l = 1, k
               sl = sk + n5(l)
               diff = i5 - sl       ! target m**5 value.
               if ( diff < 0 ) exit
               if ( n5(l) < diff ) cycle  ! no possible solution  in (1...l).
               !if ( exclude_l11( mod(sl,11_ik), mapT11 ) ) cycle
               !if ( exclude_l31( mod(sl,31_ik), mapT31 ) ) cycle

               mmin = perm(mod(diff,nperm))  ! mod(m**5,nperm) == mod(diff,nperm) is required,
               do m = mmin, l, nperm         ! mmin is the smallest such value.
                  if ( n5(m) < diff ) then
                     cycle
                  elseif ( n5(m) > diff ) then
                     exit
                  else    ! n5(m) == diff
                     print cfmt, "i5 =", i5
                     print cfmt, j, k, l, m, i
                     nfound = nfound + 1
                     if ( nfound >= findmax ) then
                        exit outer
                     else
                        exit  ! continue searching.
                     endif
                  endif
               enddo

            enddo
         enddo
      enddo
   enddo outer

end program sum_conjecture

I left a test in the code to verify that the permutation array is valid. If you change the value of nperm, then you should probably uncomment that if block at least once to make sure you have a good nperm value. I left it commented out just because it might affect the times.

You might also notice that I commented out the two l loop exclude tests. Uncommenting those tests increases the run time a little. I still don’t understand why that happens. And it may not happen for other cpu+compiler combinations.

I think the next thing to try to improve the code would be to exclude more of the outer k loop passes.

1 Like

Indeed,this is not only the fastest code so far, but your innovations indicate that it may be now feasible to go beyond duplicating the 1966-1967 work of Lander, Parkin and Selfridge, and to tackle the Lander, Parkin, Selfridge Conjecture:

If sum(a_i^k, i=1..n) = sum( b_j^k,j=1..m), with a_i, b_j being positive integers , then m+n >= k

It seems now customary to call this the (k,m,n) problem. Euler’s Conjecture pertains to the particular case of this with m=1 .

One solution to the (6,3,3) problem, published in 1934, is:

 3^6 + 19^6 + 22^6 = 10^6 + 15^6 + 23^6 

There is an active project to tackle the general problem, Computing Minimal Equal Sums Of Like Powers. There, you can see a known case for which direct calculation with integer*8 variables will not suffice:

141325+2205 = 140685+62375+50275

In your code, if a user changes the value of nperm, would it necessary to recompute the initialization values for the logical arrays map_xyz and exclude_xyz? If so, do you obtain the values by running another program as a preliminary step? Thanks.

This page, with names changed to protect the innocent, almost exactly describes what happened at the U of Nevada - story by story. They were also a XDS Sigma site that moved to CDC in the 1970s. UNEV went through the same conversion process and made many of the same mistakes. I heard lots of stories from the old-timers… Instead of having two co-located CDC machines, we had one in Reno and the other in Las Vegas. They were tied together by a 4800 baud dedicated synchronous phone line, and some custom mods (several thousand lines of PP and CP assembly language) which CDC wrote for the “Export-Import” subsystem (EI200) to allow batch jobs to be submitted from one system to the other.

When we finally upgraded from KRONOS to NOS (a story in itself), I got the envious task of ‘porting’ the EI200 mods. Of course CDC had made many internal changes in the OS between the final version of KRONOS and the version of NOS we jumped to. So I had to re-write a lot of it. I was able to take advantage of some newer internal hooks - as CDC was finally starting to realize that computers needed to talk to each other.

Shortly before I left, CDC introduced a special product to interface between NOS and IBM systems running the NJE protocol. Then extended it to go between two NOS systems. We quickly obtained it. It didn’t support everything we needed, so for a while I had both the old EI200 link running in parallel with the NJE link. But eventually we figured out mods to get NJE to do everything. After I left, UNEV used the NJE software to connect to BITNET.

We also had a networked Gandalf PACX setup to let interactive users log into either machine.

Networking life before TCP/IP and the internet…

No, those are independent tests. The mod 11 and mod 31 tests exclude certain values in the k and l outer loops, while the perm(:) array makes the innermost m loop as efficient as possible once it is executed.

I did those mod 11 and mod 31 values by hand. By doing that, I thought I might be able to understand the process enough to write a code to generate them automatically, but I have not done that yet. I still have a few ideas to try, so maybe one of those will work out.

I’m also still trying to understand why adding the l loop tests slows down the code. By testing the conditionals, I can verify that it skips over the inner m loop for l values that would otherwise be active, so that part works. And each test in this last version is just an array lookup, so underneath it just requires a couple of index calculations. Maybe the m loop is now just so fast that even a 2D table lookup is expensive by comparison? I thought maybe it was some kind of cache effect, that referencing those 2D tables exceeded some memory limit, so I changed the lk kind value so that it uses 1-byte logicals rather than the default 4-byte logicals, but that had little/no effect, so I don’t think that is the culprit.

It always amazes me that people were able to muster enough patience in the era before computers existed to do such calculations. Or is there some clever trick to find these solutions? The individual numbers in this case are not that large perhaps, but still …

According to the Wikipedia article on him, Euler knew the first hundred prime numbers and could give each of their powers up to the sixth degree . Early in his life, Euler memorized Virgil’s Aeneid, and by old age, he could recite the poem and give the first and last sentence on each page of the edition from which he had learnt it ( 12 books and approximately 9,896 lines).
He married twice, and fathered 13 children.

Gosh, I can only recite the first few lines of Aeneid, consequence of a classic education :blush: .