Large wall time, small CPU_TIME, what does that tell you?

Dear all,

I have a code (edited: it is a mix of Fortran and C++; some Fortran subroutines are wrappers to C++ functions). No MPI, no OpenMP. Completely single-core serial code. Now if I use CPU_TIME to measure how long the code takes, it shows 30 seconds. But I do feel it takes much more than 30 seconds. So I placed a wall time counter in the code, and what I got is a wall time of 200 seconds. Like below:

call CPU_TIME(start)
tic = wtime() ! wall time counter
...
... 
...
toc = wtime() ! wall time counter
call CPU_TIME(end)
write(6,*) 'cpu time is: ', end-start
write(6,*) 'wall time is: ', toc-tic

So the question is: in general, based on your experience, in such a code, if CPU_TIME is small, like 30 seconds, and the real wall time is large, like 200 seconds, what does that tell you about the code? Thank you in advance!
Note that the code just runs, no interactive stuff. Yes, it will write something to the NVMe SSD drive.

PS.
The wall time clock I use is:

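  function wtime ()
! from John Burkardt
! i8 and r8 are not defined in the post; they are assumed here to be the
! 64-bit integer and real kinds.
  use, intrinsic :: iso_fortran_env, only: i8 => int64, r8 => real64
  implicit none
  integer(kind=i8) clock_max
  integer(kind=i8) clock_rate
  integer(kind=i8) clock_reading
  real(kind=r8) wtime
  call system_clock ( clock_reading, clock_rate, clock_max )
  wtime = real ( clock_reading, kind = r8 ) &
        / real ( clock_rate, kind = r8 )
  return
  end function wtime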

Many possibilities. The simplest is that your system is under load from other applications and that your application is only getting access to the processor for part of the time it is executing.

Your application could be I/O bound or accessing an external device that is not responding because of load, or network bandwidth could be saturated.

Your code could be spending a large amount of time allocating and deallocating memory or any other system resource.

The code can be intentionally sleeping, waiting for an event such as a file appearing or a specified amount of time passing.

You might be collecting resource usage so frequently in an inner loop of the code that you are slowing the code down significantly.

The program could be spawning a large number of subprocesses calling external commands, which can be quite time-consuming. Not all CPU_TIME procedures will report the CPU time used by a subprocess either.
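
To see the sleeping and subprocess cases in isolation, here is a minimal sketch. It assumes a Unix-like system whose shell provides a sleep command; that command, the 2-second delay, and the program name are illustrative and not taken from the original code. The parent program is blocked while the child runs, so wall time advances while CPU_TIME typically barely moves.

```fortran
program wall_vs_cpu
   ! Sketch: time spent waiting on a child process counts as wall time
   ! but usually not as CPU time of the calling program.
   ! Assumption: a Unix-like shell that provides "sleep".
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   integer(int64) :: tick0, tick1, rate
   real(real64)   :: cpu0, cpu1, wall
   integer        :: exitstat, cmdstat

   exitstat = 0   ! EXITSTAT is INTENT(INOUT), so give it a value first
   cmdstat  = 0

   call system_clock(tick0, rate)
   call cpu_time(cpu0)

   ! The parent waits here and uses almost no CPU.
   call execute_command_line('sleep 2', wait=.true., &
                             exitstat=exitstat, cmdstat=cmdstat)

   call cpu_time(cpu1)
   call system_clock(tick1)

   ! Subtract the integer counts first, then convert to seconds.
   wall = real(tick1 - tick0, real64)/real(rate, real64)

   if (cmdstat /= 0 .or. exitstat /= 0) then
      write (*, *) 'could not run "sleep"; the timings are not meaningful'
   end if
   write (*, '(a,f8.3)') 'cpu  time (s): ', cpu1 - cpu0
   write (*, '(a,f8.3)') 'wall time (s): ', wall
end program wall_vs_cpu
```

If the command runs, the CPU time should stay close to zero while the wall time is about two seconds. That is the same pattern as the 30 s versus 200 s in the question, although the cause in the real code may well be different.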

These are some of the simpler possibilities. If on Linux, you probably want to start by
running your application with strace(1), which will identify excessive system calls or
calls to sleep.

Many of these common problems will be identified by strace.

Is there anything unusual about how you are launching the program, such as using a job scheduler such as Slurm? Is there anything else going on on the machine that would put a heavy load on it or make it do excessive amounts of context switching? Listing all the possibilities is a daunting task. You could have hardware problems clocking back your CPU, and so on … so the best place to start is learning a few techniques for narrowing down the cause.

Once you have eliminated those problems the next step would be to determine where in the code the majority of time is being spent. Vendor-specific profiling tools often are preferable but learning how to do a basic profiling of your Fortran codes with tools such as gprof(1) is probably the second step to take.

And you should learn what compiler optimization switches are available.

If you get serious enough about it your journey will probably not end until you are at the point you are dumping the machine code while you browse the hardware manual for your machine :>

Many thanks!

The code is just a serial computation code, for example, using BFGS to minimize a function and find the optimal parameters. Just an exe. I am running it in Visual Studio: I press Ctrl+F5 and it runs. There is no huge or very frequent I/O, it seems. No sleep stuff. No waiting for a file to be written. It is just running on a laptop, not under heavy load.

But the code is a mix of Fortran and C++. Some subroutines are just wrappers of C++ functions, so when these subroutines are called, it is actually the C++ functions that run. Also, CPU_TIME is called at the beginning and the end of the Fortran code.

From what you said,

“not all CPU_TIME procedures will report the CPU time used by a subprocess either.”

Is it possible that the running time of some C++ routines is not counted in Fortran’s CPU_TIME? Like, in the Fortran code, CPU_TIME only measures the time that the Fortran code spent on the CPU? Sorry, that is a very lazy question, I can just check myself. But I asked anyway :grinning_face:

Unlikely. Whether the upper-level program is C, C++, Fortran … it is very likely that the system procedure getrusage(3) (or the platform's equivalent) is being called, and it should account for time in any of those languages, but many details of CPU_TIME are implementation-dependent. Check the documentation for CPU_TIME for the programming environment you are using to see if it provides details about its limitations.

Thank you!
Just curious, do you think CPU_TIME is useful to time a code?

I mean, if the code is a ‘pure’ code, with no I/O at all. The code just runs using the CPU and RAM, and finally outputs a result. Even in such a pure case, if I were to time a code and see if my change really improves its speed, I think I would still just use wall time instead of CPU_TIME. Because, as you mentioned, it is possible that some process’s time may not be recorded in CPU_TIME.

Just curious, in what scenario do you think CPU_TIME would be useful?
I remember when I first learned Fortran as an undergraduate, I used CPU_TIME to time code. But as soon as I learned the idea of wall time, I no longer used CPU_TIME to time code. CPU_TIME does not seem to be very useful :grinning_face:

At a high level, using something like the Unix time command can be all that is needed. It returns system and user time separately as well as wall clock time, and on some systems (mileage may vary) it also gives I/O usage and memory high-water marks; on Linux machines the program can just output the file /proc/$$/status. Just what CPU_TIME does varies a lot from processor to processor, so I would use such tools instead if I just want a quick idea of what CPU utilization is like.

But when developing your code, something like CPU_TIME can be very useful in letting you measure the effects of different solution methods or sample performance periodically in your code. You can glean a lot of information from a timing value if the code outputs the time taken in each iteration of a computation or in a certain region of code.

CPU_TIME is often so coarse a measure that it is not a good tool for measuring small durations of code. If you just need a coarse measure of a relatively long execution, or want to get multiple measures from a single execution, it is a nice, convenient solution that is very portable. For other needs I would go elsewhere.
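
As an illustration of timing one region on every iteration, here is a minimal sketch. The module region_timer, the type region_t, and the routines region_begin and region_end are hypothetical names made up for this example, and the square-root loop is only a stand-in for real work. It records both CPU_TIME and SYSTEM_CLOCK so the two can be compared per iteration.

```fortran
module region_timer
   ! Hypothetical helper: accumulate CPU and wall time for one code region.
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   private
   public :: region_t, region_begin, region_end

   type :: region_t
      real(real64)   :: cpu_total  = 0.0_real64   ! accumulated CPU seconds
      real(real64)   :: wall_total = 0.0_real64   ! accumulated wall seconds
      real(real64)   :: cpu_last   = 0.0_real64   ! CPU seconds of last pass
      real(real64)   :: wall_last  = 0.0_real64   ! wall seconds of last pass
      real(real64)   :: cpu_start  = 0.0_real64
      integer(int64) :: tick_start = 0_int64
   end type region_t

contains

   subroutine region_begin(r)
      type(region_t), intent(inout) :: r
      call system_clock(r%tick_start)
      call cpu_time(r%cpu_start)
   end subroutine region_begin

   subroutine region_end(r)
      type(region_t), intent(inout) :: r
      integer(int64) :: tick_now, rate
      real(real64)   :: cpu_now
      call cpu_time(cpu_now)
      call system_clock(tick_now, rate)
      r%cpu_last  = cpu_now - r%cpu_start
      ! Difference the integer counts before converting to seconds.
      r%wall_last = real(tick_now - r%tick_start, real64)/real(rate, real64)
      r%cpu_total  = r%cpu_total + r%cpu_last
      r%wall_total = r%wall_total + r%wall_last
   end subroutine region_end

end module region_timer

program time_regions
   use, intrinsic :: iso_fortran_env, only: real64
   use region_timer
   implicit none
   type(region_t) :: solve
   real(real64)   :: x
   integer        :: iter, i

   x = 0.0_real64
   do iter = 1, 5
      call region_begin(solve)
      do i = 1, 5000000              ! stand-in for the real work
         x = x + sqrt(real(i, real64))
      end do
      call region_end(solve)
      write (*, '(a,i0,2(a,f9.4))') 'iter ', iter, &
         '  cpu ', solve%cpu_last, '  wall ', solve%wall_last
   end do
   write (*, '(a,2(a,f9.4))') 'total', &
      '  cpu ', solve%cpu_total, '  wall ', solve%wall_total
   write (*, *) 'checksum (keeps the loop from being removed):', x
end program time_regions
```

The sketch ignores counter wrap-around (COUNT_MAX), which is a reasonable simplification for 64-bit counts but would matter with default integers on some compilers.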

Sometimes the user wants to optimize cpu time, other situations the user wants to minimize wall time. The two are not always the same.

For example, suppose you are sharing a computer with many other users and each user pays for their share of the machine based on cpu time. When one user job is swapped out, then it accumulates wall time but it does not accumulate cpu time, so that is fair. In that case, one would want to optimize cpu time, to minimize his costs, and wall time is largely out of his direct control. When the user is using a lightly loaded machine, then his wall time will decrease but his cpu time will remain constant; when heavily loaded, his wall time will increase, but cpu time will remain constant.

I was expecting that there could be memory access delays in a single-thread calculation. These can typically occur with large arrays that are not accessed sequentially.

However, the following example, while showing the delay for poor memory access, does not show the processor idle state that the OP describes.

   real*8, allocatable :: large_array(:,:,:)
   integer :: i,j,k, ni,nj,nk
   real    :: tic,toc, start, end

   ni = 1000
   nj = 2000
   nk = 3000
   allocate ( large_array(ni,nj,nk) )

   write (*,*) 'GOOD access'
   call CPU_TIME(start)
   tic = wall_time () ! wall time counter
   
   do k = 1,nk
     do j = 1,nj
       do i = 1,ni
         large_array(i,j,k) = i+j+k
       end do
     end do
   end do

   toc = wall_time() ! wall time counter
   call CPU_TIME(end)

   write(*,*) 'cpu time is: ', end-start
   write(*,*) 'wall time is: ', toc-tic

   write (*,*) 'BAD  access'
   call CPU_TIME(start)
   tic = wall_time () ! wall time counter
   
   do i = 1,ni
     do j = 1,nj
       do k = 1,nk
         large_array(i,j,k) = i+j+k
       end do
     end do
   end do

   toc = wall_time() ! wall time counter
   call CPU_TIME(end)

   write(*,*) 'cpu time is: ', end-start
   write(*,*) 'wall time is: ', toc-tic

   contains

   real function wall_time ()
    integer*8 :: ticks, rate
    call system_clock ( ticks, rate )
    wall_time = dble(ticks)/dble(rate)
   end function wall_time

   end

With multi-thread and a memory access bottleneck, I would expect the processor to show idle delays.

Could large I/O delays cause the processor wait times ?

Thank you!
The dimension is a little too big for my laptop, haha, so I changed ni, nj and nk to 500,1000,1500. Intel OneAPI on Visual Studio 2019. Release mode, heap-array is on. What I got is:


Is it normal?
It is interesting, it shows wall time is 0 seconds. On a single thread machine, should wall time always be no less than CPU_TIME?
Also, I ran it several times, the bad access which does not respect column major reports lower CPU_TIME :grinning_face:.

Another question,
for a serial code, just run on a single thread, no I/O no sleep or interactive stuff at all, you know just a pure computation code. Can I say, in such a case, ideally, CPU_TIME should be almost identical with Wall time? Thanks!

Maybe try double precision for timer variables? (Then the code gives reasonable results on my mac.) BTW, flang seems to give “wall_time” in msec, while gfortran gives it in seconds.

real*8  :: tic,toc, start, end
...
real*8 function wall_time ()
...

(I also wonder if it might possibly cause other problems to assign an integer*8 to real or real*8 variables (and then take the difference of the latter)…?)

It may be also useful to set some value for the allocated array before measuring time, like:

   ni = 1000
   nj = 2000
   nk = 30
   allocate ( large_array(ni,nj,nk), source=0.0d0 )

No Idea why wall_time is zero ??
Try integer :: ticks, rate
or write out ticks and rate at each call.
You must not be using Gfortran on Win OS

The other significant reason for wall clock delay is if virtual memory is being activated, ie array is larger than available physical memory. I use task manager to identify this. (on Windows OS)

Great example. Depending on the platform and compiler options and the
array size you can get all kinds of CPU_TIME/WALLCLOCK_TIME ratios.

Using fpm and being on a platform with the time command I could
reproduce the OP's numbers easily; and by changing the array size could
get all kinds of other interesting ratios, some of which were really
surprising when on a Microsoft OS. Far fewer surprises on Linux for me.

I added a few command line options. Since the OP mentioned calling C/C++,
it would be easy to get a bad stride because of the differences in Fortran
and C storage order, which could be part of the explanation for the timings being seen.

Your example is a keeper for demonstrating several performance issues. I
already added it to my “hello” command; which generates a directory of
examples for users (it originally created “hello world!” examples of
MPI, OpenMP, PVM, .. batch jobs using PBS/LSF/Slurm/TORQUE directives –
hence the name; but generates a lot of other examples of good and bad
practice and module use now as well).

#!/bin/bash
(
exec 2>&1
fpm  --profile=release  --runner='/bin/time  -v'  run  --  ratio=3  good=T verbose=T
fpm  --profile=release  --runner='/bin/time  -v'  run  --  ratio=3  good=F verbose=T
fpm  --profile=debug    --runner='/bin/time  -v'  run  --  ratio=3  good=T verbose=T
fpm  --profile=debug    --runner='/bin/time  -v'  run  --  ratio=3  good=F verbose=T
)|tee log.txt

Code with command line options:

program slowmotion
use, intrinsic::iso_fortran_env, only: int8, int16, int32, int64
use, intrinsic::iso_fortran_env, only: sp=>real32, dp=>real64
real(kind=dp), allocatable    :: large_array(:, :, :)
integer                       :: i, j, k, ni, nj, nk
real                          :: tic, toc, start, end
! command line
real                          :: ratio = 1.0;       namelist /cmd/ ratio
logical                       :: good = .true.;     namelist /cmd/ good
logical                       :: verbose = .false.; namelist /cmd/ verbose
character(len=:), allocatable :: string
character(len=255)            :: iomsg
integer                       :: iostat

   string = get_namelist()  ! return command line arguments as NAMELIST input
   read (string, nml=cmd, iostat=iostat, iomsg=iomsg) ! internal read of namelist
   if (iostat .ne. 0) then
      write (*, '("<ERROR>",i0,1x,a)') iostat, trim(iomsg)
      write (*, *) 'COMMAND OPTIONS ARE'
      write (*, nml=cmd)
      stop 1
   endif

   if(verbose)call platform()

   ni = nint(1000/ratio)
   nj = nint(2000/ratio)
   nk = nint(3000/ratio)

   allocate (large_array(ni, nj, nk))

   if (good) then
      write (*, *) 'GOOD access'
      call CPU_TIME(start)
      tic = wall_time() ! wall time counter

      do k = 1, nk
         do j = 1, nj
            do i = 1, ni
               large_array(i, j, k) = i + j + k
            end do
         end do
      end do

      toc = wall_time() ! wall time counter
      call CPU_TIME(end)

      write (*, *) 'cpu time is: ', end - start
      write (*, *) 'wall time is: ', toc - tic
   else

      write (*, *) 'BAD  access'
      call CPU_TIME(start)
      tic = wall_time() ! wall time counter

      do i = 1, ni
         do j = 1, nj
            do k = 1, nk
               large_array(i, j, k) = i + j + k
            end do
         end do
      end do

      toc = wall_time() ! wall time counter
      call CPU_TIME(end)

      write (*, *) 'cpu time is: ', end - start
      write (*, *) 'wall time is: ', toc - tic
   endif

contains

real function wall_time()
integer(kind=int64) :: ticks, rate
   call system_clock(ticks, rate)
   wall_time = dble(ticks)/dble(rate)
end function wall_time

function get_namelist() result(string)
character(len=:), allocatable :: string
integer :: command_line_length
   call get_command(length=command_line_length)
   allocate (character(len=command_line_length) :: string)
   call get_command(string)
   string = adjustl(string)//' '
   string = string(index(string, ' '):)
   string = "&cmd "//string//" /"
end function get_namelist

subroutine platform()
use, intrinsic :: iso_fortran_env, only : compiler_version
use, intrinsic :: iso_fortran_env, only : compiler_options
implicit none
character(len=:),allocatable :: version, options
character(len=*),parameter   :: nl=new_line('a')
integer                      :: where, start, break, i, last, col
   version=compiler_version()//' '
   options=' '//compiler_options()
   start=1
   do 
      where=index(options(start:),' -')
      if(where.eq.0)exit
      break=where+start-1
      options(break:break)=nl
      start=where
   enddo
   if(start.eq.1)then
      do 
         where=index(options(start:),' /')
         if(where.eq.0)exit
         break=where+start-1
         options(break:break)=nl
         start=where
      enddo
   endif
   last=len_trim(version)+1
   col=0
   do i=1,len_trim(version)
    col=col+1
    if(version(i:i).eq.' ')last=i
    if(col.gt.76)then
       version(last:last)=nl
       col=0
    endif
   enddo
   print '(a,/,3x,*(a))', 'This file was compiled by :', inset(version)
   if(options.ne.'')then
      print '(*(a))', 'using the options :', inset(options)
   endif
end subroutine platform

function inset(string) result(longer)
character(len=*),intent(in)  :: string
character(len=:),allocatable :: longer
character(len=*),parameter   :: nl=new_line('a')
integer                      :: i
   longer=''
   do i=1,len(string)
      longer=longer//string(i:i)
      if(string(i:i).eq.nl)then
         longer=longer//'   '
      endif
   enddo
end function inset

end program slowmotion

Which values are valid varies a lot from platform to platform when using the time command on Unix-like systems, but there is always at least some useful data in there, and you can use it on any command. /bin/time -v seems to be often overlooked, so perhaps the rather verbose output will be of interest to some. I did not add it, but on Linux adding a routine at the end of your program to read the /proc/$$/status file and echo it back out gives all kinds of useful info without having to call any C at all. Not sure what the equivalent is on other OSes.
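
A minimal sketch of such a routine, for Linux only, might look like the following. The routine name echo_proc_status is made up for this example, and it reads /proc/self/status, which on Linux refers to the status file of the running process itself, so no PID lookup is needed. On systems without /proc it just prints a message.

```fortran
program show_proc_status
   ! Sketch, Linux only: echo /proc/self/status at the end of a run.
   implicit none

   ! ... the real computation would go here ...

   call echo_proc_status()

contains

   subroutine echo_proc_status()
      ! Hypothetical helper: copy the kernel's status report to stdout.
      character(len=256) :: line
      integer            :: lun, ios

      open (newunit=lun, file='/proc/self/status', status='old', &
            action='read', iostat=ios)
      if (ios /= 0) then
         write (*, '(a)') '/proc/self/status is not available on this system'
         return
      end if

      do
         read (lun, '(a)', iostat=ios) line
         if (ios /= 0) exit          ! end of file or read error
         write (*, '(a)') trim(line)
      end do
      close (lun)
   end subroutine echo_proc_status

end program show_proc_status
```

On Linux the report includes lines such as VmHWM (peak resident set size) and the voluntary and nonvoluntary context switch counts, which are useful next to a CPU versus wall time comparison.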

Output
This file was compiled by :
   GCC version 16.0.0 20250727 (experimental) 
using the options :
   -I build/gfortran_2654F75F5833692A
   -mtune=generic
   -march=x86-64
   -O3
   -Wimplicit-interface
   -Werror=implicit-interface
   -funroll-loops
   -fPIC
   -fmax-errors=1
   -fcoarray=single
   -fimplicit-none
   -ffree-form
   -J build/gfortran_2654F75F5833692A
 GOOD access
 cpu time is:    1.96800005    
 wall time is:    10.2500000    
	Command being timed: "build/gfortran_688A5DB7BAD2F9DA/app/timeit ratio=3 good=T"
	User time (seconds): 0.20
	System time (seconds): 1.81
	Percent of CPU this job got: 19%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.39
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 384784
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 437078
	Minor (reclaiming a frame) page faults: 0
	Voluntary context switches: 0
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 65536
	Exit status: 0
Project is up to date
This file was compiled by :
   GCC version 16.0.0 20250727 (experimental) 
using the options :
   -I build/gfortran_2654F75F5833692A
   -mtune=generic
   -march=x86-64
   -O3
   -Wimplicit-interface
   -Werror=implicit-interface
   -funroll-loops
   -fPIC
   -fmax-errors=1
   -fcoarray=single
   -fimplicit-none
   -ffree-form
   -J build/gfortran_2654F75F5833692A
 BAD  access
 cpu time is:    17.2659988    
 wall time is:    51.1250000    
	Command being timed: "build/gfortran_688A5DB7BAD2F9DA/app/timeit ratio=3 good=F"
	User time (seconds): 8.28
	System time (seconds): 9.01
	Percent of CPU this job got: 33%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:51.23
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1738816
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1351881
	Minor (reclaiming a frame) page faults: 0
	Voluntary context switches: 0
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 65536
	Exit status: 0
Project is up to date
This file was compiled by :
   GCC version 16.0.0 20250727 (experimental) 
using the options :
   -I build/gfortran_87E2AE0597D39913
   -mtune=generic
   -march=x86-64
   -g
   -Wall
   -Wextra
   -Werror=implicit-interface
   -fPIC
   -fmax-errors=1
   -fbounds-check
   -fcheck=array-temps
   -fbacktrace
   -fcoarray=single
   -fimplicit-none
   -ffree-form
   -J build/gfortran_87E2AE0597D39913
 GOOD access
 cpu time is:    4.04600000    
 wall time is:    7.50000000    
	Command being timed: "build/gfortran_E167FD2A985B468F/app/timeit ratio=3 good=T"
	User time (seconds): 2.89
	System time (seconds): 1.18
	Percent of CPU this job got: 50%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.02
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1703704
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 436898
	Minor (reclaiming a frame) page faults: 0
	Voluntary context switches: 0
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 65536
	Exit status: 0
Project is up to date
This file was compiled by :
   GCC version 16.0.0 20250727 (experimental) 
using the options :
   -I build/gfortran_87E2AE0597D39913
   -mtune=generic
   -march=x86-64
   -g
   -Wall
   -Wextra
   -Werror=implicit-interface
   -fPIC
   -fmax-errors=1
   -fbounds-check
   -fcheck=array-temps
   -fbacktrace
   -fcoarray=single
   -fimplicit-none
   -ffree-form
   -J build/gfortran_87E2AE0597D39913
 BAD  access
 cpu time is:    11.1709995    
 wall time is:    25.8750000    
	Command being timed: "build/gfortran_E167FD2A985B468F/app/timeit ratio=3 good=F"
	User time (seconds): 7.15
	System time (seconds): 4.03
	Percent of CPU this job got: 43%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:25.98
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1738796
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 885418
	Minor (reclaiming a frame) page faults: 0
	Voluntary context switches: 0
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 65536
	Exit status: 0

The zero seconds problem is because the int64 counters lose all precision when the real32 function result is computed. You can fix this by either using int32 counters or using real64 for the time values.
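
To make the precision loss concrete, here is a small sketch. The counter values are made up (a reading of about 1.7e9 seconds at one million ticks per second is assumed, not measured on any particular compiler), and the program name is illustrative. It compares the same 3-second interval computed through real32, through real64, and by differencing the integer counts first.

```fortran
program real32_clock_precision
   ! Sketch of why a real32 wall clock can report 0 s.
   use, intrinsic :: iso_fortran_env, only: int64, real32, real64
   implicit none
   ! Assumption: made-up but plausible counter readings.
   integer(int64), parameter :: rate  = 1000000_int64
   integer(int64), parameter :: tick0 = 1700000000_int64*rate
   integer(int64), parameter :: tick1 = tick0 + 3_int64*rate   ! 3 s later
   real(real32) :: t0_sp, t1_sp
   real(real64) :: t0_dp, t1_dp

   ! Same arithmetic as wall_time, kept in real64 ...
   t0_dp = real(tick0, real64)/real(rate, real64)
   t1_dp = real(tick1, real64)/real(rate, real64)
   ! ... and then squeezed into a real32 function result.
   t0_sp = real(t0_dp, real32)
   t1_sp = real(t1_dp, real32)

   write (*, *) 'real32 spacing near t0 (s):', spacing(t0_sp)
   write (*, *) 'real32 elapsed (s):        ', t1_sp - t0_sp
   write (*, *) 'real64 elapsed (s):        ', t1_dp - t0_dp
   ! Alternative fix: subtract the integer counts, then convert.
   write (*, *) 'integer difference (s):    ', &
      real(tick1 - tick0, real32)/real(rate, real32)
end program real32_clock_precision
```

With these assumed numbers, neighbouring real32 values near t0 are 128 s apart, so both readings round to the same value and the real32 elapsed time is zero, while the real64 and integer-difference variants both give 3 s. Differencing the counts first also works with a real32 result, because the small difference is what gets converted.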

And speaking of that, code that uses real*8 and integer*8 declarations probably should not be posted here. We are supposed to be educating programmers how to use the language correctly, right?

As for the relation between the times, yes ideally cpu_time() <= wall time, but in practice the two results could use different hardware timers, and there could be rounding issues or sampling time issues that come into play, so some spurious timings might be observed in practice. Also, it might be difficult to eliminate all hyperthreading or multithreading from the code, particularly if external libraries are involved that have their own optimizations separate from any compiler options that might be used.

Here are the results of the code with real64 time values.

 GOOD access
 cpu time is:    1.4014019999999998     
 wall time is:    1.4013488292694092     
 BAD  access
 cpu time is:    7.3420809999999994     
 wall time is:    7.3418989181518555

Even here, the cpu times are slightly larger than the wall times. I don’t know why, but I would guess that some multithreading is occurring, and the cpu time evaluation adds the underlying thread times together.

Thank you all!
I took the suggestion of @septc and used a wtime counter basically copied from John Burkardt. Now I get reasonable results for the code you showed on my laptop (Windows 11, Visual Studio 2019 with Intel oneAPI).

   real*8, allocatable :: large_array(:,:,:)
   integer :: i,j,k, ni,nj,nk
   real*8    :: tic,toc, start, end

   ni = 500
   nj = 1000
   nk = 1500
   allocate ( large_array(ni,nj,nk) )

   write (*,*) 'GOOD access'
   call CPU_TIME(start)
   tic = wall_time () ! wall time counter
   
   do k = 1,nk
     do j = 1,nj
       do i = 1,ni
         large_array(i,j,k) = i+j+k
       end do
     end do
   end do

   toc = wall_time() ! wall time counter
   call CPU_TIME(end)

   write(*,*) 'cpu time is: ', end-start
   write(*,*) 'wall time is: ', toc-tic

   write (*,*) 'BAD  access'
   call CPU_TIME(start)
   tic = wall_time () ! wall time counter
   
   do i = 1,ni
     do j = 1,nj
       do k = 1,nk
         large_array(i,j,k) = i+j+k
       end do
     end do
   end do

   toc = wall_time() ! wall time counter
   call CPU_TIME(end)

   write(*,*) 'cpu time is: ', end-start
   write(*,*) 'wall time is: ', toc-tic

   contains

  function wall_time()
!	from  John Burkardt
  implicit none
  integer(8) clock_max
  integer(8) clock_rate
  integer(8) clock_reading
  real(8) wall_time
  call system_clock ( clock_reading, clock_rate, clock_max )
  wall_time = real ( clock_reading, kind = 8 ) &
        / real ( clock_rate, kind = 8 )
  return
  end function wall_time

  end  

Hmmm, CPU_TIME is really tricky, it seems. I had other people run the same code on other PCs, and their CPU_TIME and wall time are similar. On my laptop (CPU is an i7-1260P, hyper-threading enabled, efficiency cores disabled) wall time is 3 to 4 times the CPU_TIME, while on other PCs wall time is only about 10% more than the CPU_TIME. Looks like it may be an isolated issue on my laptop.

I see, thanks. I know, things like real*8 and integer*8 may not be portable.
About real32, real64, int64, int32, you mean using something like below right?

use, intrinsic :: iso_fortran_env, only : real64, int64

Just curious, those real64, int64 stuff, are they portable? Like, if I use them in the code, and the code run on Windows and Linux, will it generate different results? Thanks!

Yes, those parameters and their use in declarations are a part of the standard language. The real*8 and integer*8 declarations are not, and never have been, part of the standard language. It isn’t that they aren’t portable, it is rather just that there are multiple ways to do things, so why not do them within the standard and without the need for extensions rather than the opposite.

As far as I know, the iso_fortran_env module is portable. That does not mean however that you will generate the same results when you change hardware, compiler versions, OS, or support libraries. Other issues, such as differences between rounding and truncation, support for gradual underflow, differences between fused multiply-add and separate instructions, and so on can still occur.
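
For reference, a minimal sketch of the timing declarations written with the standard kind constants instead of real*8 and integer*8. The program and variable names are illustrative only. The inquiry intrinsics print what the kinds provide on the compiler at hand, which is the part that is standardized; the numerical results of a larger computation can still differ between platforms for the reasons given above.

```fortran
program standard_kinds
   ! Sketch: standard kind constants in place of real*8 / integer*8.
   use, intrinsic :: iso_fortran_env, only: int64, real32, real64
   implicit none
   integer(int64) :: ticks, rate
   real(real64)   :: seconds

   call system_clock(ticks, rate)
   seconds = real(ticks, real64)/real(rate, real64)

   ! Inquiry intrinsics report what each kind provides here.
   write (*, *) 'bits in integer(int64):', storage_size(ticks)
   write (*, *) 'bits in real(real64):  ', storage_size(seconds)
   write (*, *) 'decimal digits, real32 and real64:', &
      precision(1.0_real32), precision(1.0_real64)
   write (*, *) 'clock reading in seconds:', seconds
end program standard_kinds
```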

Your zero wall_time result is very unusual. Could you test my modified wall_time and send me the results ?
I have not used a Fortran compiler that failed with integer*8 for 30 years.

   real function wall_time ()
    integer*8 :: ticks, rate
    call system_clock ( ticks, rate )
    wall_time = dble(ticks)/dble(rate)
    write (*,*) 'wall_time :',ticks, rate, wall_time
   end function wall_time

Sure, no problem. I simply used your wall_time function without changing any other piece of the code. Below is what I got.

The code is:

   real*8, allocatable :: large_array(:,:,:)
   integer :: i,j,k, ni,nj,nk
   real*8    :: tic,toc, start, end

   ni = 500
   nj = 1000
   nk = 1500
   allocate ( large_array(ni,nj,nk) )

   write (*,*) 'GOOD access'
   call CPU_TIME(start)
   tic = wall_time () ! wall time counter
   
   do k = 1,nk
     do j = 1,nj
       do i = 1,ni
         large_array(i,j,k) = i+j+k
       end do
     end do
   end do

   toc = wall_time() ! wall time counter
   call CPU_TIME(end)

   write(*,*) 'cpu time is: ', end-start
   write(*,*) 'wall time is: ', toc-tic

   write (*,*) 'BAD  access'
   call CPU_TIME(start)
   tic = wall_time () ! wall time counter
   
   do i = 1,ni
     do j = 1,nj
       do k = 1,nk
         large_array(i,j,k) = i+j+k
       end do
     end do
   end do

   toc = wall_time() ! wall time counter
   call CPU_TIME(end)

   write(*,*) 'cpu time is: ', end-start
   write(*,*) 'wall time is: ', toc-tic

  contains

  
   real function wall_time () ! from JohnCampbell
    integer*8 :: ticks, rate
    call system_clock ( ticks, rate )
    wall_time = dble(ticks)/dble(rate)
    write (*,*) 'wall_time :',ticks, rate, wall_time
   end function wall_time

   end

The VS2019 solution file is also attached. Sorry, it does not allow me to upload the zip file, so I changed the extension to csv. You may change it back to .zip and unzip it. The project is called DisplayFormatTest because I was too lazy to change it from a previous project name :laughing:
DisplayFormatTest.csv (183.1 KB)

Could you test this revised example, which removes the overflow for real*4?

   real function wall_time ()
    integer*8 :: ticks, rate, start_tick = -1
    call system_clock ( ticks, rate )
     if ( start_tick == -1) start_tick = ticks
     ticks = ticks - start_tick
    wall_time = dble(ticks)/dble(rate)
    write (*,*) 'wall_time :',ticks, rate, wall_time
   end function wall_time