Proper Usage of `system_clock`

I thought I had figured out the correct usage for system_clock, but now nagfor is giving me the following message.

Questionable: example/expander/main.f90, line 23: Argument COUNT (no. 1) to intrinsic subroutine SYSTEM_CLOCK is of type INTEGER(int64), but argument COUNT_RATE (no. 2) is of type DOUBLE PRECISION

The standard has the following descriptions for the arguments.

COUNT (optional) shall be an integer scalar. It is an INTENT (OUT) argument. It is assigned a processor-dependent value based on the value of a processor clock, or −HUGE (COUNT) if there is no clock for the invoking image. The processor-dependent value is incremented by one for each clock count until the value COUNT_MAX is reached and is reset to zero at the next count. It lies in the range 0 to COUNT_MAX if there is a clock.

COUNT_RATE (optional) shall be an integer or real scalar. It is an INTENT (OUT) argument. It is assigned a processor-dependent approximation to the number of processor clock counts per second, or zero if there is no clock for the invoking image.

Anybody know what would be considered best practice for using system_clock?

There was a long thread about this issue at comp.lang.fortran: https://groups.google.com/g/comp.lang.fortran/c/Iif-iCkig_Y/m/Hn8XykAYCwAJ

Here’s an example that can get you into trouble:

use, intrinsic :: iso_fortran_env
integer(int64) :: rate
integer(int32) :: start_time, end_time
call system_clock (count_rate=rate)
...
call system_clock (count=start_time)
...
call system_clock (count=end_time)
print *, real(end_time-start_time)/real(rate), " seconds"
end

Best practice is to use the same type and kind for all the arguments across the various calls to SYSTEM_CLOCK. The problem is that a processor may select different count rates depending on the kind of the COUNT argument, or possibly the COUNT_MAX argument if present) and the relationship may not always be clearly documented nor consistent. J3 had extensive discussion of this in the J3 email list, and there is also paper https://j3-fortran.org/doc/year/21/21-117r3.txt which supplies additional text for F202X guiding the user. To wit:

6 If more than one clock is available, the types and kinds of the arguments to SYSTEM_CLOCK determine which clock is accessed. The processor should document the relationship between the clock selection and the argument characteristics.

7 Different invocations of SYSTEM_CLOCK should use the same types and kinds for the arguments, to ensure that any timing calculations are based on the same clock.

8 It it recommended that all references to SYSTEM_CLOCK use integer arguments with a decimal exponent range of at least 18. This lets the processor select the most accurate clock available while minimizing how often the COUNT value resets to zero.

3 Likes

Yeah, I thought I remembered some discussion about it, and went and started rereading some of it. Unfortunately that discussion left me more confused about what the recommended best practice was. Thanks for the reference to paper.

Just to see if I’ve got it straight, the recommendation is to never use a real argument for count_rate, even though the standard says it should be allowed, and that the count argument should be what selects the clock if it is present?

Here is a program to demonstrate what @sblionel wrote:

program test_system_clock
use iso_fortran_env, only: int32, int64
implicit none
integer(kind=int32)    :: count_rate_32,t32(2)
integer(kind=int64)    :: count_rate_64,t64(2), i, n = 10_int64**9
real(kind=kind(1.0d0)) :: x,cpu_t(2)
call system_clock(count_rate=count_rate_32)
call system_clock(count_rate=count_rate_64)
call cpu_time(cpu_t(1))
call system_clock (count=t32(1))
call system_clock (count=t64(1))
do i=1,n
   call random_number(x)
end do
! dummy statement to ensure loop is not optimized away
if (x > 1.0) print*,x 
call system_clock (count=t32(2))
call system_clock (count=t64(2))
call cpu_time(cpu_t(2))
print "(*(a20))","","system_clock_32","system_clock_64","cpu_time"
print "(a20,2i20)","count_rate",count_rate_32,count_rate_64
print "(a20,2i20,f20.6)","t1",t32(1),t64(1),cpu_t(1)
print "(a20,2i20,f20.6)","t2",t32(2),t64(2),cpu_t(2)
print "(a20,3f20.6)","elapsed_time",(t32(2)-t32(1))/dble(count_rate_32),(t64(2)-t64(1))/dble(count_rate_64),cpu_t(2)-cpu_t(1)
end program test_system_clock

gfortran output on Windows:

                         system_clock_32     system_clock_64            cpu_time
          count_rate                1000            10000000
                  t1           155999000       1559990867256            0.015625
                  t2           156012796       1560128872540           13.796875
        elapsed_time           13.796000           13.800528           13.781250

Intel Fortran output on Windows:

                         system_clock_32     system_clock_64            cpu_time
          count_rate               10000             1000000
                  t1          1605185760    1650069402576000            0.000000
                  t2          1605241640    1650069408164000            5.562500
        elapsed_time            5.588000            5.588000            5.562500

So for gfortran you can measure elapsed time with higher precision with int64 arguments, but for Intel Fortran it does not make a difference?

My usage pattern was like the following, which I expected to work based on the discussions and interp papers, to pick the clock via the count argument, but give a more accurate value for count_rate, given the possibility of providing a floating point value. Instead, nagfor gave me that message, and appeared to produce nonsensical results.

integer(int64) :: start, finish
real(real64) :: rate
call system_clock(start, rate)
! do something
call system_clock(finish)
print *, "Took ", (finish-start)/rate, " seconds"

I wonder what the compiler support for that processor will say about this!

@everythingfunctional , perhaps you will take a look at this thread and this one for the “my_cpu_time” snippet with SYSTEM_CLOCK:

   subroutine my_cpu_time( time )
      use, intrinsic :: iso_fortran_env, only : I8 => int64
      ! Argument list
      real(kind=xx), intent(inout) :: time  !<-- where xx is the REAL kind of interest
      ! Local variables
      integer(I8) :: tick
      integer(I8) :: rate
      call system_clock (tick, rate)
      time = real(tick, kind=kind(time) ) / real(rate, kind=kind(time) )
      return
   end subroutine my_cpu_time

It will be interesting to see how NAG Fortran results turn out for the first case involving the dynamic growth of an array shape i.e., whether SYSTEM_CLOCK works ok with a 64-bit integer kind as rate.

That is, as opposed to your usage of a 128-bit floating point kind which is typically unseen in such C parlance around clock and its clock_t type which the NAG transpiler may use behind the scenes and thus it may have inadequate support built-in and validated for the case you presented it.

This is very important to remember.
With gfortran, you should use consistent type and kind arguments with SYSTEM_CLOCK.
I always use 64-bit integers with Windows x64 so that, not only the rate is better, but also the precision is usually based on QueryPerformanceCounter.
I have not used ifort for a while, but windows SYSTEM_CLOCK was based on a timer with 1/64 second precision (GetTickCount WINAPI ?) which is next to useless as a timer.

“Rate value varies widely”
The value of “rate” is not a true indicator of timer precision. You will also find it varies with different processors. The following is an example from my i5-2300 for 32-bit and 64-bit integers.

options=-g -fimplicit-none -O3 -march=native -ffast-math
 system_clock ( integer*4:count, rate )
 count =    18324688
 rate  =        1000
  
 system_clock ( integer*8:count, rate )
 count =          49994596815
 rate  =              2728212

Of interest is the 64-bit Integer test. My example gives a rate that is 1/1024 x processor clock, but Belavisky’s example gives a rate of 10,000,000, which is probably based on QueryPerformanceCounter, but I don’t know the processor or OS.
With Windows, the best clock rate would be the rdtsc cpu clock rate, but the processors I have provide a mix of different approaches to rate.

The results I showed were for Windows, and
echo %PROCESSOR_ARCHITECTURE% / %PROCESSOR_IDENTIFIER% / %PROCESSOR_LEVEL% / %PROCESSOR_REVISION%
gives
AMD64 / Intel64 Family 6 Model 158 Stepping 11, GenuineIntel / 6 / 9e0b

I see that on WSL2 ifort does measure time with higher precision using int64 arguments:

                         system_clock_32     system_clock_64            cpu_time
          count_rate               10000             1000000
                  t1          2004896385    1650109373638533            0.007333
                  t2          2004954637    1650109379463798            5.832545
        elapsed_time            5.825200            5.825265            5.825212

I have frequently complained about the poor precision of many timers available, such as SYSTEM_CLOCK, CPU_TIME and omp_get_wtime, even when their claimed rate, or “ticks” imply much greater accuracy.
To demonstrate, I have a (generic) timer tester which cycles through calls to the timing routine and records how ofter the reported time varies. I have used this for:
SYSTEM_CLOCK (I4 and I8),
CPU_TIME,
GetTickCount,
QueryPerformanceCounter,
rdtsc (with difficulty) and
omp_get_wtime

omp_get_wtime is interesting, as for the windows gfortran implementation I have, it achieves a precision of 0.016 seconds, while claiming a precisoin of omp_get_wtick 3.665E-07 second, similar to SYSTEM_CLOCK (I8).
This poor achieved precision of 0.016 seconds is similar to CPU_TIME, GetTickCount and SYSTEM_CLOCK (I
4) using gfortran.

The test program I use is a template for other timer tests:

   Program test_wtime
!
!  This approach records 4 updates to the elapsed time returned from "omp_get_wtime"

      real*8     timer_precision, last_seconds, next_seconds, ticks(0:4)
      integer*4  k,n, nn(4)
      real*8     dt
!
      real*8,    external :: omp_get_wtick
      real*8,    external :: omp_get_wtime
!
!  test precision of omp_get_wtime, by recording 4 time changes
      timer_precision = omp_get_wtick ()  ! timer_precision as reported
      next_seconds    = omp_get_wtime ()

      ticks(0)        = next_seconds
      do k = 1,4
        n = 0
        last_seconds = next_seconds
        do
          n = n+1
          next_seconds  = omp_get_wtime ()                         ! call to timer
          if ( next_seconds /= last_seconds ) exit
        end do
        nn(k)    = n
        ticks(k) = next_seconds
      end do

!   report any variability of time updates
      do k = 1,4
        dt = (ticks(k)-ticks(k-1))
        write (*,*) k, nn(k), ticks(k)-ticks(k-1), dt
      end do
      dt = next_seconds-last_seconds                             ! minimum time between ticks

!   report the timer precision "dt" in comparison to the claimed precisoin "omp_get_wtick"
      write ( *,11) 'omp_get_wtime rate      = ', timer_precision,' second ???'
      write ( *,11) 'omp_get_wtime acc       = ', dt,' seconds : ', n,' cycles  ####'
   11 format ( 1x,a,es12.3,a,i0,a)
!
   end Program test_wtime

The .bat file I use is:

set options=-O2 -march=native -fopenmp

gfortran %1.f90 %options% -o %1.exe
 
dir %1.* /od
echo ================================================================ >> %1.log
now           >> %1.log
set g >> %1.log

set options >> %1.log
get_processor >> %1.log

audit -start >> %1.log
%1  %2 >> %1.log
audit -end >> %1.log

notepad %1.log

The outcome of the test is:

================================================================ 
  It is now Monday, 18 April 2022 at 12:51:54.335
gcc.ver=11.1.0
gcc_dir=C:\Program Files (x86)\gcc_eq\gcc_11.1.0
gcc_path=C:\Program Files (x86)\gcc_eq\gcc_11.1.0\bin;C:\Program Files (x86)\gcc_eq\gcc_11.1.0\libexec\gcc\x86_64-w64-mingw32\11.1.0
options=-O2 -march=native -fopenmp
 Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz
[AUDIT Ver 1.21] Monday, 18 April 2022 at 12:51:54.444
           1       52755   1.5999972820281982E-002   1.5999972820281982E-002
           2      115126   1.4999985694885254E-002   1.4999985694885254E-002
           3      144383   1.6000032424926758E-002   1.6000032424926758E-002
           4      145260   1.5999972820281982E-002   1.5999972820281982E-002
 omp_get_wtime rate      =    3.665E-07 second ???
 omp_get_wtime acc       =    1.600E-02 seconds : 145260 cycles  ####
[AUDIT Ver 1.21] elapse        0.154 seconds: Monday, 18 April 2022 at 12:51:54.600        1.015

(get_processor and audit are utilities for reporting)

I would be interested if this could be used by others and show what OS and compilers achieve.

A tick based test with better results is:

   Program test_system_clock
!
!  This approach records 4 updates to the elapsed time returned from "SYSTEM_CLOCK"

      integer*8  clock_precision, last_tick, next_tick, ticks(0:4)
      integer*4  k,n, nn(4)
      real*8     dt
!
!  test precision of SYSTEM_CLOCK, by recording 4 time changes
      call SYSTEM_CLOCK (next_tick, clock_precision)

      ticks(0)   = next_tick
      do k = 1,4
        n = 0
        last_tick = next_tick
        do
          n = n+1
          call SYSTEM_CLOCK (next_tick)
          if ( next_tick /= last_tick ) exit
        end do
        nn(k)    = n
        ticks(k) = next_tick
      end do

!   report any variability of time updates
      do k = 1,4
        dt = dble(ticks(k)-ticks(k-1)) / dble(clock_precision)
        write (*,*) k, nn(k), ticks(k)-ticks(k-1), dt
      end do

      dt = dble(next_tick-last_tick)/dble(clock_precision)      ! minimum time between ticks

!   report the timer precision "dt" in comparison to the claimed precision "SYSTEM_CLOCK (rate)"
      write ( *,12) 'system_clock_rate       = ', clock_precision,' ticks per second'
      write ( *,11) 'system_clock_tick acc   = ', dt,' seconds : ', n,' cycles  ####'
   11 format ( 1x,a,es12.3,a,i0,a)
   12 format ( 1x,a,i12,a,i0,a)
!
   end Program test_system_clock