Proper Usage of `system_clock`

I thought I had figured out the correct usage for system_clock, but now nagfor is giving me the following message.

Questionable: example/expander/main.f90, line 23: Argument COUNT (no. 1) to intrinsic subroutine SYSTEM_CLOCK is of type INTEGER(int64), but argument COUNT_RATE (no. 2) is of type DOUBLE PRECISION

The standard has the following descriptions for the arguments.

COUNT (optional) shall be an integer scalar. It is an INTENT (OUT) argument. It is assigned a processor-dependent value based on the value of a processor clock, or −HUGE (COUNT) if there is no clock for the invoking image. The processor-dependent value is incremented by one for each clock count until the value COUNT_MAX is reached and is reset to zero at the next count. It lies in the range 0 to COUNT_MAX if there is a clock.

COUNT_RATE (optional) shall be an integer or real scalar. It is an INTENT (OUT) argument. It is assigned a processor-dependent approximation to the number of processor clock counts per second, or zero if there is no clock for the invoking image.

Anybody know what would be considered best practice for using system_clock?

There was a long thread about this issue at comp.lang.fortran: https://groups.google.com/g/comp.lang.fortran/c/Iif-iCkig_Y/m/Hn8XykAYCwAJ

Here’s an example that can get you into trouble:

use, intrinsic :: iso_fortran_env
integer(int64) :: rate
integer(int32) :: start_time, end_time
call system_clock (count_rate=rate)
...
call system_clock (count=start_time)
...
call system_clock (count=end_time)
print *, real(end_time-start_time)/real(rate), " seconds"
end

Best practice is to use the same type and kind for all the arguments across the various calls to SYSTEM_CLOCK. The problem is that a processor may select different count rates depending on the kind of the COUNT argument, or possibly the COUNT_MAX argument if present) and the relationship may not always be clearly documented nor consistent. J3 had extensive discussion of this in the J3 email list, and there is also paper https://j3-fortran.org/doc/year/21/21-117r3.txt which supplies additional text for F202X guiding the user. To wit:

6 If more than one clock is available, the types and kinds of the arguments to SYSTEM_CLOCK determine which clock is accessed. The processor should document the relationship between the clock selection and the argument characteristics.

7 Different invocations of SYSTEM_CLOCK should use the same types and kinds for the arguments, to ensure that any timing calculations are based on the same clock.

8 It it recommended that all references to SYSTEM_CLOCK use integer arguments with a decimal exponent range of at least 18. This lets the processor select the most accurate clock available while minimizing how often the COUNT value resets to zero.

5 Likes

Yeah, I thought I remembered some discussion about it, and went and started rereading some of it. Unfortunately that discussion left me more confused about what the recommended best practice was. Thanks for the reference to paper.

Just to see if I’ve got it straight, the recommendation is to never use a real argument for count_rate, even though the standard says it should be allowed, and that the count argument should be what selects the clock if it is present?

Here is a program to demonstrate what @sblionel wrote:

program test_system_clock
use iso_fortran_env, only: int32, int64
implicit none
integer(kind=int32)    :: count_rate_32,t32(2)
integer(kind=int64)    :: count_rate_64,t64(2), i, n = 10_int64**9
real(kind=kind(1.0d0)) :: x,cpu_t(2)
call system_clock(count_rate=count_rate_32)
call system_clock(count_rate=count_rate_64)
call cpu_time(cpu_t(1))
call system_clock (count=t32(1))
call system_clock (count=t64(1))
do i=1,n
   call random_number(x)
end do
! dummy statement to ensure loop is not optimized away
if (x > 1.0) print*,x 
call system_clock (count=t32(2))
call system_clock (count=t64(2))
call cpu_time(cpu_t(2))
print "(*(a20))","","system_clock_32","system_clock_64","cpu_time"
print "(a20,2i20)","count_rate",count_rate_32,count_rate_64
print "(a20,2i20,f20.6)","t1",t32(1),t64(1),cpu_t(1)
print "(a20,2i20,f20.6)","t2",t32(2),t64(2),cpu_t(2)
print "(a20,3f20.6)","elapsed_time",(t32(2)-t32(1))/dble(count_rate_32),(t64(2)-t64(1))/dble(count_rate_64),cpu_t(2)-cpu_t(1)
end program test_system_clock

gfortran output on Windows:

                         system_clock_32     system_clock_64            cpu_time
          count_rate                1000            10000000
                  t1           155999000       1559990867256            0.015625
                  t2           156012796       1560128872540           13.796875
        elapsed_time           13.796000           13.800528           13.781250

Intel Fortran output on Windows:

                         system_clock_32     system_clock_64            cpu_time
          count_rate               10000             1000000
                  t1          1605185760    1650069402576000            0.000000
                  t2          1605241640    1650069408164000            5.562500
        elapsed_time            5.588000            5.588000            5.562500

So for gfortran you can measure elapsed time with higher precision with int64 arguments, but for Intel Fortran it does not make a difference?

1 Like

My usage pattern was like the following, which I expected to work based on the discussions and interp papers, to pick the clock via the count argument, but give a more accurate value for count_rate, given the possibility of providing a floating point value. Instead, nagfor gave me that message, and appeared to produce nonsensical results.

integer(int64) :: start, finish
real(real64) :: rate
call system_clock(start, rate)
! do something
call system_clock(finish)
print *, "Took ", (finish-start)/rate, " seconds"

I wonder what the compiler support for that processor will say about this!

@everythingfunctional , perhaps you will take a look at this thread and this one for the “my_cpu_time” snippet with SYSTEM_CLOCK:

   subroutine my_cpu_time( time )
      use, intrinsic :: iso_fortran_env, only : I8 => int64
      ! Argument list
      real(kind=xx), intent(inout) :: time  !<-- where xx is the REAL kind of interest
      ! Local variables
      integer(I8) :: tick
      integer(I8) :: rate
      call system_clock (tick, rate)
      time = real(tick, kind=kind(time) ) / real(rate, kind=kind(time) )
      return
   end subroutine my_cpu_time

It will be interesting to see how NAG Fortran results turn out for the first case involving the dynamic growth of an array shape i.e., whether SYSTEM_CLOCK works ok with a 64-bit integer kind as rate.

That is, as opposed to your usage of a 128-bit floating point kind which is typically unseen in such C parlance around clock and its clock_t type which the NAG transpiler may use behind the scenes and thus it may have inadequate support built-in and validated for the case you presented it.

This is very important to remember.
With gfortran, you should use consistent type and kind arguments with SYSTEM_CLOCK.
I always use 64-bit integers with Windows x64 so that, not only the rate is better, but also the precision is usually based on QueryPerformanceCounter.
I have not used ifort for a while, but windows SYSTEM_CLOCK was based on a timer with 1/64 second precision (GetTickCount WINAPI ?) which is next to useless as a timer.

“Rate value varies widely”
The value of “rate” is not a true indicator of timer precision. You will also find it varies with different processors. The following is an example from my i5-2300 for 32-bit and 64-bit integers.

options=-g -fimplicit-none -O3 -march=native -ffast-math
 system_clock ( integer*4:count, rate )
 count =    18324688
 rate  =        1000
  
 system_clock ( integer*8:count, rate )
 count =          49994596815
 rate  =              2728212

Of interest is the 64-bit Integer test. My example gives a rate that is 1/1024 x processor clock, but Belavisky’s example gives a rate of 10,000,000, which is probably based on QueryPerformanceCounter, but I don’t know the processor or OS.
With Windows, the best clock rate would be the rdtsc cpu clock rate, but the processors I have provide a mix of different approaches to rate.

The results I showed were for Windows, and
echo %PROCESSOR_ARCHITECTURE% / %PROCESSOR_IDENTIFIER% / %PROCESSOR_LEVEL% / %PROCESSOR_REVISION%
gives
AMD64 / Intel64 Family 6 Model 158 Stepping 11, GenuineIntel / 6 / 9e0b

I see that on WSL2 ifort does measure time with higher precision using int64 arguments:

                         system_clock_32     system_clock_64            cpu_time
          count_rate               10000             1000000
                  t1          2004896385    1650109373638533            0.007333
                  t2          2004954637    1650109379463798            5.832545
        elapsed_time            5.825200            5.825265            5.825212

I have frequently complained about the poor precision of many timers available, such as SYSTEM_CLOCK, CPU_TIME and omp_get_wtime, even when their claimed rate, or “ticks” imply much greater accuracy.
To demonstrate, I have a (generic) timer tester which cycles through calls to the timing routine and records how ofter the reported time varies. I have used this for:
SYSTEM_CLOCK (I4 and I8),
CPU_TIME,
GetTickCount,
QueryPerformanceCounter,
rdtsc (with difficulty) and
omp_get_wtime

omp_get_wtime is interesting, as for the windows gfortran implementation I have, it achieves a precision of 0.016 seconds, while claiming a precisoin of omp_get_wtick 3.665E-07 second, similar to SYSTEM_CLOCK (I8).
This poor achieved precision of 0.016 seconds is similar to CPU_TIME, GetTickCount and SYSTEM_CLOCK (I
4) using gfortran.

The test program I use is a template for other timer tests:

   Program test_wtime
!
!  This approach records 4 updates to the elapsed time returned from "omp_get_wtime"

      real*8     timer_precision, last_seconds, next_seconds, ticks(0:4)
      integer*4  k,n, nn(4)
      real*8     dt
!
      real*8,    external :: omp_get_wtick
      real*8,    external :: omp_get_wtime
!
!  test precision of omp_get_wtime, by recording 4 time changes
      timer_precision = omp_get_wtick ()  ! timer_precision as reported
      next_seconds    = omp_get_wtime ()

      ticks(0)        = next_seconds
      do k = 1,4
        n = 0
        last_seconds = next_seconds
        do
          n = n+1
          next_seconds  = omp_get_wtime ()                         ! call to timer
          if ( next_seconds /= last_seconds ) exit
        end do
        nn(k)    = n
        ticks(k) = next_seconds
      end do

!   report any variability of time updates
      do k = 1,4
        dt = (ticks(k)-ticks(k-1))
        write (*,*) k, nn(k), ticks(k)-ticks(k-1), dt
      end do
      dt = next_seconds-last_seconds                             ! minimum time between ticks

!   report the timer precision "dt" in comparison to the claimed precisoin "omp_get_wtick"
      write ( *,11) 'omp_get_wtime rate      = ', timer_precision,' second ???'
      write ( *,11) 'omp_get_wtime acc       = ', dt,' seconds : ', n,' cycles  ####'
   11 format ( 1x,a,es12.3,a,i0,a)
!
   end Program test_wtime

The .bat file I use is:

set options=-O2 -march=native -fopenmp

gfortran %1.f90 %options% -o %1.exe
 
dir %1.* /od
echo ================================================================ >> %1.log
now           >> %1.log
set g >> %1.log

set options >> %1.log
get_processor >> %1.log

audit -start >> %1.log
%1  %2 >> %1.log
audit -end >> %1.log

notepad %1.log

The outcome of the test is:

================================================================ 
  It is now Monday, 18 April 2022 at 12:51:54.335
gcc.ver=11.1.0
gcc_dir=C:\Program Files (x86)\gcc_eq\gcc_11.1.0
gcc_path=C:\Program Files (x86)\gcc_eq\gcc_11.1.0\bin;C:\Program Files (x86)\gcc_eq\gcc_11.1.0\libexec\gcc\x86_64-w64-mingw32\11.1.0
options=-O2 -march=native -fopenmp
 Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz
[AUDIT Ver 1.21] Monday, 18 April 2022 at 12:51:54.444
           1       52755   1.5999972820281982E-002   1.5999972820281982E-002
           2      115126   1.4999985694885254E-002   1.4999985694885254E-002
           3      144383   1.6000032424926758E-002   1.6000032424926758E-002
           4      145260   1.5999972820281982E-002   1.5999972820281982E-002
 omp_get_wtime rate      =    3.665E-07 second ???
 omp_get_wtime acc       =    1.600E-02 seconds : 145260 cycles  ####
[AUDIT Ver 1.21] elapse        0.154 seconds: Monday, 18 April 2022 at 12:51:54.600        1.015

(get_processor and audit are utilities for reporting)

I would be interested if this could be used by others and show what OS and compilers achieve.

A tick based test with better results is:

   Program test_system_clock
!
!  This approach records 4 updates to the elapsed time returned from "SYSTEM_CLOCK"

      integer*8  clock_precision, last_tick, next_tick, ticks(0:4)
      integer*4  k,n, nn(4)
      real*8     dt
!
!  test precision of SYSTEM_CLOCK, by recording 4 time changes
      call SYSTEM_CLOCK (next_tick, clock_precision)

      ticks(0)   = next_tick
      do k = 1,4
        n = 0
        last_tick = next_tick
        do
          n = n+1
          call SYSTEM_CLOCK (next_tick)
          if ( next_tick /= last_tick ) exit
        end do
        nn(k)    = n
        ticks(k) = next_tick
      end do

!   report any variability of time updates
      do k = 1,4
        dt = dble(ticks(k)-ticks(k-1)) / dble(clock_precision)
        write (*,*) k, nn(k), ticks(k)-ticks(k-1), dt
      end do

      dt = dble(next_tick-last_tick)/dble(clock_precision)      ! minimum time between ticks

!   report the timer precision "dt" in comparison to the claimed precision "SYSTEM_CLOCK (rate)"
      write ( *,12) 'system_clock_rate       = ', clock_precision,' ticks per second'
      write ( *,11) 'system_clock_tick acc   = ', dt,' seconds : ', n,' cycles  ####'
   11 format ( 1x,a,es12.3,a,i0,a)
   12 format ( 1x,a,i12,a,i0,a)
!
   end Program test_system_clock
2 Likes

I just stumbled across an unexpected result using a real rate variable in the system_clock() intrinsic, so I thought I would post it here in this old thread. Here is a code that demonstrates the problem:

program syst
   use, intrinsic :: iso_fortran_env, only : int32, int64
   implicit none
   integer(int64) :: knt0, knt1
   real :: rate
   integer(int32) :: i320, i321
   call system_clock( knt0, knt1 )
   rate = real( knt1 )
   write(*,*) 'from real(int64), rate=', rate
   call system_clock( knt0, rate )
   write(*,*) 'from int64, rate=', rate
   call system_clock( i320, i321 )
   rate = real( i321 )
   write(*,*) 'from real(int32), rate=', rate
   call system_clock( i320, rate )
   write(*,*) 'from int32, rate=', rate
end program syst

$ gfortran syst.f90 && a.out
 from real(int64), rate=   1.00000000E+09
 from int64, rate=   1000.00000    
 from real(int32), rate=   1000.00000    
 from int32, rate=   1000.00000

This is on MacOS. That second result is unexpected, or possibly even incorrect. The rate that is returned is expected to match the return value of the int64 first argument, but it doesn’t. It instead matches the return value of an int32 argument, which it does in the fourth call.

The practical workaround for this, I guess, is to only use integer arguments of the same kind, do your own real conversion, and never use a real rate argument. This is consistent with the @sblionel post above.

edit: I should have mentioned that if you declare rate to be real64, then the code works as expected. You get 1.e9 in the first two calls, and you get 1.e3 in the second two calls.

1 Like

Here is a longer program that shows that this problem is even worse than I thought before. This program uses several timers with various combinations of integer and real arguments of various kinds.

module timer_mod
   use, intrinsic :: iso_fortran_env, only : int32, int64, real64
contains
   subroutine time0( t )
      implicit none
      logical, save :: first = .true.
      integer(int64), save :: knt0
      integer(int64) :: knt1
      real, save    :: rate
      real, intent(out) :: t
      if ( first ) then ! initialize.
         first = .false.
         call system_clock( knt0, knt1 )
         rate = real(knt1)
      endif
      call system_clock( knt1 )
      t = real( knt1 - knt0 ) / rate
      return
   end subroutine time0

   subroutine time1( t )
      implicit none
      logical, save :: first = .true.
      integer(int64), save :: knt0
      integer(int64) :: knt1
      real, save    :: rate
      real, intent(out) :: t
      if ( first ) then ! initialize.
         first = .false.
         call system_clock( knt0, rate )
      endif
      call system_clock( knt1 )
      t = real( knt1 - knt0 ) / rate
      return
   end subroutine time1

   subroutine time2( t )
      implicit none
      logical, save :: first = .true.
      integer(int32), save :: knt0
      integer(int32) :: knt1
      real, save    :: rate
      real, intent(out) :: t
      if ( first ) then ! initialize.
         first = .false.
         call system_clock( knt0, knt1 )
         rate = real(knt1)
      endif
      call system_clock( knt1 )
      t = real( knt1 - knt0 ) / rate
      return
   end subroutine time2

   subroutine time3( t )
      implicit none
      logical, save :: first = .true.
      integer(int32), save :: knt0
      integer(int32) :: knt1
      real, save    :: rate
      real, intent(out) :: t
      if ( first ) then ! initialize.
         first = .false.
         call system_clock( knt0, rate )
      endif
      call system_clock( knt1 )
      t = real( knt1 - knt0 ) / rate
      return
   end subroutine time3

   subroutine time4( t )
      implicit none
      logical, save :: first = .true.
      integer(int32), save :: knt0
      integer(int32) :: knt1
      real(real64), save    :: rate
      real, intent(out) :: t
      if ( first ) then ! initialize.
         first = .false.
         call system_clock( knt0, rate )
      endif
      call system_clock( knt1 )
      t = real( knt1 - knt0 ) / rate
      return
   end subroutine time4

end module timer_mod

program syst
   use timer_mod
   implicit none
   integer :: i
   real :: a0, a1, b0, b1, c0, c1, d0, d1, e0, e1
   character(*), parameter :: cfmt='(*(g0.3:1x))'

   call time0( a0 )
   call time1( b0 )
   call time2( c0 )
   call time3( d0 )
   call time4( e0 )
   do i = 1, 4
      call sleep(2)  ! two seconds, wall time.
      call time0( a1 )
      call time1( b1 )
      call time2( c1 )
      call time3( d1 )
      call time4( e1 )
      print cfmt, (a1-a0), (b1-b0), (c1-c0), (d1-d0), (e1-e0)
   enddo

end program syst

$ gfortran  syst.f90 && a.out
2.01 0.00 2.01 2.01 2.01
4.01 0.00 4.01 4.01 4.01
6.02 0.00 6.01 6.01 6.01
8.02 0.00 8.02 8.02 8.02

$ flang syst.f90 && a.out
.201E+4 .201E+4 2.01 .200E-2 .200E-2
.401E+4 .401E+4 4.01 .401E-2 .401E-2
.602E+4 .602E+4 6.01 .601E-2 .601E-2
.802E+4 .802E+4 8.02 .802E-2 .802E-2

This code requires a sleep() subroutine, which is not a standard function, but it is a common library routine.

The gfortran output is all consistent except for the time1() results, which uses the combination of arguments that I pointed out were incorrect in the earlier post.

However, the flang results are even worse. The only correct ones are the time2() results with two int32 arguments. All of the others are incorrect, either too large by a factor of 1000 or too small by a factor of 1000.

BTW, the first column, the time0 results with two int64 arguments, are the ones that I think most people would expect to be the most accurate and most reliable.

edit: I found that this has been recently reported for flang. [flang] system_clock inflates time differences on macOS · Issue #176505 · llvm/llvm-project · GitHub

edit2: I have compiled this program with the NAG fortran compiler, and its output is similar to the gfortran output above. Namely, the combination of an int64 count and a real32 rate as arguments does not give correct results, while the other combinations work as expected. The NAG fortran compiler requires the addition of:

use f90_unix_proc, only: sleep

in order to access the sleep() subroutine. This compiler also prints warnings about the mixed-type arguments for the system_clock() subroutine, even for the cases that work correctly. These warnings are consistent with the f2023 text which recommends using only integer arguments of the same kind.

2 Likes