Fortran 2023 standard and system clock

There are some changes to the system clock routine. I still did not get the idea of its implementation in the new standard.

It is said in John Reid’s presentation (The new features of Fortran 2023 ) that Fortran 2018 gives the freedom to use different kinds of integers for arguments. I did check it with Intel’s latest compiler and I see it does throw an error. But it is avoided by using the same kind of integer!

So what will be the change in the system_clock routine? Will we see any error in using different kinds or it is just a recommendation to use the same kind of integers?

F2023 explicitly specifies that all integer arguments shall have the same kind. F2018 did not mention that so, in principle, one could try to use different kinds, processor-allowing.

16.9.202 SYSTEM_CLOCK ([COUNT, COUNT_RATE, COUNT_MAX])
...
In a reference to SYSTEM_CLOCK, all integer arguments shall have the same kind type parameter.
1 Like

I would expect that this change clarifes that different integer kinds can be used as arguments, but when using multiple optional arguments, all arguments should be of the same integer kind.

Gfortran provides different precision (value of count_rate) depending on the kind of count_rate provided. This is a good thing.

If Intel’s latest compiler does throw an error when mixing integer kinds, that could be a swift implementation of the 2023 standard !
Does Intel’s latest compiler provide different values for count_rate depending on the integer kind used ?
Unfortunately, in the past, the value of “count_rate” was no guarantee of the accuracy of the “count” reported.

1 Like

Yes, Intel’s implementation provides different rates for the different integer kinds. The changes in F23 were to disallow mixed kinds, which could lead to unpredictable results, and to alert the reader about the possibility of “different clocks”… There were no syntax changes. See https://j3-fortran.org/doc/year/21/21-117r3.txt for more.

4 Likes

My tests with the latest intel compiler

1. Same integer kind

program system_clock_test
  implicit none

  integer ( 4 ), parameter :: x = 100 , y = 100
  integer ( 4 ) :: steps = 100000, step
  integer ( 8 ) :: start_count, rate
  integer ( 8 ) :: stop_count
  real ( 8 ), dimension ( :, : ), allocatable :: r, c


  allocate ( r(x,y) , c(x,y) )

  !====
  call system_clock(count=start_count , count_rate=rate)
  !====

  do step = 1, steps
     call random_number ( r )
     r = 0.4**4 + 0.02*( 0.5 - r*0.258 )**2
  end do


  !=====
  call system_clock(count=stop_count)
  !=====

  print*, ' system clock time        : ', real(max(stop_count - start_count , 1_8 )) /real(rate),' seconds'
  print*, ' kind type start_count    : ', kind(start_count)
  print*, ' kind type stop_count     : ', kind(stop_count)


end program system_clock_test
C:\Users\owner\Desktop>>ifx test.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.0.0 Build 20231017
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.
-out:test.exe
-subsystem:console
test.obj

C:\Users\owner\Desktop>> test
  system clock time        :    9.057000      seconds
  kind type start_count    :            8
  kind type stop_count     :            8

2. Different integer kind

program system_clock_test
  implicit none

  integer ( 4 ), parameter :: x = 100 , y = 100
  integer ( 4 ) :: steps = 100000, step
  integer ( 8 ) :: start_count, rate
  integer ( 4 ) :: stop_count
  real ( 8 ), dimension ( :, : ), allocatable :: r, c


  allocate ( r(x,y) , c(x,y) )

  !====
  call system_clock(count=start_count , count_rate=rate)
  !====

  do step = 1, steps
     call random_number ( r )
     r = 0.4**4 + 0.02*( 0.5 - r*0.258 )**2
  end do


  !=====
  call system_clock(count=stop_count)
  !=====

  print*, ' system clock time        : ', real(max(stop_count - start_count , 1_8 )) /real(rate),' seconds'
  print*, ' kind type start_count    : ', kind(start_count)
  print*, ' kind type stop_count     : ', kind(stop_count)


end program system_clock_test
C:\Users\owner\Desktop>ifx test.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.0.0 Build 20231017
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test.exe
-subsystem:console
test.obj

C:\Users\owner\Desktop>test
  system clock time        :   1.0000000E-06  seconds
  kind type start_count    :            8
  kind type stop_count     :            4

The following code can identify the difference between the precision suggested by “clock_rate” and the actual precision available. This varies greatly with Kind and operating system. Windows and Kind=4 is the worst !

!   test the clock rate and precision for various compilets
!
     integer, parameter :: ip = selected_int_kind ( 10 )   ! try 6 10 or possibly 18
     integer(ip) :: clock_tick, last_tick, clock_rate

     integer*8   :: elapse_ticks(0:10), num_ticks(0:10), calls, ticks
     integer     :: k, i
     real*8      :: sec

     call system_clock ( last_tick, clock_rate )
     elapse_ticks(0) = last_tick

     do k = 0,10
       do i = 1,9999999
         call system_clock ( clock_tick )
         if ( clock_tick == last_tick ) cycle
         elapse_ticks(k) = clock_tick
         num_ticks(k)    = i
         last_tick       = clock_tick
         exit
       end do
     end do

     sec = dble ( elapse_ticks(10)-elapse_ticks(0) ) / dble(clock_rate) / sum(num_ticks)
     write (*,*) 'Integer KIND =',ip
     write (*,*) 'clock_rate =',clock_rate,'  ticks/sec reporting precision'
     write (*,*) 'system_clock call time =',sec,' seconds / call'

     write (*,fmt='(/a)') ' Step     Calls    Change     Seconds'
     do k = 1,10
       calls = num_ticks(k)
       ticks = elapse_ticks(k)-elapse_ticks(k-1)
       sec   = dble ( ticks ) / dble(clock_rate)
       write (*,fmt='(i5,i10,i10, es12.3 )') k, calls, ticks, sec
     end do

     write (*,fmt='(/a)') 'NOTE :'
     write (*,*) ' Calls is the number of repeated calls before "clock_tick" changes value'
     write (*,*) ' Change is the minimum change to clock_tick that can be reported'
     write (*,*) ' Seconds is the true accuracy available'
         
   end

Typically when 4-byte integers are used, the actual precision can be as little as 64 changes per second, while when 8-byte integers are used, a much better precision is available.

The Windows version of Gfortran I have shows this effect, so kind=8 should always be used.

I have not tested Ifort lately, but previously it provided a clock_tick value that changed only 64 times per second, much less than is required.

When an 8-byte integer is used, it should be possible to get values of clock_tick, based on the processor clock, which can be 5 GHz. I am sure if Fortran compiler developers used Fortran timers, this would have been available many years ago.

Typically CPU_TIME also suffers from this poor precision, due to the frequency at which the CPU time accumulator is updated. There is no “KIND” fix I am aware of, as usnig REAL(4) or REAL(8) has the same result.

The Intel documentation says:

If the type is INTEGER(1), count_rate is 0. If the type is INTEGER(2), count_rate is 1000. If the type is INTEGER(4) or REAL(4), count_rate is 10000. If the type is INTEGER(8), REAL(8), or REAL(16), count_rate is 1000000.

It is dependent on the OS clock for actual resolution. On Windows, you can do better by using “multimedia timers”.

@sblionel ,

On Windows, why can’t we get easy access to RDTSC clock ?

It is the best timer for high precision timing and all Windows Fortran compilers should give programmers access to the best timer, even if as a Fortran extension.
There are other extensions for OS specific purposes.

A convenient interface to RDTSC could be provided as “integer(8) Function RDTSC_Ticks ()” and possibly “integer(8) Function RDTSC_Rate ()” as a compiler extension, so that all the handler baggage does not need to be repeated.

At the moment, QueryPerformanceCounter is the most accurate that is easily available via Windows API. ( typically = RDTSC / 1024. )

What’s wrong with using QueryPerformanceCounter? That seems more than adequate for high precision code timing. There is overhead in querying processor registers such as RDTSC, and a lot of uncertainty timing very short sequences on a general purpose OS.

If you think you really need RDTSC, you can write a short C function to get it and call it from Fortran.

I have a general question about these timers on intel hardware. RDTSC in particular returns the raw clock ticks. But the clock cycle can change during a program run for various reasons, e.g. as a way to regulate the internal chip temperature, or it can also be changed by the program in order to boost performance for short times. This means that the conversion from clock ticks to time must account for these rate changes. How is that done? Presumably the system_clock() intrinsic, which also works with clock ticks and clock rates, must also do this kind of conversion if it is using a variable rate clock. I notice in the gfortran documentation that it says it willl fall back and use a real-time clock (presumably with coarser resolution) for a variable clock rate.

Of course, if you are optimizing a section of code, the RDTSC number might be the best number to return anyway. You aren’t really interested in the conversion to wall clock time in that case, you are really trying to minimize clock cycles anyway.

Steve, Thanks for your comments,

QueryPerformanceCounter appears to be based on RDTSC, more commonly QueryPerformanceCounter = RDTSC / 1024_8. Given the range of 8-byte integers, I don’t know why.

Regarding a “short” C function : I have been provided code to initialise and then access RDTSC from the Intel Fortran forum, but I think the level of complexity of this code would push it to being much better to be located in the Fortran libraries, rather than having to maintain this code as compilers change.

There is also the trend to reduce the use of personal libraries of routines. CONTAINS and routines in modules tends to discourage libraries and so include code in each project.

Does the Fortran Standard provide any comment on the accuracy of SYSTEM_CLOCK or CPU_TIME? My experience is most Windows implementation of SYSTEM_CLOCK using default integer and all implementations of CPU_TIME provide inadequate precision.

As Ron has posted, when considering code performance, RDTSC can be the best option for understanding timing.
Ron also brings up an interesting point regarding processor clock cycle variation, due to over-clocking or slow down due to temperature control. Can anyone comment on if these changes also affect the RDTSC rate or is it a fixed rate base clock ? A significant problem for my !$OMP PARALLEL DO timings can be the variation in clock rate between cores / threads.

Although there are some inaccuracies with RDTSC, when available, it does provide the best timing information to assess performance.