Fortran 2023 standard and system clock

Shahid · November 26, 2023, 6:14pm

There are some changes to the system clock routine. I still did not get the idea of its implementation in the new standard.

It is said in John Reid’s presentation (The new features of Fortran 2023 ) that Fortran 2018 gives the freedom to use different kinds of integers for arguments. I did check it with Intel’s latest compiler and I see it does throw an error. But it is avoided by using the same kind of integer!

So what will be the change in the system_clock routine? Will we see any error in using different kinds or it is just a recommendation to use the same kind of integers?

msz59 · November 26, 2023, 9:44pm

F2023 explicitly specifies that all integer arguments shall have the same kind. F2018 did not mention that so, in principle, one could try to use different kinds, processor-allowing.

16.9.202 SYSTEM_CLOCK ([COUNT, COUNT_RATE, COUNT_MAX])
...
In a reference to SYSTEM_CLOCK, all integer arguments shall have the same kind type parameter.

JohnCampbell · November 27, 2023, 4:29am

I would expect that this change clarifes that different integer kinds can be used as arguments, but when using multiple optional arguments, all arguments should be of the same integer kind.

Gfortran provides different precision (value of count_rate) depending on the kind of count_rate provided. This is a good thing.

If Intel’s latest compiler does throw an error when mixing integer kinds, that could be a swift implementation of the 2023 standard !
Does Intel’s latest compiler provide different values for count_rate depending on the integer kind used ?
Unfortunately, in the past, the value of “count_rate” was no guarantee of the accuracy of the “count” reported.

sblionel · November 27, 2023, 11:43am

Yes, Intel’s implementation provides different rates for the different integer kinds. The changes in F23 were to disallow mixed kinds, which could lead to unpredictable results, and to alert the reader about the possibility of “different clocks”… There were no syntax changes. See https://j3-fortran.org/doc/year/21/21-117r3.txt for more.

Shahid · November 28, 2023, 2:15am

My tests with the latest intel compiler

1. Same integer kind

program system_clock_test
  implicit none

  integer ( 4 ), parameter :: x = 100 , y = 100
  integer ( 4 ) :: steps = 100000, step
  integer ( 8 ) :: start_count, rate
  integer ( 8 ) :: stop_count
  real ( 8 ), dimension ( :, : ), allocatable :: r, c


  allocate ( r(x,y) , c(x,y) )

  !====
  call system_clock(count=start_count , count_rate=rate)
  !====

  do step = 1, steps
     call random_number ( r )
     r = 0.4**4 + 0.02*( 0.5 - r*0.258 )**2
  end do


  !=====
  call system_clock(count=stop_count)
  !=====

  print*, ' system clock time        : ', real(max(stop_count - start_count , 1_8 )) /real(rate),' seconds'
  print*, ' kind type start_count    : ', kind(start_count)
  print*, ' kind type stop_count     : ', kind(stop_count)


end program system_clock_test

C:\Users\owner\Desktop>>ifx test.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.0.0 Build 20231017
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.
-out:test.exe
-subsystem:console
test.obj

C:\Users\owner\Desktop>> test
  system clock time        :    9.057000      seconds
  kind type start_count    :            8
  kind type stop_count     :            8

2. Different integer kind

program system_clock_test
  implicit none

  integer ( 4 ), parameter :: x = 100 , y = 100
  integer ( 4 ) :: steps = 100000, step
  integer ( 8 ) :: start_count, rate
  integer ( 4 ) :: stop_count
  real ( 8 ), dimension ( :, : ), allocatable :: r, c


  allocate ( r(x,y) , c(x,y) )

  !====
  call system_clock(count=start_count , count_rate=rate)
  !====

  do step = 1, steps
     call random_number ( r )
     r = 0.4**4 + 0.02*( 0.5 - r*0.258 )**2
  end do


  !=====
  call system_clock(count=stop_count)
  !=====

  print*, ' system clock time        : ', real(max(stop_count - start_count , 1_8 )) /real(rate),' seconds'
  print*, ' kind type start_count    : ', kind(start_count)
  print*, ' kind type stop_count     : ', kind(stop_count)


end program system_clock_test

C:\Users\owner\Desktop>ifx test.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.0.0 Build 20231017
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test.exe
-subsystem:console
test.obj

C:\Users\owner\Desktop>test
  system clock time        :   1.0000000E-06  seconds
  kind type start_count    :            8
  kind type stop_count     :            4

JohnCampbell · November 28, 2023, 3:44am

The following code can identify the difference between the precision suggested by “clock_rate” and the actual precision available. This varies greatly with Kind and operating system. Windows and Kind=4 is the worst !

!   test the clock rate and precision for various compilets
!
     integer, parameter :: ip = selected_int_kind ( 10 )   ! try 6 10 or possibly 18
     integer(ip) :: clock_tick, last_tick, clock_rate

     integer*8   :: elapse_ticks(0:10), num_ticks(0:10), calls, ticks
     integer     :: k, i
     real*8      :: sec

     call system_clock ( last_tick, clock_rate )
     elapse_ticks(0) = last_tick

     do k = 0,10
       do i = 1,9999999
         call system_clock ( clock_tick )
         if ( clock_tick == last_tick ) cycle
         elapse_ticks(k) = clock_tick
         num_ticks(k)    = i
         last_tick       = clock_tick
         exit
       end do
     end do

     sec = dble ( elapse_ticks(10)-elapse_ticks(0) ) / dble(clock_rate) / sum(num_ticks)
     write (*,*) 'Integer KIND =',ip
     write (*,*) 'clock_rate =',clock_rate,'  ticks/sec reporting precision'
     write (*,*) 'system_clock call time =',sec,' seconds / call'

     write (*,fmt='(/a)') ' Step     Calls    Change     Seconds'
     do k = 1,10
       calls = num_ticks(k)
       ticks = elapse_ticks(k)-elapse_ticks(k-1)
       sec   = dble ( ticks ) / dble(clock_rate)
       write (*,fmt='(i5,i10,i10, es12.3 )') k, calls, ticks, sec
     end do

     write (*,fmt='(/a)') 'NOTE :'
     write (*,*) ' Calls is the number of repeated calls before "clock_tick" changes value'
     write (*,*) ' Change is the minimum change to clock_tick that can be reported'
     write (*,*) ' Seconds is the true accuracy available'
         
   end

Typically when 4-byte integers are used, the actual precision can be as little as 64 changes per second, while when 8-byte integers are used, a much better precision is available.

The Windows version of Gfortran I have shows this effect, so kind=8 should always be used.

I have not tested Ifort lately, but previously it provided a clock_tick value that changed only 64 times per second, much less than is required.

When an 8-byte integer is used, it should be possible to get values of clock_tick, based on the processor clock, which can be 5 GHz. I am sure if Fortran compiler developers used Fortran timers, this would have been available many years ago.

Typically CPU_TIME also suffers from this poor precision, due to the frequency at which the CPU time accumulator is updated. There is no “KIND” fix I am aware of, as usnig REAL(4) or REAL(8) has the same result.

sblionel · November 28, 2023, 5:21pm

The Intel documentation says:

If the type is INTEGER(1), count_rate is 0. If the type is INTEGER(2), count_rate is 1000. If the type is INTEGER(4) or REAL(4), count_rate is 10000. If the type is INTEGER(8), REAL(8), or REAL(16), count_rate is 1000000.

It is dependent on the OS clock for actual resolution. On Windows, you can do better by using “multimedia timers”.

JohnCampbell · November 29, 2023, 1:26am

@sblionel ,

On Windows, why can’t we get easy access to RDTSC clock ?

It is the best timer for high precision timing and all Windows Fortran compilers should give programmers access to the best timer, even if as a Fortran extension.
There are other extensions for OS specific purposes.

A convenient interface to RDTSC could be provided as “integer(8) Function RDTSC_Ticks ()” and possibly “integer(8) Function RDTSC_Rate ()” as a compiler extension, so that all the handler baggage does not need to be repeated.

At the moment, QueryPerformanceCounter is the most accurate that is easily available via Windows API. ( typically = RDTSC / 1024. )

sblionel · November 29, 2023, 2:34pm

What’s wrong with using QueryPerformanceCounter? That seems more than adequate for high precision code timing. There is overhead in querying processor registers such as RDTSC, and a lot of uncertainty timing very short sequences on a general purpose OS.

If you think you really need RDTSC, you can write a short C function to get it and call it from Fortran.

RonShepard · November 29, 2023, 5:58pm

I have a general question about these timers on intel hardware. RDTSC in particular returns the raw clock ticks. But the clock cycle can change during a program run for various reasons, e.g. as a way to regulate the internal chip temperature, or it can also be changed by the program in order to boost performance for short times. This means that the conversion from clock ticks to time must account for these rate changes. How is that done? Presumably the system_clock() intrinsic, which also works with clock ticks and clock rates, must also do this kind of conversion if it is using a variable rate clock. I notice in the gfortran documentation that it says it willl fall back and use a real-time clock (presumably with coarser resolution) for a variable clock rate.

Of course, if you are optimizing a section of code, the RDTSC number might be the best number to return anyway. You aren’t really interested in the conversion to wall clock time in that case, you are really trying to minimize clock cycles anyway.

JohnCampbell · November 30, 2023, 1:27am

Steve, Thanks for your comments,

QueryPerformanceCounter appears to be based on RDTSC, more commonly QueryPerformanceCounter = RDTSC / 1024_8. Given the range of 8-byte integers, I don’t know why.

Regarding a “short” C function : I have been provided code to initialise and then access RDTSC from the Intel Fortran forum, but I think the level of complexity of this code would push it to being much better to be located in the Fortran libraries, rather than having to maintain this code as compilers change.

There is also the trend to reduce the use of personal libraries of routines. CONTAINS and routines in modules tends to discourage libraries and so include code in each project.

Does the Fortran Standard provide any comment on the accuracy of SYSTEM_CLOCK or CPU_TIME? My experience is most Windows implementation of SYSTEM_CLOCK using default integer and all implementations of CPU_TIME provide inadequate precision.

As Ron has posted, when considering code performance, RDTSC can be the best option for understanding timing.
Ron also brings up an interesting point regarding processor clock cycle variation, due to over-clocking or slow down due to temperature control. Can anyone comment on if these changes also affect the RDTSC rate or is it a fixed rate base clock ? A significant problem for my !$OMP PARALLEL DO timings can be the variation in clock rate between cores / threads.

Although there are some inaccuracies with RDTSC, when available, it does provide the best timing information to assess performance.

Topic		Replies	Views
Proper Usage of `system_clock` Help	9	2315	April 18, 2022
How to show the true wall time instead of cpu_time?	29	8522	October 29, 2023
Are there really any reliable ways to time Fortran code?	18	943	September 27, 2024
Type kind system limitations	10	450	May 6, 2024
Compile time detection of extended-double or quad precision Help	33	1589	November 18, 2022

Fortran 2023 standard and system clock

Related topics