# How to show the true wall time instead of cpu_time?

Yes, SYSTEM_CLOCK is what you are looking for. Note that it returns the time in a slightly arbitrary unit - clicks per some unit of time. Use the clock_rate to scale it to something useful - the number of clicks per unit of time differs per platform.

1 Like

An alternative would be date_and_time, but for simple calculations of duration, that is probably too complicated.

1 Like

Thank you @plevold @Arjen
Yes I saw some guy in stackoverflow said can use system clock.
John Burkardt, has a data_time_version, one can call this timestamp ( ) before and after some code and estimate the elapsed time or so.

	subroutine timestamp ( )

!*****************************************************************************80
!
!! TIMESTAMP prints the current YMDHMS date as a time stamp.
!
!  Example:
!
!    31 May 2001   9:45:54.872 AM
!
!  Licensing:
!
!
!  Modified:
!
!    18 May 2013
!
!  Author:
!
!    John Burkardt
!
!  Parameters:
!
!    None
!
implicit none

character ( len = 8 ) ampm
integer ( kind = 4 ) d
integer ( kind = 4 ) h
integer ( kind = 4 ) m
integer ( kind = 4 ) mm
character ( len = 9 ), parameter, dimension(12) :: month = (/ &
'January  ', 'February ', 'March    ', 'April    ', &
'May      ', 'June     ', 'July     ', 'August   ', &
'September', 'October  ', 'November ', 'December ' /)
integer ( kind = 4 ) n
integer ( kind = 4 ) s
integer ( kind = 4 ) values(8)
integer ( kind = 4 ) y

call date_and_time ( values = values )

y = values(1)
m = values(2)
d = values(3)
h = values(5)
n = values(6)
s = values(7)
mm = values(8)

if ( h < 12 ) then
ampm = 'AM'
else if ( h == 12 ) then
if ( n == 0 .and. s == 0 ) then
ampm = 'Noon'
else
ampm = 'PM'
end if
else
h = h - 12
if ( h < 12 ) then
ampm = 'PM'
else if ( h == 12 ) then
if ( n == 0 .and. s == 0 ) then
ampm = 'Midnight'
else
ampm = 'AM'
end if
end if
end if

write ( *, '(i2.2,1x,a,1x,i4,2x,i2,a1,i2.2,a1,i2.2,a1,i3.3,1x,a)' ) &
d, trim ( month(m) ), y, h, ':', n, ':', s, '.', mm, trim ( ampm )

return
end


I discovered a fun difference between cpu_time() and system_clock() recently: if you read a keyboard input with read(*,*) the process is idle while it is waiting for the input, and cpu_time() is frozen.

2 Likes

A few years ago, I compared time measurement subroutines in Fortran. I recommend date_and_time or omp_get_wtime (OpenMP’s time measurement function) as time measurement subroutines.
For system_clock, the count rate is different depending on the compiler. Especially for PGI (currently nvfortran), the upper limit of measurement time is about 36 minutes when integer(int32) is used for the count.
Also, system_clock and cpu_time cannot measure the execution time of a parallelized program.
The table below shows the measurement results when calculating the addition of 2^{13} \times 2^{13} 2d arrays (c(:,:)=a(:,:)+b(:,:)) parallelized by OpenMP.

date_and_time is the best in viewpoints of the accuracy and portability. omp_get_wtime is the second best.

compiler subroutine execution time [s]

Intel system_clock 0.113 0.069 0.048
cpu_time 0.109 0.115 0.109
date_and_time 0.114 0.060 0.029
omp_get_wtime 0.112 0.057 0.027
PGI system_clock 0.112 0.063 0.050
cpu_time 0.111 0.057 0.028
date_and_time 0.112 0.057 0.028
omp_get_wtime 0.111 0.057 0.028
GNU system_clock 0.113 0.063 0.045
cpu_time 0.109 0.094 0.031
date_and_time 0.113 0.058 0.029
omp_get_wtime 0.113 0.057 0.028

Intel: Intel Parallel Studio XE Composer Edition for Fortran 17.0.4.210
PGI: PGI Visual Fortran for Windows 18.7
GNU: gfortran 7.3.0 (for Ubuntu on WSL)

5 Likes

What is the practical limit with integer(int64)?

A 64 bits integer can go up to 2**63-1 = +9223372036854775807
Even if the clock resolution is nanosecond, we could count 9223372036 seconds ~ 106751 days ~ 292 years.

The GFortran doc says:

COUNT_RATE is system dependent and can vary depending on the kind of the arguments. For kind=4 arguments (and smaller integer kinds), COUNT represents milliseconds, while for kind=8 arguments (and larger integer kinds), COUNT typically represents micro- or nanoseconds depending on resolution of the underlying platform clock. COUNT_MAX usually equals HUGE(COUNT_MAX) . Note that the millisecond resolution of the kind=4 version implies that the COUNT will wrap around in roughly 25 days. In order to avoid issues with the wrap around and for more precise timing, please use the kind=8 version.

1 Like

What is the practical limit with integer(int64) ?

For the compilers I used, practical upper limits with integer(64) are shown below:

compiler upper limit count rate
Intel 290000 years 1000000
PGI 29000 years 10000000
GNU 290 years 1000000000

I agree with what gfortran docs says.
I NOW think that system_clock should be used with integer (int64).
My previous post is a pitfall I fell into when I didn’t know that.

2 Likes

Aside from using date and time related libraries and modules as enumerated at the Fortran Wiki and several stopwatch modules that are available, a (relatively) simple example code using DATE_AND_TIME() that contains a re-useable procedure for converting DATE_AND_TIME() values to a Julian date that is useful for timings up to the precision of DATE_AND_TIME():

program demo_date_to_julian
implicit none
! type for unix epoch time and julian days
integer,parameter :: realtime=kind(0.0d0)
integer             :: dat(8)
real(kind=realtime) :: juliandate(2)
character           :: paws
integer             :: ierr,ios,i
do i=1,2
call date_and_time(values=dat)
! convert DAT to Julian Date
call date_to_julian(dat,juliandate(i),ierr)
write(*,*)'Julian Date is ',juliandate(i)
write(*,*)'ierr is ',ierr
if(i.eq.1)then
endif
enddo
write(*,*)'Delta',juliandate(2)-juliandate(1)
contains
subroutine date_to_julian(dat,julian,ierr)
! Convert proleptic Gregorian DAT date-time array to Julian Date

! array like returned by DATE_AND_TIME(3f)
integer,intent(in)               :: dat(8)
! Julian Date (non-negative, but may be non-integer)
real(kind=realtime),intent(out)  :: julian
! Error return: 0 =successful execution,-1=invalid year,-2=invalid month,
! -3=invalid day -4=invalid date (29th Feb, non leap-year)
integer,intent(out)              :: ierr
integer                          :: year, month, day, utc, hour, minute
real(kind=realtime)              :: second
integer                          :: A, Y, M, JDN
year   = dat(1)
month  = dat(2)
day    = dat(3)
utc    = dat(4)*60 ! Delta from UTC, convert from minutes to seconds
hour   = dat(5)
minute = dat(6)
second = dat(7)-utc+dat(8)/1000.0d0 ! correct for time zone and milliseconds
! and IERR is < 0
if(year==0 .or. year .lt. -4713) then
julian = -HUGE(99999_realtime) ! this is the date if an error occurs
ierr=-1
return
endif
!  You must compute first the number of years (Y) and months (M) since
!  March 1st -4800 (March 1, 4801 BC)
A=(14-month)/12 ! A will be 1 for January or February, and 0 for
! other months, with integer truncation
Y=year+4800-A
M=month+12*A-3  ! M will be 0 for March and 11 for February
!  All years in the BC era must be converted to astronomical years,
!  so that 1BC is year 0, 2 BC is year "-1", etc.
!  Convert to a negative number, then increment towards zero
!  Staring from a Gregorian calendar date
!  intentional integer truncation
JDN=day + (153*M+2)/5 + 365*Y + Y/4 - Y/100 + Y/400 - 32045
!  Finding the Julian Calendar date given the JDN (Julian day number)
!  and time of day
julian=JDN + dble(hour-12)/24.0d0 + dble(minute)/1440.0d0 + second/86400.0d0
if(julian.lt.0.d0) then                  ! Julian Day must be non-negative
ierr=1
else
ierr=0
endif
end subroutine date_to_julian
end program demo_date_to_julian

2 Likes

Any thought on why omp_get_wtime timing is so different from system_clock? Do you have the benchmarks available to the public? thanks.

Do you have the benchmarks available to the public?

Details are posted in this blog post (written in Japanese) in 2018.
I uploaded the source file used for the measurement to GitHub.

Any thought on why omp_get_wtime timing is so different from system_clock ?

Sorry, I am not sure in detail. Parallelization with OpenMP (forking and joining threads) may have some effect.

1 Like

I had more luck with omp_get_wtime, I implemented a function clock() like this: hfsolver/utils.f90 at b4c50c1979fb7e468b1852b144ba756f5a51788d · certik/hfsolver · GitHub.

As a user, honestly the only thing I ever needed was the total actual time. Whether it runs parallel with openmp, threads or MPI. I just want total time and then I do the math myself if I need something else.

Fortran should have a reliable function to do just that.

What are the use cases and motivation that cpu_time() does not return the total time with openmp?

3 Likes

With all timers, such as SYSTEM_CLOCK or CPU_TIME, or other operating system options available, they all have a nominal accuracy (eg clock ticks per second), but importantly they also have a refresh rate. This depends on how frequently the clock tick COUNT is updated.
With Windows, some timers are only updated 64 times per second, eg CPU_TIME or the system routine GetTickCount, while others are updated more frequently, eg QueryPerformanceCounter.
DATE_AND_TIME can also update only 64 times per second, even though VALUE(8) is in microseconds. It all depends on the operating system and which system timer the compiler adopts.
With gFortran on Windows, the effective refresh rate of SYSTEM_CLOCK depends if a 4-byte integer or 8-byte integer is used to obtain the clock ticks or clock_rate.
Other operating systems have their features, but don’t assume that SYSTEM_CLOCK’s “COUNT_RATE” implies that is the clock accuracy.
Poor accuracy clocks can report a zero change in time, which is a good sign of a poor accuracy timer.

5 Likes

When doing testing of performance, one of the most annoying problems is getting access to a high accuracy timer.
Why can’t gFortran provide the following for Windows (or equivalent for other operating systems):
integer8 function rdtsc_ticks()
integer
8 function rdtsc_tick_rate()

I have read lots of excuses, especially related to thread variability, but providing any high accuracy 8-byte tick count for any thread would be a huge improvement on what is now vaguely available.
If there are problems with this timer, then provide some notes on it’s limitations.
Typically rdtsc_tick_rate() is the processor cycle rate ~ 4 gHz, which is much better than any alternative.

2 Likes

@JohnCampbell
After reading your previous post, I wrote interfaces to access a high-resolution performance counter on Windows:

    interface
!! get the current value of the performance counter.
function QueryPerformanceCounter(lPerformanceCount_count) result(is_succeeded) &
bind(c, name="QueryPerformanceCounter")
use, intrinsic :: iso_c_binding
implicit none
integer(c_long_long) :: lPerformanceCount_count
!! current performance-counter value [count]
logical(c_bool) :: is_succeeded
!! nonzero (i.e. .true.) if the function succeeds
end function QueryPerformanceCounter

!! get the frequency of the performance counter.
function QueryPerformanceFrequency(lFrequency_countPerSec) result(is_supported) &
bind(c, name="QueryPerformanceFrequency")
use, intrinsic :: iso_c_binding
implicit none
integer(c_long_long) :: lFrequency_countPerSec
!! current performance-counter frequency [count/sec]<br>
!! nonzero if the hardware running the program supports a high-resolution performance counter.
logical(c_bool) :: is_supported
!! nonzero (i.e. .true.) if the hardware supports a high-resolution performance counter
end function QueryPerformanceFrequency
end interface


Time duration can be measured using QueryPerformanceCounter like below:

    integer(c_long_long) :: time_begin_qhc, time_end_qhc, freq
logical(c_bool) :: is_supported, is_succeeded

is_supported = QueryPerformanceFrequency(freq)
if(is_supported)then
is_succeeded = QueryPerformanceCounter(time_begin_qhc)
! do something
is_succeeded = QueryPerformanceCounter(time_end_qhc)
print *,real(time_end_qhc - time_begin_qhc,real32)/real(freq,real32)
end if


I tested it with the code on my repo, already posted in a reply of this topic. According to the results, QueryPerformanceCounter seems to be working correctly.

@certik

Fortran should have a reliable function to do just that.

What are the use cases and motivation that cpu_time() does not return the total time with openmp?

The repo is a use case.
As you said, I want a simple way to get elapsed time, whether sequential or parallelized programs.

1 Like

Unfortunately, QueryPerformanceCounter is not accurate enough !!
QueryPerformanceFrequency can vary depending on the processor. For recent processors, it can be set to rdtsc_rate() / 1024, which implies a relationship.

It makes no sense that rdtsc_tick() and rdtsc_rate() are not provided.
I am not familiar with what are provided on other O/S, but I would expect that all should have access to the processor clock and be recoverable as an 8-byte integer.

1 Like

Thanks. Can you explain your use case, that you do not want cpu_time to return the total time?

1 Like

The use case computes a sum of two arrays and measures its time.
The code is a part of my teaching material to teach Fortran beginners the time measurement in Fortran. In the material, we will write and measure the execution time of sequential and parallelized programs. So we want to get the actual execution time, not the CPU time.

Since cpu_time returns the total CPU time of the cores used for the computation, the measurement results will be almost the same as the results of the sequential program, even if we succeeded in the parallelization.
This behavior is a huge pitfall, especially for beginners.

To get the actual time, I currently recommend using date_and_time() or omp_get_wtime(). However, date_and_time() requires some extra process to get the time difference in milliseconds, and omp_get_wtime() is not suitable for use in sequential programs that do not actually require OpenMP.
I think your clock function is better for practical use.

1 Like

I see, thanks @tomohirodegawa. If I understand you correctly, what you are describing is exactly what I want. I was asking for a use case where you do not want the behavior you just described.

What I want and what you want (if I understand you correctly) is for cpu_time to return the actual time. So if you run in parallel on 2 cores with ideal speedup, I would like the time that I measure to be half the original time.

2 Likes

The current description of cpu_time is:

CPU_TIME(TIME)

Description. Processor time used.

Class. Subroutine.

Argument. TIME shall be a real scalar. It is an INTENT(OUT) argument. If the processor cannot provide a meaningful value for the time, it is assigned a processor-dependent negative value; otherwise, it is assigned a processor-dependent approximation to the processor time in seconds. Whether the value assigned is an approximation to the amount of time used by the invoking image, or the amount of time used by the whole program, is processor dependent. [emphasis added]

Interestingly, there is a note I did not know about:

A processor for which a single result is inadequate (for example, a parallel processor) might choose to provide an additional version for which time is an array. [emphasis added]

The exact definition of time is left imprecise because of the variability in what different processors are able to provide. The primary purpose is to compare different algorithms on the same processor or discover which parts of a calculation are the most expensive

I haven’t tested any compiler to see if there exists an array version of cpu_time.

3 Likes

Well, just found that our great friend John Burkardt has a wtime() function which uses system_clock, link is below,

https://people.sc.fsu.edu/~jburkardt/f_src/wtime/wtime.html

John Burkardt said in the webpage,

wtime, a FORTRAN90 code which returns a reading of the wall clock time.

For parallel programming, the important thing to measure is the elapsed wallclock time. This can be found by subtracting an initial reading of the wallclock time from a final one.

The OpenMP system provides a function used as follows:

    seconds = omp_get_wtime ( )
operations to time;
seconds = omp_get_wtime ( ) - seconds;


while the MPI system provides a similar function used as:
seconds = MPI_Wtime ( );
operations;
seconds = MPI_Wtime ( ) - seconds;

and in MATLAB, wallclock time can be taken with “tic” and “toc”:
tic;
operation;
seconds = toc;

The code provides a way to get a similar reading:

    seconds = wtime ( );
operations;
seconds = wtime ( ) - seconds;


I have briefly test it with and without openMP, it seems really give the wall (elapsed) time. I may just use it for now.

There is also a similar SO link question,