How to show the true wall time instead of cpu_time?

Deal all,
A quick question, I want to see how long the subroutine XXX took, so should be t2-t1 below,

call cpu_time(t1)
call XXX()
call cpu_time(t2)
write(6,*) 'time = ', t2-t1

But, when I use intel fortran, with /qparallel flag, I found sometime the cpu_time no longer tell the true wall time. Eg, it actually took 1 second wall time, but with /qparallel, t2-t2 can show 10 seconds, obviously it show all the cpu_time involved. But I only want the true wall time

So, is there a way to show the true wall time of XXX?

I know I can use mpi_walltime, but I did not want to bother load the mpi module for a program which does not use mpi.

Thank you very much in advance!

1 Like

I think system_clock might be what you’re looking for: SYSTEM_CLOCK (The GNU Fortran Compiler)

Haven’t tried it myself though. Maybe somebody else has experience with it?

1 Like

Yes, SYSTEM_CLOCK is what you are looking for. Note that it returns the time in a slightly arbitrary unit - clicks per some unit of time. Use the clock_rate to scale it to something useful - the number of clicks per unit of time differs per platform.

1 Like

An alternative would be date_and_time, but for simple calculations of duration, that is probably too complicated.

1 Like

Thank you @plevold @Arjen
Yes I saw some guy in stackoverflow said can use system clock.
John Burkardt, has a data_time_version, one can call this timestamp ( ) before and after some code and estimate the elapsed time or so.

	subroutine timestamp ( )

	!*****************************************************************************80
	!
	!! TIMESTAMP prints the current YMDHMS date as a time stamp.
	!
	!  Example:
	!
	!    31 May 2001   9:45:54.872 AM
	!
	!  Licensing:
	!
	!    This code is distributed under the GNU LGPL license.
	!
	!  Modified:
	!
	!    18 May 2013
	!
	!  Author:
	!
	!    John Burkardt
	!
	!  Parameters:
	!
	!    None
	!
	implicit none

	character ( len = 8 ) ampm
	integer ( kind = 4 ) d
	integer ( kind = 4 ) h
	integer ( kind = 4 ) m
	integer ( kind = 4 ) mm
	character ( len = 9 ), parameter, dimension(12) :: month = (/ &
	'January  ', 'February ', 'March    ', 'April    ', &
	'May      ', 'June     ', 'July     ', 'August   ', &
	'September', 'October  ', 'November ', 'December ' /)
	integer ( kind = 4 ) n
	integer ( kind = 4 ) s
	integer ( kind = 4 ) values(8)
	integer ( kind = 4 ) y

	call date_and_time ( values = values )

	y = values(1)
	m = values(2)
	d = values(3)
	h = values(5)
	n = values(6)
	s = values(7)
	mm = values(8)

	if ( h < 12 ) then
	ampm = 'AM'
	else if ( h == 12 ) then
	if ( n == 0 .and. s == 0 ) then
		ampm = 'Noon'
	else
		ampm = 'PM'
	end if
	else
	h = h - 12
	if ( h < 12 ) then
		ampm = 'PM'
	else if ( h == 12 ) then
		if ( n == 0 .and. s == 0 ) then
		ampm = 'Midnight'
		else
		ampm = 'AM'
		end if
	end if
	end if

	write ( *, '(i2.2,1x,a,1x,i4,2x,i2,a1,i2.2,a1,i2.2,a1,i3.3,1x,a)' ) &
	d, trim ( month(m) ), y, h, ':', n, ':', s, '.', mm, trim ( ampm )

	return
	end

I discovered a fun difference between cpu_time() and system_clock() recently: if you read a keyboard input with read(*,*) the process is idle while it is waiting for the input, and cpu_time() is frozen.

2 Likes

A few years ago, I compared time measurement subroutines in Fortran. I recommend date_and_time or omp_get_wtime (OpenMP’s time measurement function) as time measurement subroutines.
For system_clock, the count rate is different depending on the compiler. Especially for PGI (currently nvfortran), the upper limit of measurement time is about 36 minutes when integer(int32) is used for the count.
Also, system_clock and cpu_time cannot measure the execution time of a parallelized program.
The table below shows the measurement results when calculating the addition of 2^{13} \times 2^{13} 2d arrays (c(:,:)=a(:,:)+b(:,:)) parallelized by OpenMP.

date_and_time is the best in viewpoints of the accuracy and portability. omp_get_wtime is the second best.

compiler subroutine execution time [s]
1 thread

2 threads

4 threads
Intel system_clock 0.113 0.069 0.048
cpu_time 0.109 0.115 0.109
date_and_time 0.114 0.060 0.029
omp_get_wtime 0.112 0.057 0.027
PGI system_clock 0.112 0.063 0.050
cpu_time 0.111 0.057 0.028
date_and_time 0.112 0.057 0.028
omp_get_wtime 0.111 0.057 0.028
GNU system_clock 0.113 0.063 0.045
cpu_time 0.109 0.094 0.031
date_and_time 0.113 0.058 0.029
omp_get_wtime 0.113 0.057 0.028

Intel: Intel Parallel Studio XE Composer Edition for Fortran 17.0.4.210
PGI: PGI Visual Fortran for Windows 18.7
GNU: gfortran 7.3.0 (for Ubuntu on WSL)

5 Likes

What is the practical limit with integer(int64)?

A 64 bits integer can go up to 2**63-1 = +9223372036854775807
Even if the clock resolution is nanosecond, we could count 9223372036 seconds ~ 106751 days ~ 292 years.

The GFortran doc says:

COUNT_RATE is system dependent and can vary depending on the kind of the arguments. For kind=4 arguments (and smaller integer kinds), COUNT represents milliseconds, while for kind=8 arguments (and larger integer kinds), COUNT typically represents micro- or nanoseconds depending on resolution of the underlying platform clock. COUNT_MAX usually equals HUGE(COUNT_MAX) . Note that the millisecond resolution of the kind=4 version implies that the COUNT will wrap around in roughly 25 days. In order to avoid issues with the wrap around and for more precise timing, please use the kind=8 version.

1 Like

What is the practical limit with integer(int64) ?

For the compilers I used, practical upper limits with integer(64) are shown below:

compiler upper limit count rate
Intel 290000 years 1000000
PGI 29000 years 10000000
GNU 290 years 1000000000

I agree with what gfortran docs says.
I NOW think that system_clock should be used with integer (int64).
My previous post is a pitfall I fell into when I didn’t know that.

2 Likes

Aside from using date and time related libraries and modules as enumerated at the Fortran Wiki and several stopwatch modules that are available, a (relatively) simple example code using DATE_AND_TIME() that contains a re-useable procedure for converting DATE_AND_TIME() values to a Julian date that is useful for timings up to the precision of DATE_AND_TIME():

program demo_date_to_julian
implicit none
! type for unix epoch time and julian days
integer,parameter :: realtime=kind(0.0d0)  
integer             :: dat(8)
real(kind=realtime) :: juliandate(2)
character           :: paws
integer             :: ierr,ios,i
    do i=1,2
       call date_and_time(values=dat)
       ! convert DAT to Julian Date
       call date_to_julian(dat,juliandate(i),ierr)
       write(*,*)'Julian Date is ',juliandate(i)
       write(*,*)'ierr is ',ierr
       if(i.eq.1)then
          write(*,'(a)',advance='no')'enter [RETURN] to continue ...'
          read(*,'(a)',iostat=ios)paws
       endif
    enddo
    write(*,*)'Delta',juliandate(2)-juliandate(1)
contains
subroutine date_to_julian(dat,julian,ierr)
! Convert proleptic Gregorian DAT date-time array to Julian Date

! array like returned by DATE_AND_TIME(3f)
integer,intent(in)               :: dat(8) 
! Julian Date (non-negative, but may be non-integer)
real(kind=realtime),intent(out)  :: julian  
! Error return: 0 =successful execution,-1=invalid year,-2=invalid month,
! -3=invalid day -4=invalid date (29th Feb, non leap-year)
integer,intent(out)              :: ierr    
integer                          :: year, month, day, utc, hour, minute
real(kind=realtime)              :: second
integer                          :: A, Y, M, JDN
   year   = dat(1)                        
   month  = dat(2)                       
   day    = dat(3)                      
   utc    = dat(4)*60 ! Delta from UTC, convert from minutes to seconds
   hour   = dat(5)                     
   minute = dat(6)                       
   second = dat(7)-utc+dat(8)/1000.0d0 ! correct for time zone and milliseconds
                                       ! and IERR is < 0
   if(year==0 .or. year .lt. -4713) then
      julian = -HUGE(99999_realtime) ! this is the date if an error occurs
      ierr=-1
      return
   endif
!  You must compute first the number of years (Y) and months (M) since
!  March 1st -4800 (March 1, 4801 BC)
   A=(14-month)/12 ! A will be 1 for January or February, and 0 for
                   ! other months, with integer truncation
   Y=year+4800-A
   M=month+12*A-3  ! M will be 0 for March and 11 for February
!  All years in the BC era must be converted to astronomical years,
!  so that 1BC is year 0, 2 BC is year "-1", etc.
!  Convert to a negative number, then increment towards zero
!  Staring from a Gregorian calendar date
   !  intentional integer truncation
   JDN=day + (153*M+2)/5 + 365*Y + Y/4 - Y/100 + Y/400 - 32045 
!  Finding the Julian Calendar date given the JDN (Julian day number)
!  and time of day
   julian=JDN + dble(hour-12)/24.0d0 + dble(minute)/1440.0d0 + second/86400.0d0
   if(julian.lt.0.d0) then                  ! Julian Day must be non-negative
      ierr=1
   else
      ierr=0
   endif
end subroutine date_to_julian
end program demo_date_to_julian
2 Likes

Any thought on why omp_get_wtime timing is so different from system_clock? Do you have the benchmarks available to the public? thanks.

Do you have the benchmarks available to the public?

Details are posted in this blog post (written in Japanese) in 2018.
I uploaded the source file used for the measurement to GitHub.

Any thought on why omp_get_wtime timing is so different from system_clock ?

Sorry, I am not sure in detail. Parallelization with OpenMP (forking and joining threads) may have some effect.

1 Like

I had more luck with omp_get_wtime, I implemented a function clock() like this: hfsolver/utils.f90 at b4c50c1979fb7e468b1852b144ba756f5a51788d · certik/hfsolver · GitHub.

As a user, honestly the only thing I ever needed was the total actual time. Whether it runs parallel with openmp, threads or MPI. I just want total time and then I do the math myself if I need something else.

Fortran should have a reliable function to do just that.

What are the use cases and motivation that cpu_time() does not return the total time with openmp?

3 Likes

With all timers, such as SYSTEM_CLOCK or CPU_TIME, or other operating system options available, they all have a nominal accuracy (eg clock ticks per second), but importantly they also have a refresh rate. This depends on how frequently the clock tick COUNT is updated.
With Windows, some timers are only updated 64 times per second, eg CPU_TIME or the system routine GetTickCount, while others are updated more frequently, eg QueryPerformanceCounter.
DATE_AND_TIME can also update only 64 times per second, even though VALUE(8) is in microseconds. It all depends on the operating system and which system timer the compiler adopts.
With gFortran on Windows, the effective refresh rate of SYSTEM_CLOCK depends if a 4-byte integer or 8-byte integer is used to obtain the clock ticks or clock_rate.
Other operating systems have their features, but don’t assume that SYSTEM_CLOCK’s “COUNT_RATE” implies that is the clock accuracy.
Poor accuracy clocks can report a zero change in time, which is a good sign of a poor accuracy timer.

5 Likes

When doing testing of performance, one of the most annoying problems is getting access to a high accuracy timer.
Why can’t gFortran provide the following for Windows (or equivalent for other operating systems):
integer8 function rdtsc_ticks()
integer
8 function rdtsc_tick_rate()

I have read lots of excuses, especially related to thread variability, but providing any high accuracy 8-byte tick count for any thread would be a huge improvement on what is now vaguely available.
If there are problems with this timer, then provide some notes on it’s limitations.
Typically rdtsc_tick_rate() is the processor cycle rate ~ 4 gHz, which is much better than any alternative.

2 Likes

@JohnCampbell
After reading your previous post, I wrote interfaces to access a high-resolution performance counter on Windows:

    interface
        !! get the current value of the performance counter.
        function QueryPerformanceCounter(lPerformanceCount_count) result(is_succeeded) &
                                                                  bind(c, name="QueryPerformanceCounter")
            use, intrinsic :: iso_c_binding
            implicit none
            integer(c_long_long) :: lPerformanceCount_count
                !! current performance-counter value [count]
            logical(c_bool) :: is_succeeded
                !! nonzero (i.e. .true.) if the function succeeds
        end function QueryPerformanceCounter

        !! get the frequency of the performance counter.
        function QueryPerformanceFrequency(lFrequency_countPerSec) result(is_supported) &
                                                                   bind(c, name="QueryPerformanceFrequency")
            use, intrinsic :: iso_c_binding
            implicit none
            integer(c_long_long) :: lFrequency_countPerSec
                !! current performance-counter frequency [count/sec]<br>
                !! nonzero if the hardware running the program supports a high-resolution performance counter.
            logical(c_bool) :: is_supported
                !! nonzero (i.e. .true.) if the hardware supports a high-resolution performance counter
        end function QueryPerformanceFrequency
    end interface

Time duration can be measured using QueryPerformanceCounter like below:

    integer(c_long_long) :: time_begin_qhc, time_end_qhc, freq
    logical(c_bool) :: is_supported, is_succeeded

    is_supported = QueryPerformanceFrequency(freq)
    if(is_supported)then
        is_succeeded = QueryPerformanceCounter(time_begin_qhc)
        ! do something
        is_succeeded = QueryPerformanceCounter(time_end_qhc)
        print *,real(time_end_qhc - time_begin_qhc,real32)/real(freq,real32)
    end if

I tested it with the code on my repo, already posted in a reply of this topic. According to the results, QueryPerformanceCounter seems to be working correctly.

@certik

Fortran should have a reliable function to do just that.

What are the use cases and motivation that cpu_time() does not return the total time with openmp?

The repo is a use case.
As you said, I want a simple way to get elapsed time, whether sequential or parallelized programs.

1 Like

Unfortunately, QueryPerformanceCounter is not accurate enough !!
QueryPerformanceFrequency can vary depending on the processor. For recent processors, it can be set to rdtsc_rate() / 1024, which implies a relationship.

It makes no sense that rdtsc_tick() and rdtsc_rate() are not provided.
I am not familiar with what are provided on other O/S, but I would expect that all should have access to the processor clock and be recoverable as an 8-byte integer.

1 Like

Thanks. Can you explain your use case, that you do not want cpu_time to return the total time?

1 Like

The use case computes a sum of two arrays and measures its time.
The code is a part of my teaching material to teach Fortran beginners the time measurement in Fortran. In the material, we will write and measure the execution time of sequential and parallelized programs. So we want to get the actual execution time, not the CPU time.

Since cpu_time returns the total CPU time of the cores used for the computation, the measurement results will be almost the same as the results of the sequential program, even if we succeeded in the parallelization.
This behavior is a huge pitfall, especially for beginners.

To get the actual time, I currently recommend using date_and_time() or omp_get_wtime(). However, date_and_time() requires some extra process to get the time difference in milliseconds, and omp_get_wtime() is not suitable for use in sequential programs that do not actually require OpenMP.
I think your clock function is better for practical use.

I hope I have understood the intent of your reply.

1 Like

I see, thanks @tomohirodegawa. If I understand you correctly, what you are describing is exactly what I want. I was asking for a use case where you do not want the behavior you just described.

What I want and what you want (if I understand you correctly) is for cpu_time to return the actual time. So if you run in parallel on 2 cores with ideal speedup, I would like the time that I measure to be half the original time.

2 Likes