How to show the true wall time instead of cpu_time?

The current description of cpu_time is:

CPU_TIME(TIME)

Description. Processor time used.

Class. Subroutine.

Argument. TIME shall be a real scalar. It is an INTENT(OUT) argument. If the processor cannot provide a meaningful value for the time, it is assigned a processor-dependent negative value; otherwise, it is assigned a processor-dependent approximation to the processor time in seconds. Whether the value assigned is an approximation to the amount of time used by the invoking image, or the amount of time used by the whole program, is processor dependent. [emphasis added]

Interestingly, there is a note I did not know about:

A processor for which a single result is inadequate (for example, a parallel processor) might choose to provide an additional version for which time is an array. [emphasis added]

The exact definition of time is left imprecise because of the variability in what different processors are able to provide. The primary purpose is to compare different algorithms on the same processor, or to discover which parts of a calculation are the most expensive.

I haven’t tested any compiler to see if there exists an array version of cpu_time.
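For reference, the scalar form described above is typically used by bracketing the work with two calls and taking the difference. A minimal sketch (the workload loop is just a placeholder):

```fortran
program time_with_cpu_time
   implicit none
   real :: t_start, t_finish, s
   integer :: i

   call cpu_time(t_start)
   s = 0.0
   do i = 1, 10000000
      s = s + sin(real(i))   ! dummy workload to time
   end do
   call cpu_time(t_finish)

   ! A negative value signals that the processor cannot provide the time.
   if (t_start < 0.0 .or. t_finish < 0.0) then
      print *, "processor cannot provide a meaningful CPU time"
   else
      print '(a,f8.3,a)', "cpu time: ", t_finish - t_start, " s"
   end if
   print *, s   ! use the result so the loop is not optimized away
end program
```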


Well, I just found that our great friend John Burkardt has a wtime() function which uses system_clock; the link is below:

https://people.sc.fsu.edu/~jburkardt/f_src/wtime/wtime.html

John Burkardt says on that page:

wtime, a FORTRAN90 code which returns a reading of the wall clock time.

For parallel programming, the important thing to measure is the elapsed wallclock time. This can be found by subtracting an initial reading of the wallclock time from a final one.

The OpenMP system provides a function used as follows:

    seconds = omp_get_wtime ( )
    operations to time;
    seconds = omp_get_wtime ( ) - seconds;

while the MPI system provides a similar function used as:

    seconds = MPI_Wtime ( );
    operations;
    seconds = MPI_Wtime ( ) - seconds;

and in MATLAB, wall-clock time can be taken with “tic” and “toc”:

    tic;
    operation;
    seconds = toc;

The code provides a way to get a similar reading:

    seconds = wtime ( );
    operations;
    seconds = wtime ( ) - seconds;
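A minimal wtime()-style function can be built on system_clock along these lines (a sketch in the same spirit; this is not Burkardt's actual code):

```fortran
module wtime_mod
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
contains
   ! Return the current wall-clock reading in seconds.
   function wtime() result(seconds)
      real(real64) :: seconds
      integer(int64) :: count, count_rate
      call system_clock(count, count_rate)
      seconds = real(count, real64) / real(count_rate, real64)
   end function wtime
end module wtime_mod
```

It is then used exactly as shown above: take a reading, do the work, and subtract the first reading from a second one.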

I have briefly tested it with and without OpenMP, and it really seems to give the wall (elapsed) time. I may just use it for now.

There is also a similar Stack Overflow question,


So, is there a nice FPM package that implements all of these different options (with a permissive license)? That’s what we need. Every time I want to time something I just rediscover/reinvent all this. This is another one of those areas where Fortran users are used to rolling their own implementation, when really it should be a standard library that everybody can just use and get on with their lives.

I think that is the expected behavior, right? Also, in a timeshare environment, when your process is swapped out, then you would expect cpu_time() to freeze, while the wall time would continue on. Or if you are accessing a network file system or some remote URL, which freezes for some reason for a while, you would expect cpu_time() to freeze while the wall time would continue.
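The divergence is easy to demonstrate by waiting outside the process: wall time advances while CPU time barely moves. A small sketch, assuming a Unix-like system with a `sleep` command available to execute_command_line:

```fortran
program cpu_vs_wall
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   real :: c0, c1
   integer(int64) :: w0, w1, rate

   call cpu_time(c0)
   call system_clock(w0, rate)
   ! Sleeping in a child process consumes wall time but almost no CPU time.
   call execute_command_line("sleep 2")
   call cpu_time(c1)
   call system_clock(w1)

   print '(a,f6.3,a)', "cpu  time: ", c1 - c0, " s"                       ! near zero
   print '(a,f6.3,a)', "wall time: ", real(w1 - w0) / real(rate), " s"    ! about 2
end program
```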

Another gotcha with timers such as system_clock() is that the tick rate and the resolution can depend on the KIND of the argument. Thus some care is required when timing sections of code to ensure that all the queries use arguments of the same KIND.
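The KIND dependence is easy to check, since all arguments of system_clock are optional and the reported rate follows the argument kind. A small sketch (gfortran, for example, reports a much finer rate for int64 arguments than for default integers):

```fortran
program clock_kinds
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
   integer(int32) :: rate32
   integer(int64) :: rate64

   ! The tick rate reported can differ with the kind of the argument.
   call system_clock(count_rate=rate32)
   call system_clock(count_rate=rate64)
   print *, "count_rate with int32 argument:", rate32
   print *, "count_rate with int64 argument:", rate64
end program
```

The practical rule follows directly: take the count and the count_rate with arguments of the same KIND, for every query in a timed section.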


Fortran timers are always limited by what the underlying hardware provides. There was a time when the only timers available were unsigned 32-bit integers, and the only sampling rate available was the timesharing swap rate, which was about 0.01 seconds on Unix systems. Things are better now: most modern hardware provides both faster sampling rates and 64-bit counters. However, there can still be problems mapping that to Fortran’s signed INT64. If you only time differences, and the arithmetic is 2’s complement, and overflows are ignored (yes, all that), then it usually works the way you want it to.
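Rather than relying on overflow being silently ignored, the wrap-around can also be handled explicitly using the count_max that system_clock reports. A sketch:

```fortran
! Wrap-around-safe elapsed time between two system_clock counts.
! If the counter rolled over between the samples, the raw difference
! is negative; adding count_max + 1 recovers the true elapsed ticks
! (assuming at most one roll-over occurred).
function elapsed_seconds(c0, c1) result(seconds)
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   integer(int64), intent(in) :: c0, c1
   real(real64) :: seconds
   integer(int64) :: rate, cmax, diff

   call system_clock(count_rate=rate, count_max=cmax)
   diff = c1 - c0
   if (diff < 0_int64) diff = diff + cmax + 1_int64
   seconds = real(diff, real64) / real(rate, real64)
end function elapsed_seconds
```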

FYI: as regards fpm packages:

I do have an fpm-compatible module based on the StopWatch code

If anyone wants to start one based on some of the ideas here, it might be useful. I was going to add something like some of the things discussed here, but was waiting to see how conditional compilation worked out, since a lot of this is system-dependent; I am not sure I will ever get to it. I have some simple tic/toc routines I was going to add to it. It is very flexible, especially if you want to leave the code instrumented. I admit I personally tend to use my tic and toc routines more for quick tests and/or as profiling tools.
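A tic/toc pair in the spirit mentioned above can be sketched in a few lines on top of system_clock (this is an illustrative sketch, not the StopWatch code; the saved count is module state, so it is not thread-safe and supports only one timer at a time):

```fortran
module tictoc
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   private
   public :: tic, toc
   integer(int64) :: start_count = 0_int64   ! saved reading from tic()
contains
   subroutine tic()
      call system_clock(start_count)
   end subroutine tic

   ! Seconds of wall time elapsed since the last call to tic().
   function toc() result(seconds)
      real(real64) :: seconds
      integer(int64) :: count, rate
      call system_clock(count, rate)
      seconds = real(count - start_count, real64) / real(rate, real64)
   end function toc
end module tictoc
```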


In this respect, providing access to “rdtsc” in a Windows implementation would not be unreasonable.

As processors have improved by a factor of 1,000 over the last 30 years, what once “was about 0.01 seconds” now needs to be 10 microseconds, which is still 50,000 processor cycles; too coarse for some tests.

The use of CPU_Time in PGI and GNU for 2 and 4 threads does not look correct to me, as it is expected to report the cumulative CPU usage of all threads.
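That expectation can be probed directly: if cpu_time accumulates over threads, its difference across a parallel region should be roughly the thread count times the wall-clock difference. A sketch (compile with e.g. `gfortran -fopenmp`; the observed ratio is compiler- and system-dependent):

```fortran
program cpu_time_threads
   use omp_lib, only: omp_get_wtime
   implicit none
   real :: c0, c1
   double precision :: w0, w1, s
   integer :: i

   call cpu_time(c0)
   w0 = omp_get_wtime()
   s = 0.0d0
   !$omp parallel do reduction(+:s)
   do i = 1, 100000000
      s = s + sin(dble(i))   ! dummy parallel workload
   end do
   !$omp end parallel do
   w1 = omp_get_wtime()
   call cpu_time(c1)

   print *, "cpu  time:", c1 - c0
   print *, "wall time:", w1 - w0
   ! ~ number of busy threads, if cpu_time is cumulative over threads
   print *, "ratio:", (c1 - c0) / (w1 - w0)
   print *, s   ! use the result so the loop is not optimized away
end program
```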

Hi everyone,

I’ve got a basic question on my mind: Why does the wall time of my non-parallel CFD Fortran model vary each time I run it? I’ve made sure that everything’s set up the same way for each run, but I keep getting different results. For example:

First run: 900 s
Second run: 950 s
Third run: 850 s

The run time is consistently around 900 seconds, with some wiggle room of about 1–50 seconds. I’ve checked and rechecked to ensure all potential variables are consistent. I’ve even gone so far as to turn off my PC to keep the CPU temperature in check.

I’ve tried different methods such as CPU_TIME, SYSTEM_CLOCK, and TIME, but they all lead to the same puzzling outcome. I have also confirmed that no random values are generated.

If anyone has any insights on what might be happening here, I’d really appreciate your input.

Cheers,
Robin


Fluctuations are to be expected. There are a number of things which could cause them:

  • system jitter (background processes)
  • other processes running
  • CPU temperature
  • thread scheduling (a thread may switch between cores)

To minimize the effects:

  • close other applications and processes (a stray browser or IDE updating in the background could easily occupy lots of resources)
  • don’t do other stuff while your process is running
  • pin your executable to a specific core using taskset or likwid-pin, to prevent context switching

Anecdotally, even something like printing to stdout could slow down your process if it needs to allocate a lot of string buffers dynamically. Reducing the frequency of output or redirecting it to a file might help in this case.

Sometimes you can find the reason behind fluctuations by analyzing their distribution. I can highly recommend watching the following video on designing experiments in high-performance computing:
