Timing of array initialisation

I’m new to Fortran, using gfortran on Linux.

Initialising an array with a do loop compiles some 8-10 times faster than doing it with an iteration (an implied do loop in an array constructor). Can someone explain why?

thanks

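```fortran
program test002
  implicit none
  integer, dimension(1002001) :: buildarray
  integer :: i

  ! "do loop" version
  do i = 1, 1002001
     buildarray(i) = i
  end do

  ! "iterate" version (compiled and timed separately): implied do loop
  buildarray = [(i, i=1, 1002001)]
end program test002
```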

do loop:
```
time (gfortran -Wall -o test002 test002.f90)
real 0m0.086s
user 0m0.058s
sys 0m0.028s

time (./test002)
real 0m0.010s
user 0m0.002s
sys 0m0.008s
```

iterate:
```
time (gfortran -Wall -o test002 test002.f90)
real 0m0.790s
user 0m0.741s
sys 0m0.024s

time (./test002)
real 0m0.014s
user 0m0.003s
sys 0m0.011s
```


Welcome to the forum!

I take it that your “iterate” case is the one with the implied do loop (buildarray = …)? Ah yes, the labels on your timings say so. 🙂

While I am not a compiler writer, I can make a guess at the timings: in the second case the program needs to create a temporary array, fill it and then copy it into the target array on the left-hand side. But since this can be done at compile time, you probably have a case where the compiler does the heavy work and stores the result in the object file (.o).
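To make the run-time half of that guess concrete, here is a hand-written sketch of what "make a temporary, fill it, copy it" amounts to for `buildarray = [(i, i=1, 1002001)]`. It only mimics what a compiler might emit; the temporary `tmp` is a hypothetical name, and a real compiler may generate quite different code or avoid the temporary altogether.

```fortran
program temp_copy_sketch
  implicit none
  integer, parameter :: n = 1002001
  integer :: buildarray(n)
  integer, allocatable :: tmp(:)   ! hypothetical temporary for the right-hand side
  integer :: i

  ! Step 1: allocate a temporary to hold the array constructor's value
  allocate(tmp(n))

  ! Step 2: fill the temporary element by element
  do i = 1, n
     tmp(i) = i
  end do

  ! Step 3: copy the temporary into the left-hand side, then release it
  buildarray = tmp
  deallocate(tmp)

  ! Sanity check: same contents as the explicit do loop would give
  if (buildarray(1) /= 1 .or. buildarray(n) /= n) error stop "unexpected contents"
  print '(a, i0)', 'last element: ', buildarray(n)
end program temp_copy_sketch
```

Compared with the plain do loop, this does the filling work once and the copying work once more, and briefly needs memory for two arrays.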

Is there a difference in size between the object files you get?

Note: I once wrote a program where just about all of the computation could be done by simply filling a data array at compile time. For one compiler I tried, this took ages, and only when I reduced the array size did the compiler finish before my patience ran out. (Just for your information: it was a program that determined Ramanujan numbers up to 1 million in a single statement.)

A question and a comment.

What level of optimization was used for both cases? I’m not sure how reliable the Linux time command is for timing fine-grained differences in run time. I’ve observed very large differences between successive runs due to other load on the system.

A compiler has basically two approaches that it can take. One is to construct the array at compile time and store the data in static memory in the object file. For a large array, say 10x or 100x larger than in this example, that would produce a huge executable program, and just loading such a program into memory might take several minutes. The other approach is to defer the array construction until run time. In this case, depending on how clever the compiler is with optimization, space for two such arrays might be required, one for the right-hand side and one for the left-hand side. Obviously this wastes memory, even though the array is filled at run time rather than at compile time.
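The first approach is forced on the compiler when the array is a named constant: a parameter's value must be known during compilation, so the compiler has to evaluate the implied do loop itself. A minimal sketch, using a deliberately small, hypothetical size `n`; with a size like 1002001 the compiler would face the same work on a much larger scale, which is where long compile times can show up.

```fortran
program compile_time_sketch
  implicit none
  integer :: i                      ! implied-do variable used in the constant expression
  integer, parameter :: n = 10      ! hypothetical small size for illustration
  ! A named constant: the compiler evaluates this constructor at compile time,
  ! and its values can end up as static data in the object file.
  integer, parameter :: table(n) = [(i, i=1, n)]
  ! An ordinary variable that receives a run-time copy of the constant
  integer :: buildarray(n)

  buildarray = table
  if (any(buildarray /= table)) error stop "copy failed"
  print '(10(i0, 1x))', buildarray
end program compile_time_sketch
```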

I think you will find the following approach produces the smallest executable file and also executes the fastest:

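```fortran
program build_alloc
  implicit none
  integer, parameter :: N = 1002001
  integer, allocatable :: buildarray(:)
  integer :: i

  ! Allocate and fill at run time: no large static data, no temporary
  allocate( buildarray(N) )
  do i = 1, N
     buildarray(i) = i
  enddo
end program build_alloc
```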

Here everything, including the memory allocation, happens at run time.


Deferring the construction to run time is what gfortran seems to do, even with -O3: it dynamically allocates a temporary, fills it in, and copies it into buildarray.

Building the array at compile time is what flang does, for both versions of the initialization. It takes a long time to create the .o file, but at run time it just copies from static memory into the result.

@Arjen, rwmsu, ashe, RonShepard
Thank you all for your replies.

My copy of gfortran is very much out of the box: I used no optimisation flags, just the defaults (which for gfortran means -O0).

The executables are 16.1 kB for the do loop version and 16.2 kB for the iteration version (if that’s the right term).

Across repeated compile runs, the timings vary by a few thousandths of a second for the do loop version and by a few hundredths of a second for the iteration version.

I’m probably deeper in the weeds than I want to be on day 3 of trying Fortran, but the difference was so noticeable I had to ask.

@RonShepard thank you for your explicit sample code.

regards

I have noted earlier that code with implied do loops can compile very slowly when the loop range is large, so my advice is to use implied do loops only when the loop range is fixed and small.
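A minimal sketch of that advice: an implied do loop is fine for a short, fixed-size constructor, while a large array is allocated and filled with an ordinary do loop. The names `squares` and `big` are hypothetical.

```fortran
program implied_do_advice
  implicit none
  integer :: i
  integer, parameter :: n = 1002001
  ! Small, fixed range: an implied do loop in a constructor is cheap and readable
  integer :: squares(5)
  ! Large range: allocate and fill with an ordinary do loop instead
  integer, allocatable :: big(:)

  squares = [(i*i, i=1, 5)]

  allocate(big(n))
  do i = 1, n
     big(i) = i
  end do

  print '(5(i0, 1x))', squares
  print '(a, i0)', 'big(n) = ', big(n)
end program implied_do_advice
```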


What we mean is that when benchmarking performance, you should always compile with compiler optimizations enabled. With gfortran, that means using at least the -O3 flag.
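One way to benchmark the initialisation itself, without the process start-up that the shell's time command also measures, is to time it inside the program with the standard `system_clock` intrinsic and compile with optimizations enabled (e.g. `gfortran -O3`). A minimal sketch; the program name and the single measurement per variant are assumptions, and for stable numbers you would repeat each measurement several times.

```fortran
program time_init
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 1002001
  integer, allocatable :: a(:), b(:)
  integer :: i
  integer(int64) :: t0, t1, rate

  allocate(a(n), b(n))

  ! Variant 1: explicit do loop
  call system_clock(t0, rate)
  do i = 1, n
     a(i) = i
  end do
  call system_clock(t1)
  print '(a, f10.6, a)', 'do loop:         ', real(t1 - t0) / real(rate), ' s'

  ! Variant 2: implied do loop in an array constructor
  call system_clock(t0)
  b = [(i, i=1, n)]
  call system_clock(t1)
  print '(a, f10.6, a)', 'implied do loop: ', real(t1 - t0) / real(rate), ' s'

  ! Use the results so the optimizer cannot discard the work
  if (any(a /= b)) error stop "arrays differ"
  print '(a, i0)', 'checksum: ', sum(int(a, int64))
end program time_init
```

The checksum (the sum 1 + 2 + ... + 1002001) is printed so that, even at -O3, both loops produce a result that is actually used.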

Do you mean in this particular example, or generally speaking?

I mean in this example. The generated code contains calls to _malloc, _free, and _memcpy. This is with gfortran 14.2 on macOS, if it makes a difference.