I take it that your “iterate” case is the one with the implied do-loop (buildarray = …)? Aha, yes, the titles of your timings seem to say so.
While I am not a compiler writer, I can make a guess at the timings: in the second case the program needs to make a temporary array, fill it, and then copy it into the target array on the left-hand side. But since this can be done at compile time, you probably have a case where the compiler does the heavy work and stores the result in the object file (.o).
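For reference, the two variants under discussion are presumably along these lines (the array name and size here are assumptions, not the original poster's exact code):

```fortran
program two_versions
   implicit none
   integer, parameter :: n = 1002001
   integer :: buildarray(n), i

   ! "iterate" case: array constructor with an implied do-loop; the
   ! compiler may evaluate the whole constructor at compile time
   buildarray = [(i, i = 1, n)]

   ! explicit-loop case: filled element by element at run time
   do i = 1, n
      buildarray(i) = i
   end do
end program two_versions
```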
Is there a difference in size between the object files you get?
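One quick way to check is to compile each variant to an object file and compare; the file names below are placeholders for the two versions:

```
gfortran -c -O3 constructor_version.f90 -o constructor.o
gfortran -c -O3 loop_version.f90 -o loop.o
ls -l constructor.o loop.o   # a much larger .o suggests the array was baked in at compile time
size constructor.o loop.o    # large data sections point to static storage of the array
```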
Note: I once wrote a program where just about all the computation could be done at compile time by simply filling a data array. For one compiler I tried, this took ages, and only when I reduced the size did my patience outlast the burden on that compiler. (Just for your information: it was a program that determined Ramanujan numbers up to 1 million in a single statement.)
What level of optimization was used for both cases? I’m not sure how reliable the Linux system time function is for timing fine-grained differences in run time. I’ve observed really large deltas between successive runs due to other load on the system.
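One way to reduce dependence on the shell’s coarse `time` output is to time the work inside the program with the `system_clock` intrinsic (a sketch; the timed loop is just a stand-in for the real work):

```fortran
program timing_demo
   implicit none
   integer, parameter :: n = 1002001
   integer :: buildarray(n), i
   integer(8) :: t0, t1, rate

   call system_clock(t0, rate)     ! start count and ticks per second
   do i = 1, n
      buildarray(i) = i
   end do
   call system_clock(t1)           ! end count
   print '(a,f8.4,a)', 'fill took ', real(t1 - t0) / real(rate), ' s'
end program timing_demo
```

Running it several times and taking the minimum gives a more stable figure than a single `time ./a.out`.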
A compiler has basically two approaches that it can take. One is to construct the array at compile time and store the data in static memory in the object file. For a large array, say 10x or 100x larger than in this example, that would produce a huge executable program, and just loading such a program into memory might take several minutes. Another approach would be to defer the array construction until run time. In this case, depending on how clever the compiler is with optimization, space for two such arrays might be required, one for the right hand side and one for the left hand side. Obviously this is a waste of memory, even if the array is filled at run time rather than at compile time.
I think you will find the following approach produces the smallest executable file and also executes the fastest:
program build
   implicit none
   integer, parameter :: N = 1002001
   integer, allocatable :: buildarray(:)
   integer :: i

   allocate( buildarray(N) )
   do i = 1, N
      buildarray(i) = i
   end do
end program build
Here everything, including the memory allocation, happens at run time.
This is what gfortran seems to do, even with -O3: it dynamically allocates a temporary, fills it in, and copies it into buildarray.
This is what flang does, with both versions of the initialization. It takes a long time to create the .o file, but at run time it just copies from static memory into the result.
I have noted earlier that code with implied do loops can compile very slowly when the loop range is large, so my advice is to use implied do loops only when the loop range is fixed and small.
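For instance, a small fixed-range constructor like the following costs the compiler essentially nothing (the values here are made up for illustration):

```fortran
program small_constructor
   implicit none
   integer :: i
   real :: weights(5)

   ! small, fixed range: cheap for the compiler whether or not it
   ! folds the constructor at compile time
   weights = [( real(i - 1) / 4.0, i = 1, 5 )]
   print *, weights
end program small_constructor
```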
What we mean is that when benchmarking performance, you should always compile with compiler optimizations enabled. That is, with gfortran, use at least the -O3 flag.
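A minimal benchmarking workflow along those lines might look like this (the source file name is a placeholder):

```
gfortran -O3 bench.f90 -o bench
# repeat the run a few times to see how much run-to-run noise there is
for i in 1 2 3; do time ./bench; done
```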
I mean in this example. In the generated code there are calls to _malloc, _free, and _memcpy. This is with gfortran 14.2 on macOS, if it makes a difference.