Thanks! These questions got me thinking about things I had not considered much before, such as how useful it would be for the compiler to tell you what optimizations it performed by showing you reformatted code. Some compilers do have switches that report some of their optimizations, but generally in the form of terse messages; and reading the intermediate files is not something someone new to programming will probably enjoy (although I think it should be encouraged more than it is).
So instead of “just” (using the phrase lightly) reformatting the code: would a “mentor” option on a compiler or tool be valuable, one that read your code and added INTENT, PURE|IMPURE, and ELEMENTAL attributes, along with other information it determined while making optimizations, back to you as feedback?
That is, compilers can do a lot of good optimization, especially of bad code, but could or should they also rewrite the code at a higher level (i.e., show you better Fortran)? The compilers contain a wealth of information about what optimizations can be made. I can see why some of that might be proprietary, and I have seen listing options in the past that showed where some major optimizations were performed, but I have not seen that lately. Are there still such utilities available?
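As a sketch of what such a “mentor” output might look like (purely hypothetical; the routine and the rewrite below are invented for illustration, and no compiler I know of emits this), given an unannotated routine the tool could re-emit it with the attributes it inferred during analysis:

```fortran
! --- input as written (hypothetical example routine) ---
! subroutine scale(n, x, y)
! integer :: n
! real    :: x(n), y(n)
!    y = 2.0*x
! end subroutine scale

! --- possible "mentor" rewrite, showing what the analysis inferred ---
pure subroutine scale(n, x, y)
integer, intent(in) :: n     ! only ever read
real, intent(in)    :: x(n)  ! only ever read
real, intent(out)   :: y(n)  ! defined before any use
   y = 2.0*x                 ! no side effects anywhere, hence PURE
end subroutine scale
```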
So far I had found the compilers were good enough that they were already catching a lot of the optimizations INTENT potentially enables (the original question), but they were not as good at catching the bugs that INTENT would prevent, reinforcing the idea that specifying INTENT is still valuable. With reasonable optimization levels enabled, however, hand-specifying INTENT was not changing speed much on the code I tried.
Although it would still be nice to see dead-code elimination, unrolling, hoisting of loop-invariant calculations, and optimizations such as the examples above presented as rewritten Fortran, I was not seeing any significant performance changes from adding explicit declarations that I could not also get by using higher optimization levels.
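For instance, the hoisting case might read back something like this (a sketch of the idea, not actual compiler output); the loop in the example code below is a natural candidate:

```fortran
! --- as written ---
do i = 1, 100000000
   res = sin(x)    ! sin(x) is invariant in the loop
end do

! --- what an optimizer effectively does, shown as Fortran ---
res = sin(x)       ! invariant computation hoisted out of the loop;
                   ! the now-empty loop is dead code and disappears
```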
I had started with the example code above, made copies with different INTENTs, and timed them using different compiler optimizations and switches. Even without looking at the .s files, that was more informative and fun than I expected. So at this stage I was just running
program timeit
implicit none
real :: res
   call printtime(100.0,res,foo1)  ! intent(in) x, intent(out) res
   call printtime(100.0,res,foo2)  ! no intent specified
   call printtime(100.0,res,foo3)  ! intent(in) x only
   call printtime(100.0,res,foo4)  ! intent(out) res only
contains

   subroutine printtime(x,res,sub)
   real :: x, res
   real :: start, finish
   external sub
      call cpu_time(start)
      call sub(x,res)
      call cpu_time(finish)
      write(*,*)res
      ! write processor time taken by the piece of code
      print '("Processor Time = ",f6.3," seconds.")', finish-start
   end subroutine printtime

   subroutine foo1(x, res)
   real, intent(in)  :: x
   real, intent(out) :: res
   integer :: i
      do i = 1, 100000000
         res = sin(x)
      end do
   end subroutine foo1

   subroutine foo2(x, res)
   real :: x
   real :: res
   integer :: i
      do i = 1, 100000000
         res = sin(x)
      end do
   end subroutine foo2

   subroutine foo3(x, res)
   real, intent(in) :: x
   real             :: res
   integer :: i
      do i = 1, 100000000
         res = sin(x)
      end do
   end subroutine foo3

   subroutine foo4(x, res)
   real              :: x
   real, intent(out) :: res
   integer :: i
      do i = 1, 100000000
         res = sin(x)
      end do
   end subroutine foo4

end program timeit
and then replacing the trivial routine with some bigger ones I had (via INCLUDE, just hand-coding the declarations). The result was that, with the compilers I tried, I got about the same speed from each routine in any single build. Across builds with different compiler switches, though, the times for the initial example varied wildly, from more than 1 second down to a reported 0.000 seconds (more or less as expected, given the giant redundant loop), which basically shows that if -Ofast runs a lot faster than -O0 you probably did something really wrong in the code :>.
The only surprise was that in one case the routine with INTENT specified ran 50% slower than the others; that one is worth looking at the .s files for, and might be worth reporting to the compiler developers, as it was intuitively unexpected.
So then that got me to taking an old, large numeric library and, via some scripts and compiler messages, updating it (it has a LOT of unit tests available, which let me play loosely with it).
The earlier experiments had convinced me there would not be any optimization gain, so I was taking a real case and using it to prove this “mentor” recoding approach was not worth doing.
The kludge was basically to give everything INTENTs, starting at the bottom of the calling tree and working up, and then let the compiler tell me where the mistakes were and follow its lead. It took some playing, but I ended up fairly quickly with INTENT specified everywhere; it had not been specified anywhere in the code before (basically pre-F90 code, except for having been made free-format). From what I saw in the earlier tests I expected very little if any speed-up (there were several in-between iterations leading up to this that I am skipping), as this was code that was known to be well-optimized and had been used heavily for a long time.

And lo and behold, without other changes I am seeing an 8% speed-up. That makes me want to look at just what the compiler did, but the messages I could get seem pretty similar between the two code versions, so at least with this compiler the intermediate files might hold the clue; it was not giving away its secrets easily. But it got me thinking that some automated way of doing this might be more rewarding than I had been concluding after all. If I get around to doing something more substantial I will report back here, but in the meantime thanks for the interesting discussion. It got me asking a few questions I have not pursued in a long time, and the results are intriguing so far, at least to me.
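For anyone trying the same bottom-up pass: the reason letting the compiler lead works is that redefining an INTENT(IN) dummy is a compile-time constraint violation, so each wrong guess at the leaves is flagged immediately. A minimal sketch (the routine is hypothetical, invented to illustrate the workflow):

```fortran
subroutine leaf(x, res)
real, intent(in)  :: x    ! guess INTENT(IN) at the leaves first
real, intent(out) :: res
   res = x*x
   ! x = 0.0              ! uncommenting this is a compile-time error:
                          ! an INTENT(IN) dummy cannot be redefined
end subroutine leaf
```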
And so, back to the original question: I did some actual tests with and without INTENT specified and saw little or no performance difference in most simple cases. I was then surprised that when I did the same to a large code (expecting confirmation that explicit INTENT was not going to change performance at higher optimization levels) I saw a significant improvement, interesting enough to track down further (the library I ran the test on is actively used). I detailed some of the steps in the hope others are trying something similar, and maybe the (admittedly helter-skelter) steps I took will ease the journey. If I get time to figure out what the compiler figured out only because INTENT was explicit, I will try to post it back here.