Is there performance advantage of using intent(in) and intent(out)?

I know that intent(in), intent(out) can be used to prevent some bugs when declaring the arguments in a function or subroutine , but I want to ask,

is there performance advantage of declaring arguments as intent(in) and intent(out)?

Especially, if I put a big array as input, if I specify the array as intent(in), will that somehow prevent temp memory copy stuff?

Thanks!

4 Likes

@CRquantum ,

I strongly suggest a good read first of the book, "Modern Fortran Style and Usage" by Norman S. Clerman and Walter Spector. It addresses thoroughly this aspect and other similar and related questions you will have with Fortran.

This forum can be a good place for discourse once you have gone through such a treatise in a structured manner.

2 Likes

Yes, it can help with optimization and performance, although I think compilers are smart enough to take care of most scenarios without argument intent. Also, if you are writing a function, then it cannot be pure unless all of its arguments are intent(in). Without intent there is the possibility of the function not being inlined when it is possible, thus severely affecting optimization.

But aside from optimization, I think readability and code maintenance absolutely requires intent of dummy arguments.

4 Likes

I think there can be. But it would depend on the compiler how it works inside.

1 Like

Unfortunately, the book you recommended did not seem to answer my question.

I read that book you mentioned from page 50, as well as other stuff, I know intent(in)/(out) can add some protection to the arguments, and people say it is good habit to use intent and blablabla.

But my question is, is there performance difference?

From my experience it seems I am not able to find much difference. But I would like to know under what condition there could be differences.

Looks like you have gone through many “treatise” already, therefore you much be a Fortran grandmaster.

I suppose this could be such a trivial question for you, so, would you mind answering the question from a grandmaster point of view? Thanks!

Thank you! Looks like an excellent answer.

But may I ask, why the loop is eliminated?

Do you mean the compiler automatically know that

res = sin(100.0)

and, without x being intent(in), the compiler just cannot realize this?

Anyway, that is interesting! Thank you!

1 Like

As others have said, this is a question for each compiler vendor. For the NAG compiler (which produces C and uses a C compiler), I would have to look carefully before making definite statements but it’s unlikely to produce much benefit.

Intent(out) means that the value of the actual argument is discarded (it becomes undefined) and it is defined anew in the procedure. A compiler seeing “call s(x);call t(x)” could produce code for s that completely eliminates the assignment to x since the computed value is discarded.

It might help to mentally inline the procedure and see Intents as promises to the compiler that short-circuit the analysis that it has to do.

Intent(In) is then a promise that a data entity does not change during a basic block. That means that the compiler can reuse a previously squirrelled-away value in a register, or reorder the computation in some way or avoid computation altogether.

Intents are, in the inlined view, a set of compiler directives that allow (but do not mandate) performance improvements.

4 Likes

You cannot prevent copying. All you can do is give no excuse to the compiler to copy stuff that doesn’t need copying.

These are the pertinent questions: how is your dummy array declared, what is the actual argument and how is it declared if it’s a variable and how are you calling the procedure (implicit or explicit interface)?

2 Likes

In a previous thread @ivanpribec said

In the past I’ve added the pure specifier and intent attributes to a multiphase fluid simulation code (AMR-LBM-OpenMP-2D ), and remember seeing a 20-25 % performance increase (I did perform some minor restructuring, but in no way did I modify the algorithm).

1 Like

I have the original code and changes I made all stored in a git repository. I guess it would be good material for a blog write-up backed with evidence.

4 Likes

Hi @CRquantum, and welcome to the Fortran Discourse.

I think @FortranFan was just trying to help by providing the book reference.

To clarify as an administrator and on behalf of Fortran Discourse, you and all users are absolutely welcome to ask any questions and participate in all discussions before or without having read any books or tutorials. And we will help guide everyone toward relevant tutorials, books, or other learning resources, when asked for.

Thank you for your great question and all other discussions! I learned some new things from reading this thread.

6 Likes

Sometimes compilers get confused by specifying array bounds. For example, I have repeatedly observed that the Intel Fortran compiler creates a copy for a whole section array Array(1:n,-1:3) thinking that it is not a contiguous memory section, even though the array bound for the first dimension is 1:n. GFortran appears to be smart enough to notice this and not create a temporary array. For Intel, the remedy is to remove explicit bound specification by passing the array section as Array(:,-1:3)

1 Like

Thank you. I am sorry for being a little aggressive.

1 Like

Thanks! These questions got me thinking some things I had not thought much about before, such as how useful would it be for the compiler to tell you the optimizations it did via reformatted code (some compilers do have switches to tell you some optimizations they do, but the most familiar ones do it in the form of terse messages in general), as reading the intermediate files is not something someone new to programming will probably enjoy (although I think it should be encouraged more than it is).

So instead of “just” (using the phrase lightly) reformatting the code
Would a “mentor” option on a compiler or tool that read your code and added INTENT, PURE|IMPURE ELEMENTAL attributes and other information it is determining to make optimizations as feedback that the compiler is doing be valuable?

That is, the compilers can do a lot of good optimizations, especially of bad code, but could/should they rewrite the code at a higher level as well (ie. show you better Fortran). The compilers contain a wealth of information on what optimizations can be made. I can see why some of that might be proprietary, and I have seen listings options in the past that showed where some major optimizations were performed, but I have not seen that lately. Are there still such utilities available?

So so far I had found compilers were so good that they were catching a lot of the optimizations already that INTENT potentially supplies(the original question); but were not as good at catching the bugs that INTENT would prevent – reinforcing the idea that specifying INTENT is still valuable; but if the compilers had reasonable optimization levels enabled that hand-specifying INTENT was not changing speed much on the stuff I tried.

Although eliminating dead code, unrolling, and doing optimizations such as the above examples and removing calculations that are invariant in the context of a loop out of the loop and on and on would still be nice to see as rewritten Fortran, I was not seeing any significant performance changes when adding explicit declarations that I could not also get out of using higher optimization levels.

I had started with the example code above and made copies with different INTENTs and timed them using different compiler optimizations and switches. Even without looking at the S files that was more informative and fun that I expected. So stage I was just running

program timeit
implicit none
real :: res
call printtime(100.0,res,foo1)
call printtime(100.0,res,foo2)
call printtime(100.0,res,foo3)
call printtime(100.0,res,foo4)
contains
subroutine printtime(x,res,sub)
real :: x, res
real :: start, finish
external sub
call cpu_time(start)
call sub(x,res)
call cpu_time(finish)
write(*,*)res
! writes processor time taken by the piece of code.
print '("Processor Time = ",f6.3," seconds.")',finish-start
end subroutine printtime
subroutine foo1 (x, res)
real, intent(in) :: x
real, intent(out) :: res
integer i
   do i = 1, 100000000
      res = sin(x)
   end do
end subroutine foo1
subroutine foo2 (x, res)
real             :: x
real             :: res
integer i
   do i = 1, 100000000
      res = sin(x)
   end do
end subroutine foo2
subroutine foo3 (x, res)
real, intent(in) :: x
real              :: res
integer i
   do i = 1, 100000000
      res = sin(x)
   end do
end subroutine foo3
subroutine foo4 (x, res)
real              :: x
real, intent(out) :: res
integer i
   do i = 1, 100000000
      res = sin(x)
   end do
end subroutine foo4
end program timeit

and then replacing the trivial routine with some bigger ones I had (via INCLUDE, just hand-coding the declarations). The result was that
with compilers I tried I found I got about the same speeds from each routine in any single build, although they varied wildly from >1 second to < 0.000 sec (sorta as expected, given the giant redundant loop) with the initial example, by using different compiler switches, which basically shows that if -Ofast runs a lot faster than -O0 you probably did something really wrong in the code :>.

The only surprise was in one case the routine with INTENT specified did 50% worse than the others; that is worth looking at the S files, as it might be worth reporting to the compiler developers as intuitively that was unexpected.

So then that got me to taking a old large numeric library, and via some scripts and compiler messages updating it (it has a LOT of units tests available which let me play loosely with it).
The earlier experiments were convincing me there would not be any optimization, so I was taking a real case and using it to prove it was not worth doing this “mentor” recoding approach.

The kludge was basically to give everything starting at the bottom of the calling tree and working up INTENTs and then letting the compiler tell me where the mistakes were and following it’s lead; it took some playing but I ended up pretty quickly with intent specified everywhere (it had not been specified anywhere in the code before (basically pre-F90 code except for having been made free-format)( and from what I saw in the earlier tests I expected very little if any speed-up (there were several in-between iterations leading up to this I am skipping); as this was code that was known to be well-optimized and had been used a lot for a long time; and low and behold without other changes I am seeing an 8% speed up, which makes me want to look at just what the compiler did, but the messages I could get seem pretty similiar between the two code versions, so at least with this compiler looking at the intermediate files might give me the clue but the compiler was not giving away secrets easily; but that got me thinking some automated way of doing this might be more rewarding than I was concluding after all. If I get around to doing something more substantial I’ll report it back here but in the meantime thanks for the interesting discussion. It got me asking a few questions I have not pursued in a long time. The results are intriguing so far, at least to me.

And so back to the original question – I did some actual tests with and without INTENT specified and saw little or no performance difference in most simple cases. I was then surprised that when I did the same to a large code (expecting confirmation that explicit INTENT was not going to change performance at higher optimization levels) I saw a significant improvement interesting enough to track down further (the library I did the test on is actively used). I detailed some of the steps in the hope others are trying something similiar, and maybe the (admittedly helter-skelter) steps I took might ease the journey. If I get time to figure out what the compiler figured out only because INTENT was explicit I’ll try to post it back here.

8 Likes

Thanks for sharing your experiences.

An automated tool that “unannotated” Fortran codes by removing INTENTs and PURE attributes could be interesting. In some cases the resulting codes would still be valid, and one could compare the speed of the original and unannotated codes. A function that is not declared PURE cannot be called within a FORALL block, and an INTENT(OUT) ALLOCATABLE array starts out unallocated, so not all codes will compile or give the same results without annotations, but speeds could be compared for the codes that do. Some people who are writing essentially Fortran 77 (including the Lapack developers) could be motivated to use modern Fortran features if they thought doing so would increase speed.

1 Like

Great idea. A lot easier to remove than add. I have some libraries that fit that bill that I have performance data for. That should be interesting. I have a hunch the difference could be bigger for pre-f90 libraries with routines compiled separately where the compiler is not free to make as many optimizations as well.

That’s exactly our plan for LFortran. This is something I’ve always missed as a user.

3 Likes

Thanks for this really interesting thread. There’s some good info in here that I’ve been struggling to piece together from other sources. The TL;DR as far as I can tell is that there might be an performance benefit to using intent, and seeing as it’s good practice anyway, then there’s no reason not to.

I’ve got a large Fortran model that doesn’t rigorously use intent at the moment. I’m going to change that so it will be a good opportunity to see how much of a difference using intent can make in a “real world” program.

I see the reasoning (it’s a hint that you can re-express your calculations so that a less-optimizing compiler would produce faster code) but question the payoff. “Why keep a dog and bark yourself?”, as they say. Re-expressing takes time and will probably make the code less readable. Also, “-Ofast” might do sufficiently different computations that render the results untrustworthy. Are you sure you know what “-Ofast” gets up to?

Has this code been run with the NAG Fortran Compiler under maximum checking? This could be a symptom of a violation of the Fortran standard. My advice is NEVER attempt to optimize code that doesn’t conform to the Standard.

Perhaps the most performance benefit of INTENT comes when inlining and interprocedural optimization is enabled. I believe the HPE/Cray compiler may be the most advanced in that regard.

3 Likes

This sounds like a paper worth submitting to ACM Fortran Forum. @rouson , don’t you think so?