How does the compiler handle functions and subroutines?

I am trying to modularise my model as much as possible; this helps in understanding the code and allows me to check off aspects that work.

Equally, as equations adapt and change over time, it will allow me to update the model relatively easily (I hope).

Yet, does splitting a single subroutine into multiple separate functions make the program compile into something less efficient?

I envisage that the compiler simply replaces the subroutine call with the relevant code each time, basically recreating the single subroutine in one way or another. Or does it emit a call, so that the machine code has to jump to the routine each time, costing a few cycles in the process?

I have single lines of equation; one function has 20 such equations. Most equations are specialist and complex. I am moving most into their own functions, so that I can reference the source and identify all the elements of the equation correctly (using meaningful variable names such as Temperature rather than simply t or temp, which could mean temporary). This also means that I know the equation works and need not worry about it in future, rather than spending ages on a long subroutine, trying to rationalise all the variables and, when changes are made, identify where the error might now be occurring.

And, obviously, I can use the function over and over again.
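As a minimal sketch of what I mean (the formula here is the Tetens approximation for saturation vapour pressure, standing in for one of my real equations; all names are illustrative, not from the actual model):

elemental function saturation_vapour_pressure(temperature) result(p_sat)
   implicit none
   real, intent(in) :: temperature   ! air temperature [K], not "temp" as in temporary
   real :: p_sat                     ! saturation vapour pressure [Pa]
   ! One named, independently testable equation
   p_sat = 610.78 * exp(17.27 * (temperature - 273.15) / (temperature - 35.85))
end function saturation_vapour_pressure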

However, I am interested in the efficiency question, in case my 40-minute model (for a quick run) becomes an hour, and thus my long run (3-4 days on a very powerful system) becomes 5-10 days or more!


The latter (a call that jumps to the routine and back) is the way most compiled languages, including Fortran, work.

Sometimes one gains efficiency by breaking up a large subroutine into smaller ones. The large one sometimes uses up all the registers, which inhibits optimization, or uses up all the local cache, and that doesn’t occur in the smaller routines. On the other hand, there are sometimes computed intermediates, which might be scalars or vectors in modern Fortran, that might be computed once in the large subroutine but repeatedly in the smaller ones. So the answer is to write the code both ways and time it in your actual production runs.
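As an illustrative sketch of that trade-off (hypothetical names; rho stands in for any shared intermediate), the large routine computes the intermediate once, while a naive split recomputes it in each small function unless the optimizer can inline and merge them:

! Large routine: rho is computed once and reused.
subroutine big(t, p, q1, q2)
   implicit none
   real, intent(in)  :: t, p
   real, intent(out) :: q1, q2
   real :: rho
   rho = p / (287.05 * t)
   q1 = 1.5 * rho
   q2 = 2.5 * rho
end subroutine big

! Split version: each function recomputes rho.
pure function f1(t, p) result(q1)
   implicit none
   real, intent(in) :: t, p
   real :: q1
   q1 = 1.5 * (p / (287.05 * t))
end function f1

pure function f2(t, p) result(q2)
   implicit none
   real, intent(in) :: t, p
   real :: q2
   q2 = 2.5 * (p / (287.05 * t))
end function f2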


Modern optimizing compilers can also “inline” smaller procedures, providing additional optimization opportunities. My general advice is to write easy-to-understand code, and if that means parceling out complex parts into separate functions, do that. Consider using contained (internal) procedures if the use is confined to one source file.
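For example (a sketch with hypothetical names), an internal procedure keeps the helper and its call site in one compilation unit, which makes inlining easy:

subroutine solve_step(temperature, pressure, buoyancy)
   implicit none
   real, intent(in)  :: temperature, pressure
   real, intent(out) :: buoyancy
   buoyancy = 9.81 * density(temperature, pressure)
contains
   ! Internal function: visible only inside solve_step
   pure function density(t, p) result(rho)
      real, intent(in) :: t, p
      real :: rho
      rho = p / (287.05 * t)
   end function density
end subroutine solve_step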

Some compilers can optimize separately compiled sources, but that is usually an option you have to select.
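For example (file names hypothetical; exact flags vary by compiler and version), gfortran’s -flto and Intel Fortran’s -ipo both enable optimization across separately compiled sources:

gfortran -O2 -flto main.f90 physics.f90 -o model
ifort -O2 -ipo main.f90 physics.f90 -o model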

Write clearly and see if the performance is acceptable. If not, use a profiling tool to see where the time is really spent (it’s usually not where you think it is).
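For example, one common route (among several profilers; file names hypothetical) is gfortran with gprof: build with -pg, run the instrumented executable, which writes gmon.out, then ask gprof where the time went:

gfortran -O2 -pg model.f90 -o model
./model
gprof model gmon.out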


Premature optimization is the root of all evil

  • Donald Knuth

I’ve had this experience several times now, in both directions. I’ve inherited code written for performance first, and when it turned out to have a bug, it took days or even weeks to diagnose and fix. But I’ve also written new code, optimized first for readability and testability, and when it turned out not to perform well, it was easy to tweak. Quite recently I spent a few weeks implementing a green-field project, and once it was working, it took only a couple of days to obtain an order-of-magnitude improvement in performance.

And I can also attest to


As @everythingfunctional and @sblionel said, the first implementation should be for readability; get your code working for your problem. By using Fortran you will get very good performance out of the box, which should be enough for a prototype.

If it is possible to tweak your prototype to get the performance you want, you should do that.

In my experience writing high-performance code, this is not always possible and a rewrite might be needed. You can use your prototype to understand the requirements, and use it to design new code that can do everything you need, in a clean manner, and that performs well from the start. In my experience it is usually impossible to speed up a large existing code to maximum performance (though it is often possible to speed things up, even considerably); you have to write the code from scratch with performance in mind. But you should also not focus on performance if you are just trying to write a prototype.


Thanks everyone for your comments. Just to state that I haven’t written this code; I have inherited it. I am making it readable whilst trying to understand it. A code of over 6000 lines and more than 1500 variables, with little commenting and erratic variable naming, mixed with the oddities of F77, has made this an interesting task.

I have recently made significant inroads into this, making it not only readable but, in doing so, identifying how to make it more efficient. These comments have added to my library of tools and helped confirm my chosen approach to this problem.

You might like to check out a tool for analyzing and modernizing Fortran codes: fpt. I have never used it, but it seems like a remarkable bit of engineering from the outside.

@Jcollins on this board is probably the world expert on it.


Thank you. I tried it very early on but it didn’t work. Equally, I have a far greater understanding of the model through working through it line by line.
The key here isn’t simply to modernise the code, but to make it more stable, more efficient and more readable, and to update it.
So far I have it running for more time steps (almost doubled, and now failing because the maths fails, not the code) and doing so in less than 50% of the original time.

Well done.

I should state that a lot of the performance gain was from rebuilding my PC, reducing file writing and identifying unnecessary loops early on.
Since then, performance has gone the opposite way, but that’s the cost of readability and understanding.

Thank you for trying fpt. Please can you tell us what didn’t work?
Best wishes,
John

In all honesty, this was almost 2 years ago.

I ran it against the original code I had and cannot remember precisely what the result was, but it didn’t work. I didn’t explore further at the time because this outcome matched my aim: I needed to do the work myself, learning how the model itself works.

@garynewport it would be very helpful for @Jcollins if you could report what exactly didn’t work, so that he can improve fpt. He will likely fix fpt so that it works for your code also.

The problem there is that I could not say whether the error lay with fpt itself or with my use of it. At that point in time my knowledge of all aspects was limited; the fault could very easily have been mine.

The most intriguing point was an equivalence that would not go away (it still exists). I had help from others, which introduced new approaches, but this did not resolve the one standing issue. In one case, someone helped remove the use of one file that I already knew how to remove, but the one I needed removed remained.

I believe it was prior to this, soon after receiving the code, that I tried the conversion. With such limited understanding, it is very likely the fault lies with me.

I still have the original source, so I am happy to try again, just to see if I can reproduce the problem or prove that the fault is mine.


@Jcollins ,

Will it be possible for you to illustrate how fpt can help modernize a simple code as shown below that makes use of EQUIVALENCE?

The basic premise is that the semantics of EQUIVALENCE can be difficult to fully grasp, especially given certain legacy nonstandard implementations, and that it may be seen as error-prone. Therefore, for the exercise here, please accept that there is a desire to move away from it in a refactored version of this existing code.

      SUBROUTINE SUB()
      INCLUDE 'DAT.H'
      REAL A(2), B(2)
      EQUIVALENCE (X(1,1), A(1)), (X(1,2), B(1))
      PRINT *, "IN SUB:"
      PRINT *, "X = ", X
      PRINT *, "A = ", A
      PRINT *, "B = ", B
      END SUBROUTINE
       
      PROGRAM P
      INCLUDE 'DAT.H'
      DATA ((X(J,I), J=1,2),I=1,2) / 1.0, 2.0, 3.0, 4.0 /
      CALL SUB()
      END PROGRAM
  • Include file ‘DAT.H’
      REAL X(2,2)
      COMMON / DAT / X
  • Current program behavior
C:\temp>gfortran p.f -o p.exe

C:\temp>p.exe
 IN SUB:
 X =    1.00000000       2.00000000       3.00000000       4.00000000
 A =    1.00000000       2.00000000
 B =    3.00000000       4.00000000

Given the previous comment in this thread, it may help readers understand how fpt accepts code such as above and guides the fpt user to modernize consistently per the current standard facilities.

Thanks,

@garynewport
Thank you - I would be grateful if you could replicate the problem. Please use the latest version of fpt from http://simconglobal.com and request a key. I shall then send a long-term one.

@FortranFan
I understand the desire to remove EQUIVALENCE. The problem is that it is used for several different reasons. For example:
i. To relabel parts of an array. We have several aerospace codes with constructs like:
REAL(kr8) :: state_vec(6), position(3), attitude(3), x,y,z,roll,pitch,yaw
EQUIVALENCE (position,state_vec)
EQUIVALENCE (position(4),attitude)
EQUIVALENCE (x,position(1))
EQUIVALENCE (y,position(2))
and so on.
We could get rid of the equivalenced objects but it wouldn’t make the code any clearer.

ii. Defining packets, often of mixed types and kinds for MPI transfer.
Here we could replace the equivalence blocks with derived types, but without MAP and UNION we might transfer larger packets than we need.

iii. Splitting large integers into components, often when generating code for an attached processor. Again, this could be done with derived types or with TRANSFER (see the sketch after this list). In this case I expect that the resulting code would be much clearer.

iv. Setting up large paged memory structures. We do this in fpt itself. The token stream of the code, the symbol table, statement table, linked list tables and about 20 other tables are split into pages and are mapped onto one very large array. In this way we handle the trade-off between programs with many statements but relatively few variables, and few statements but many variables. This was designed when computers had less memory than today, but to handle programs of several millions of lines it is still very useful.
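As a sketch of case iii with TRANSFER (hypothetical values; which half comes first depends on the platform’s byte order):

program split_int
   use iso_fortran_env, only : int32, int64
   implicit none
   integer(int64) :: big_word
   integer(int32) :: halves(2)
   big_word = int(z'0123456789ABCDEF', int64)
   ! Reinterpret the 8 bytes of big_word as two 4-byte integers
   halves = transfer(big_word, halves)
   print '(2(z8.8,1x))', halves
end program split_int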

So, as yet, fpt doesn’t re-code EQUIVALENCE. However, it does have all of the information available internally. I would welcome proposals for a design - what would we like the resulting code to look like? We can set up commands to handle different equivalenced objects in different ways if this will help.

Thank you, that is useful feedback for fpt users to be aware of, and it’s likely a relevant issue for the OP.

@JCollins
No problem. I will try to do so shortly.
Your later post is interesting, since it might be that the equivalence I have been trying to kill was never going to be re-coded, and thus fpt “didn’t work”.
It is being eaten away at and will be gone soon, but it is a stubborn beast.

@Jcollins ,

Please note that in the case of the OP, and also in many engineering and scientific computational codes, it is your first case, relabeling parts of an array, that proves important. It is also the case that is likely to conform to the standard, whereas the other 3 cases you list likely involve nonstandard extensions.

Thus, in terms of the design of fpt, if it can help with your case i, that would go quite far.

Re: “what would we like the resulting code to look like?”: with the silly example above, an end product such as the following would go a long way toward helping fpt users achieve modernization:

module m
   implicit none
   integer, parameter :: N = 2
   real, target :: x(N,N)
contains
   subroutine sub()
      real, pointer :: a(:)
      real, pointer :: b(:)
      a => x(:,1)
      b => x(:,2)
      print *, "IN SUB:"
      print *, "X = ", x
      print *, "A = ", a
      print *, "B = ", b
      a => null()
      b => null()
      return
   end subroutine 
end module
program p
   use m, only : x, sub
   implicit none
   x = reshape( [( real(i), integer :: i = 1, size(x) )], shape=shape(x) )
   call sub()
   stop
end program p
  • Program behavior - note it must be the same as existing code - here it is:
C:\temp>ifort /standard-semantics /free p.f
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.8.0 Build 20221119_000000
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.34.31937.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
 IN SUB:
 X =  1.000000 2.000000 3.000000 4.000000
 A =  1.000000 2.000000
 B =  3.000000 4.000000

You will notice the obvious elements of refactoring toward modernization:

  1. Replace COMMON (with its INCLUDE file) with a MODULE and an entity therein,
  2. Module procedure SUB to enable an explicit interface and its USE (everywhere),
  3. Replace the use of EQUIVALENCE for relabeling with the TARGET attribute on the data and the POINTER attribute on the aliasing objects,
  4. Miscellaneous other changes: implicit none; named constants to parameterize the problem size rather than hard-wiring it; avoidance of the DATA statement, etc.

Is there a way you think that fpt users can drive it to arrive at resulting code like this?