Passing array expressions as arguments

I’ve been hesitant to abstract some array operations for fear that an unnecessary copy would be made when passing array expressions as arguments. A sample case for the operation:

subroutine mid_point(n, x, x_mid)
    integer, intent(in) :: n
    real, intent(in) :: x(n)
    real, intent(out) :: x_mid(n - 1)

    integer :: i
    
    do i = 1, n - 1
        x_mid(i) = 0.5 * (x(i) + x(i + 1))
    end do
end subroutine

And for a potential call:

call mid_point(n, a + b/c, result)

To my understanding, if mid_point is not inlined, a + b/c is evaluated before the subroutine execution, requiring a transient array. Is that correct?

Edit: Fixed potential call listing.
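In other words, I assume the non-inlined call is effectively equivalent to the following (tmp being a made-up name for the compiler’s transient array; the subroutine is included here only to make the sketch complete):

```fortran
program call_with_temp
    implicit none
    integer, parameter :: n = 4
    real :: a(n), b(n), c(n), result(n - 1), tmp(n)
    a = 1.0; b = 2.0; c = 2.0
    ! What I assume the compiler generates for call mid_point(n, a + b/c, result):
    tmp = a + b/c                   ! transient array, evaluated first
    call mid_point(n, tmp, result)  ! then passed like any other actual argument
    print *, result
contains
    subroutine mid_point(n, x, x_mid)
        integer, intent(in) :: n
        real, intent(in) :: x(n)
        real, intent(out) :: x_mid(n - 1)
        integer :: i
        do i = 1, n - 1
            x_mid(i) = 0.5 * (x(i) + x(i + 1))
        end do
    end subroutine mid_point
end program call_with_temp
```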

2 Likes

Hi @lsmenicucci, welcome to the community.

Your potential call lacks the first argument, n, but otherwise I guess you’re right: I don’t see any way (other than inlining) to avoid creating a temporary array to be passed to the subroutine with the expression result.

  1. [Fortran view] No. There is no Fortran requirement to store an actual argument, because such a thing cannot be named and consequently can never be referred to again. That sequence of values exists only for the duration of the CALL statement, so the compiler can lay down any code that achieves the same result.
  2. [Compiler view] But it probably won’t.
  3. Have you considered using intrinsic array transformations CSHIFT/EOSHIFT?
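For example, with array sections (EOSHIFT works similarly), the whole operation collapses to one array expression, which the compiler is free to fuse into a single pass; a minimal sketch:

```fortran
program midpoint_sections
    implicit none
    integer, parameter :: n = 4
    real :: x(n), x_mid(n - 1)
    x = [1.0, 2.0, 4.0, 8.0]
    ! The stencil as an array expression; no explicit loop or call needed.
    x_mid = 0.5 * (x(1:n-1) + x(2:n))
    print *, x_mid   ! 1.5, 3.0, 6.0
end program midpoint_sections
```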

I’ve tried some experiments, and the result from gfortran-15 -fdump-tree-optimized (on an M1 Mac) seems to be the following (according to the .optimized file, though I’m not very sure about the details of this file…):

  • If mid_point() is defined in a separate module from the call statement, then a temporary array is created and passed to mid_point() explicitly (even with -O3 -flto etc.);
  • If mid_point() and the routine calling it are defined in the same module, then the former is inlined with -O3 (so no explicit call for mid_point());
  • Similarly, if mid_point() is defined as an internal routine and the call statement is in the host scope, then the former is inlined again with -O3.

So at least for gfortran, redundantly defining small library routines in the same module or scope (via include etc.) might help…? (if really necessary)

Apart from that, I am wondering if there are other languages that can send such a math expression directly to another routine without making a temporary array…? (possibly an expression template + “auto” argument can do that kind of thing…?)

2 Likes

Would something change if we used assumed-shape arrays instead of explicit-shape arrays? For example:

subroutine mid_point(x, x_mid)
    implicit none
    real, intent(in) :: x(:)
    real, allocatable, intent(out) :: x_mid(:)
    integer :: n,i
    
    n = size(x)
    allocate(x_mid(n-1))
    do i = 1, n - 1
        x_mid(i) = 0.5 * (x(i) + x(i + 1))
    end do
end subroutine
1 Like

To clarify, the dummy is known to be non-pointer, non-allocatable (either by explicit interface or implicit interface) and so cannot be pointer associated with anything. One can go through the entire list of other possible associations and determine that none apply here. Therefore, the entire actual argument array is “ethereal” (“made up”) and does not need to be stored, and the entire calculation can be done one element at a time, or 16 elements at a time, and in any order (the compiler can see the loop body and determine that), and on whatever resources the compiler can summon. If the compiler can determine that a violation of Fortran conformance happens just AFTER the CALL, or that RESULT is never referenced, it can even avoid doing any work altogether.

Not as far as Fortran is concerned. One variant is as open to optimizations as the other, unless I missed something.

1 Like

Thanks, I will check that page! (BTW I still do not understand why it is called “thunk”, though the above page says “The term originated as a whimsical irregular form of the verb think”…) It seems to be related to various other things (listed at the bottom of the above page), so I will check them also.

I’ve just tried changing all the test codes from explicit-shape to assumed-shape, but the results were essentially the same (for gfortran). Other compilers may show different behaviors, though 🙂

1 Like

The term dates from the early 1960s, when folks were implementing the first ALGOL-60 compilers. Check out the Jan 1961 ACM paper referenced on the Wikipedia page.

You might also want to check out:

1 Like

Yes, I am a bit more familiar with closures, and it seems common to pass expressions via closures (e.g. in C++?). But I was wondering whether some languages support passing and receiving a math expression directly as an argument (as in the OP’s code). I remember that D or Chapel could do that, but trying it in Chapel just now, the code failed to compile (because of some lifetime errors…). The “range” in D might be closer to what I imagined, but I need to check again. As for functional programming languages, I’ve never tried them so far, so it seems interesting to learn them (for this kind of thing).

A while ago, I was under the impression that Fortran supported closures, but it seems that was just a side-effect of gfortran’s internal procedure implementation.

I think in this situation, the input array expression argument would need to be evaluated and stored in a temporary array, and the output array would also need to be allocated. The former would probably be done by the caller and the latter by the callee, so they would not typically be done together with a single allocation operation. Of course, if inlining occurs, then that is a different situation.
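A sketch of the two separate allocations in the straightforward (non-inlined) implementation, with the callee’s work shown inline and tmp as a hypothetical name for the caller’s temporary:

```fortran
program two_allocations
    implicit none
    real, allocatable :: tmp(:), result(:)
    real :: a(4), b(4), c(4)
    integer :: i, n
    a = 1.0; b = 2.0; c = 2.0
    n = size(a)
    tmp = a + b/c            ! allocation 1: caller materializes the expression
    allocate(result(n - 1))  ! allocation 2: callee allocates the intent(out) result
    do i = 1, n - 1          ! the callee's loop, shown inline for the sketch
        result(i) = 0.5 * (tmp(i) + tmp(i + 1))
    end do
    print *, result
end program two_allocations
```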

I have seen languages where expressions could be used and passed as arguments. These are usually interpreted languages, such as Mathematica, which I think has this capability, though I don’t remember the syntax offhand. Scripting languages such as bash have the eval command to evaluate an expression with its current values (typically for scalar expressions, not arrays). PL/I, a compiled language, is another possibility that might have had this feature.

2 Likes

I sometimes have trouble understanding what people mean by this. When I read a Fortran program, understand what it means, do the calculations with pencil and paper, and compose the output, am I “inlining”? When I recognize that a program is computing the millionth hexadecimal digit of pi, can’t I look it up and write the answer? My view is that whatever a lazy fool can do, a compiler can. The rules in the language are there to enable lazy shortcuts, without ambiguity.
Can you sum a billion numbers with one addition? Sure! If you are asked for sum(a(2:N:2)) and you have already calculated sum(a(1:N)) and sum(a(1:N:2)) for N=2,000,000,000, then just subtract these two numbers.
The notion of “inlining” seems to be a subset of “since we have proved that nobody’s legitimate interests are affected, we can take this shortcut”. Fortran does not specify a sequence of machine states, it specifies a computation.
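A concrete version of that shortcut, with values small enough to check by hand:

```fortran
program sum_shortcut
    implicit none
    integer, parameter :: n = 10
    real :: a(n)
    integer :: i
    a = [(real(i), i = 1, n)]
    ! sum over even indices = total sum minus sum over odd indices
    print *, sum(a(2:n:2))                ! 30.0
    print *, sum(a(1:n)) - sum(a(1:n:2))  ! 55.0 - 25.0 = 30.0
end program sum_shortcut
```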

I do too, because it can mean different things in different contexts. However, when I used that term in my previous post, I meant that there is a caller and a callee, and typically the code for them would be seen by the compiler in two different steps, perhaps separated in time by days, months, or years. In a straightforward implementation, the caller is required to allocate a temporary array in order to reduce the array expression argument to a temporary array, and the callee is required to allocate an array because the result has the allocatable attribute. If those two allocations are separated in a way where the compiler cannot “see” them together, then the programmer should expect them to be done separately, as two rather expensive operations.

But if the compiler can see them both together, and effectively treat the subprogram as a set of local operations in the caller, then it can see the need for both allocations during the execution of the subprogram. In this case, several optimizations might occur. One is that the two expensive allocations could be merged into a single allocation of larger size (2*N-1 in this case, I think). That would reduce the allocation effort by almost half compared to the straightforward implementation.

Another optimization that might occur is for the compiler to recognize that the simple array operations performed by both the caller and the callee can all be combined into a single loop (a do concurrent type loop in this case) with no temporary array allocations required at all. That is why I said that if inlining occurs, then that would be a different situation: the concerns in the original post about the costs associated with the array allocations could be reduced (by half) or eliminated entirely. There are also other concerns that come into play.
If these allocations are done by the compiler, and they are done on the stack rather than the heap (which I think is allowed for the sample code given in this case), then there is the possibility, for long arrays, that the allocation could overflow the stack and result in a run-time execution error. But if inlining occurs and the array allocations are eliminated as part of that optimization, then the possibility of that run-time error is eliminated.
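For reference, the fully inlined and fused version that the single-loop optimization would amount to, using the names from the original example:

```fortran
program fused_midpoint
    implicit none
    integer, parameter :: n = 4
    real :: a(n), b(n), c(n), result(n - 1)
    integer :: i
    a = 1.0; b = 2.0; c = 2.0
    ! One fused loop, no temporary arrays at all:
    do concurrent (i = 1:n - 1)
        result(i) = 0.5 * ((a(i) + b(i)/c(i)) + (a(i+1) + b(i+1)/c(i+1)))
    end do
    print *, result   ! 2.0, 2.0, 2.0
end program fused_midpoint
```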

I somewhat agree with this sentiment, but I’m unsure how far a compiler is allowed to take this direction. A commonly discussed situation, which you refer to above, is whether a function is required to be executed when its return value is not needed. Two examples are function references within logical expressions such as lexpr .and. f() and numerical expressions such as value*f(). If the compiler determines that lexpr == .false. or that value == 0, can the f() evaluations be skipped entirely? The standard is, in my opinion, unclear about this, and it has been so since the f66 and f77 era. Partly as a defensive measure, I assume that such function calls can be skipped in these cases, but that they are not required to be skipped; I know that other programmers make other assumptions based on particular compiler behaviors.
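A small test of that assumption (f and lexpr follow the names above; the side-effecting function is for illustration only, since portable code should not rely on it running):

```fortran
program maybe_evaluated
    implicit none
    logical :: lexpr, r
    integer :: calls
    calls = 0
    lexpr = .false.
    ! 10.1.7 permits, but does not require, skipping f() here,
    ! since the value of the expression is already determined.
    r = lexpr .and. f()
    print *, 'f() was called', calls, 'time(s)'   ! 0 or 1, processor dependent
contains
    logical function f()
        calls = calls + 1
        f = .true.
    end function f
end program maybe_evaluated
```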

There are some situations where Fortran does specify a sequence of machine states and not just the final computation result. One situation that comes to mind is the file position within an external file. It can be at the beginning of the file, within a record of the file, between records of the file, or after the last record and before the endfile mark. Transitions between those machine states depend on the sequence of rewind, read, write, and endfile statements in the code. Another example is when the programmer implements a finite state machine. In this case, the state of the machine (internal state and data) determines the next action to take, and the Fortran standard must specify the machine state and the transitions between states in order for such programs to have well-defined results. I guess in this case one might say that the sequence of machine states is the computation.

And that is why the language specification does not use such terms and does not talk about object files, linking, separate compilation. All that matters is the computation. I find that most users, here and elsewhere, are curious about the compiler and its decisions and not so much the language, for understandable reasons. I still think it instructive to keep those separate in my mind, with the slogan “A Fortran compiler could use a mind-reader and a time-machine, should they become available.”

A standard-conforming compiler, all the way. A compiler that conforms to traditional user expectations, not so much.

10.1.7 Evaluation of operands
It is not necessary for a processor to evaluate all of the operands of an expression, or to evaluate entirely each operand, if the value of the expression can be determined otherwise.

What would make it clearer?

Indeed, because I/O is when the cards are laid down and the entire “processor” (compiler, linker, libraries, o/s) is finally checked for fidelity to the specification. I/O is not abstracted like the CPU, RAM and network card are, it is thoroughly modelled.

1 Like