Question about CONTAINS

Internal subprograms, those specified below a CONTAINS statement, are the modern replacement for statement functions (among other uses, of course). A module subprogram can have internal subprograms, but an internal subprogram cannot have its own internal subprograms. This restriction sometimes requires code to be refactored in order to fit within the limited nesting that is allowed.

Is there any effort to generalize this, e.g. in order to allow deeper or even unlimited nesting of CONTAINS levels?

Why were the current limits imposed in the first place? Are there syntax ambiguities otherwise? Is it difficult for compiler writers to handle the unlimited nesting case?

2 Likes

This thread seems to ask the same question:

Though not much discussions there, the final comment (by Peter Klausler) seems interesting:

Multiple levels of nested subprograms (when used) would complicate the implementation of access to escaping host-associated variables.

I don’t know the exact meaning of “escaping host-associated variables”, but it might be related to “capturing” those variables when sending internal procedures to other routines (like closures in other languages), for example…

module mod1
    implicit none
contains

subroutine mycall( sub )
    interface
        subroutine sub(); end
    end interface
    call sub()
end

end module

module mod2
    use mod1
    implicit none
    integer :: num = 0
contains

subroutine inc()
    integer, save :: foo = 10
    num = num + 1
    call mycall( mult )
    print *, "num = ", num, "foo = ", foo
contains
    subroutine mult()
        num = num * 2
        foo = foo * 10
    end
end

end module

program main
    use mod2
    implicit none

    call mycall( inc )
    call mycall( inc )
    call mycall( inc )
end

Result
 num =            2 foo =          100
 num =            6 foo =         1000
 num =           14 foo =        10000

I guess the standard people might have wanted to avoid possible implementation issues due to deeply nested routines (like those involving derived types and associate construct also? though not sure…)

1 Like

That is actually the use case that I would like to do in one of my programs. I want to pass a contained routine through the argument list to another subroutine. I want that internal subroutine to have access to host associated variables when it is eventually called. However, I could not actually do that because of the nesting limit, so I had to shuffle things around into otherwise unnatural positions to work around that language limitation. This was something that happened just yesterday, but of course I have run up against that nesting limit many other times over the last 30 years, more often when replacing f77 statement functions with modern fortran contained procedures.

This goes back to Fortran 8X days and the compromise/consensus that led to the Fortran 90 standard revision to state, “Internal procedures must not appear in other internal procedures.”

There have been proposals since Fortran 90, particularly toward Fortran 2000 (but which eventually became Fortran 2003) for “Nesting of internal procedures.” However it never gained traction, perhaps due to concerns from compiler implementations as suggested by that last comment at the J3 Fortran site for proposals.

I am also not convinced a good code design can need any nesting of internal procedures, particularly given all the facilities in the current standard.

Anyways a compelling, well-illustrated use case is called for to influence the compiler implementors on this, if the past history and above mentioned comments are any indication. Sorry but textual claims, oh “had to shuffle things around into otherwise unnatural positions to work around that language limitation,” likely won’t pass muster.

But the chances are any effort toward developing such a use case will bring out alternate approaches that can be clearer and/or more performant than nested internal procedures.

I really have a very hard-time believing the need for nested internal procedures during an effort to replace statement functions, particularly given all the limitations in the standard (what was then ANSI X3.9 1978 aka FORTRAN 77).

Surely one can conjure up some legacy code scenario where a reference to a statement function appears in a scalar-expr of another statement function and the chaining continues.

But in such a situation, one needn’t refactor “mechanically” with a suspension of all faculty to replace each of those statement functions with an internal procedure, rather a step back to look at the whole picture will reveal a better way to restructure the use of erstwhile statement functions code with modern facilities.

The situation I encounter is simpler. I move what used to be an external procedure to an internal procedure. Then I try to move a statement function in the original code into a new CONTAINS block, and the compiler tells me I can’t do that.

As for workarounds for this limitation, I’ve been doing that for 30 years now, so I can keep on doing it. It would seem to just be more natural to allow nested CONTAINS levels, so I thought I would ask about it here to get some opinions from compiler writers and from standards members such as you.

It will be useful to see an actual example of a situation you “have run up against that nesting limit many other times over the last 30 years, more often when replacing f77 statement functions with modern fortran contained procedures.”

Why not use a module procedure instead of an internal one? They may have contained procedures.

Using a module procedure instead of an internal procedure is one of several workarounds for the current language limitation. Others include moving the data into a module, and then accessing that, now shared, data by various internal procedures that are at the same level (not nested).

These all have various advantages and disadvantages. Moving data into a module makes it saved, whereas if it is in an internal procedure, it could be local. That means you need to take care that it is deallocated when it is supposed to be, whereas in the internal procedure that was handled automatically by the compiler. There may also be issues of thread safety that differ between shared module data and local data.

Moving the entire internal procedure into a module means that it no longer has access to entities by host association. So argument lists need to be longer to get all of the relevant data into and out of the module through the argument list. Or you can move data into the shared module, where, as above, it is now saved by default, so you need to worry about things like allocation and deallocation that would have been done automatically as an internal procedure. And of course, if the module is in a separate file, then that introduces another level of complexity with separate object files, guaranteeing the correct compilation order in the build process, and so on. All of that is avoided with internal procedures because everything is self-contained.

Of course, there are many cases where moving a routine from internal to a separate module is the right thing to do. But the programmer should decide that based on things like program clarity and ease of maintenance, he should not be forced into it due to an unnecessary language limitation…

Here is a simple example of the kinds of conversion steps that occur in practice with legacy codes. The first step might be to convert the legacy code to free form. So a code might look something like this:

subroutine sub(x)
   implicit none
   real, intent(in) :: x
   real, parameter :: a = 1.0, b = 2.0, c = 3.0
   real :: f
   f(x) = ((a * x + b) * x) + c
   ! ... code that makes multiple references to f(x)
   return
end subroutine sub

program nest
   implicit none
   real :: x
   read(*,*) x
   call sub(x)
end program nest

Then you see that statements functions are deprecated, so the next time you work on that code you eliminate them like this:

subroutine sub(x)
   implicit none
   real, intent(in) :: x
   ! ... code that makes multiple references to f(x)
   return
contains
   pure real function f(x)
      real, intent(in) :: x
      real, parameter :: a = 1.0, b = 2.0, c = 3.0
      f = ((a * x + b) * x) + c
      return
   end function f
end subroutine sub

program nest
   implicit none
   real :: x
   read(*,*) x
   call sub(x)
end program nest

Then you realize that sub() is only used in that one program, so it is better off being an internal procedure than an external or a module procedure, so you refactor it as:

program nest
   implicit none
   real :: x
   read(*,*) x
   call sub(x)
contains
   subroutine sub(x)
      implicit none
      real, intent(in) :: x
      ! ... code that makes multiple references to f(x)
      return
   contains
      pure real function f(x)
         real, intent(in) :: x
         real, parameter :: a = 1.0, b = 2.0, c = 3.0
         f = ((a * x + b) * x) + c
         return
      end function f
   end subroutine sub
end program nest

But now when you compile, the first error you see is

gfortran nest.f90
nest.f90:12:11:

   12 |    contains
      |           1
Error: CONTAINS statement at (1) is already in a contained program unit

which demonstrates the problem. There are several workarounds. I’m sure that everyone has encountered this language limitation many times in the last 30 years, just as I have, and they know them well. In this case, one might just move f(x) out to be its own internal function, at the same level as sub(). But imagine if the f(x) evaluation accessed also other entities with host association within sub(). In that case, it is a more complicated problem altogether.

I should point out that if you regress and replace the nested contains procedure with the original statement function (a deprecated feature), then everything compiles correctly and works as expected. This observation is, I guess, ironic.

My reason for posting the original question was to see if this is a language limitation with a reason, or if it is just arbitrary. If there is some semantic ambiguity with nested internal procedures, then I could understand that. Or if it is extremely difficult for compiler writers to handle nested internal procedures, then I could understand that. But so far in this discussion, neither of those seem to be an issue.

Does your program make multiple references to sub too? If it only makes one, why use a subroutine when you could just inline it?

Yes, that is another workaround, just inline everything. :wink:

First, such a situation is not one involving statement functions. Statement functions consumed in sub are in effect" at the same level as sub()" so refactoring with that in mind is what would make sense.

Secondly, “if the f(x) evaluation accessed also other entities with host association within sub()”, then you are right that “in that case, it is a more complicated problem” but the compiler vendors are in essence saying the complexity is on them whereas for the users, in that particular case, it shouldn’t be too onerous for them to author a separate module for the sub() and the f(x) and use the module instead of the internal procedure. Or there is another option in the standard also and the users can avail that.

Bottom-line: with the use cases like these, the compiler vendors are likely to see it as a low priority. But note also I am generally wrong on most matters and hold the tiniest of a minority view on things, so take what I convey here only as a call to work harder to put together a convincing case on this.

And then propose this at J3 Fortran proposals site and see how it goes. Perhaps Fortran 202Y will prove good for nested internal procedures whereas it didn’t for Fortran 2000 (later 2003)!

In C++ it appears possible to nest lambdas:

#include <iostream>

int main(int argc, char const *argv[])
{
    auto sub = [](float x){
        auto f = [](float x){ 
            const float a = 1, b = 2, c = 3;
            return ((a*x + b)*x + c);
        };
        // ... code that makes multiple references to f(x)
        std::cout << f(x) << '\n';
    };
    
    float x;
    std::cin >> x;
    sub(x);

    return 0;
}

CppInsights give you a view of the classes this generates under the hood. Taking a look with Godbolt, at optimization level -O2 everything has been inlined. I also tested wrapping everything in a third lambda.

Even if allowed, I’m not convinced this is a good idea in practice.

1 Like

I’m assuming you are talking about fortran here, and not c++ (about which we would all agree). What do you see as a downside of allowing nested internal procedures?

I was talking about both. (W.r.t. to C++, I’m not sure if we agree for the same reason.)

The biggest downside is that internal procedures are not testable. You can use the include statement to circumvent this limitation, but then the contained procedures live in a different file. At this point you might as well put them in a module so you can then import both in your application and in a separate test driver.

Since internal procedures are difficult to test, my personal opinion is their use should be limited to adaptors (adjusting callback interfaces) or for simple helper functions which arguably don’t need testing.

If you need to have N-levels of nesting (where N > 1), it implies your function is so complex, you need multiple levels of encapsulation to tame it. The difference compared to modules being that none of that contain-ed code is reusable. This implies there is limited abstraction or generality. Now perhaps that’s just the way it is, and you do have a very deep procedure. It has a simple interface, but tons of stuff going on internally in different layers. With so much going on, something is bound to change at some point. But since we are depending on actual functions, and not just their abstract interfaces, the implementation details are not hidden. A change in one of the nested procedures will require recompilation of the encompassing procedures all the way up to the top. This seems like a violation of the SOLID principles. (single-responsibility principle, dependency-inversion principle).

Since sub doesn’t use host association, I don’t see much advantage in making it an internal procedure besides not polluting the namespace with an extra module. Since you are building an application, and not a library, the modules are thrown away anyways. Is it a sin to have a module with one procedure? Maybe at some point you decide you need a new subroutine sub2. If sub were in a module, you’d already have a suitable place waiting for you. You might call this design for change.

Perhaps sub could be made more useful elsewhere by making it more general?

subroutine sub(f,x)
   interface
      pure real function f(x)
         real, intent(in) :: x
      end function
   end interface
   real, intent(in) :: x
end subroutine

Now you could reuse sub with different concrete functions.

Perhaps f(x) is reusable, and we can import it in sub():

subroutine sub(x)
   use precious_functions, only: f => precious_parabola
   implicit none
   real, intent(in) :: x
   ! ... multiple references of f(xx)
end subroutine

Maybe I’m looking at this wrong. What benefit would lifting the constraint on nested functions bring, that a hierarchy of modules and submodules can’t?

1 Like

Are you saying there is no reason to have contained procedures? I cannot immediately think of something I can do with a contained procedure functionally I could not do with modules, but it allows sharing scope of many variables easily, clearly indicates the function is not a general procedure but specifically for the surrounding procedure and would intuitively seem to let the compiler produce more optimizations, particularly inlining just for starters. I find it particularly nice for encapsulating specific printing functions and error handling. Creating and loading modules would not be nearly as efficient for that use.

And since .mod files are not portable I would hate to have to use modules for simple single-file applications or creating interface blocks, which contained procedures do not require.

I find it a very useful feature and have found it annoying when copying code from modules that I have to split it into multiple procedures and declare some of the variables in the top scope so both procedures can see them although there is no need for the top procedure to have them in scope, and so on; although I remember the first time I heard of them and tried to use them I was confused by the scoping rules and wished there were something like “IMPORT” can supply.

Would implementing it be easier if it was limited to procedures that themselves contained an “IMPORT NONE” statement? Since contained procedures can contain USE statements it seems that would not be the case, but some of the discussion appears to indicate otherwise.

No. Contained procedures are useful. It’s contained procedures within contained procedures that don’t convince me.

1 Like

In my little contrived example above, there is a case where it works if you leave the low-level function as a statement function, but it does not work if you convert that deprecated entity into a modern internal procedure. Isn’t that reason enough to allow multiple levels of internal procedures?

On a more practical matter related to the general usefulness of internal procedures, I had this situation in a code (about 15 years ago). It has four nested do loops

do p = 1, n
   do q = p, n
      do r = q, n
         do s = r, n
         enddo
      enddo
   enddo
enddo

There was repetitive looking stuff at the beginning and ending of each of those loops. The beginning loop code was pushing stuff onto stacks, and the ending loop code was popping it back off and processing it. This structure was perfectly clear for the inner-most loop because the two code sections were right next to each other, but for the outer loops, there was too much code in between the beginning and end for that structure to be apparent.

It then occurred to me that I could make an internal subroutine for each of the loops. The internal subroutine had host association access to all the variables, so there were no long argument lists necessary. The refactored version looks like

   call p_loop()
contains
   subroutine p_loop()
      do p = 1, n
         call q_loop(p)
      enddo
   end subroutine p_loop

   subroutine q_loop(p)
      do q = p, n
         call r_loop(q)
      enddo
   end subroutine q_loop

   subroutine r_loop(q)
      do r = q, n
         call s_loop(r)
      enddo
   end subroutine r_loop

   subroutine s_loop(r)
      do s = r, n
      enddo
   end subroutine s_loop

Each internal subroutine only has one loop, so the beginning loop and ending loop codes are all right next to each other within each one. Just that one little rearrangement, using internal procedures, made something very complicated look much simpler.

This did not require nesting of the *_loop internal routines (which is the original question in this discussion), it just shows the usefulness of the idea of internal procedures.

So the issue with a “contrived example” is indeed that it is “contrived”. In both the cases you contrive, “you might as well put them in a module so you can then import both in your application and in a separate test driver” in your refactored version, as advised above, particularly given the situation with “host association access to all the variables, so there were no long argument lists necessary”.

Other than satisfying a personal preference, very little or no benefit is apparent with having nested internal procedures in the language.

1 Like