You are the programmer, so we should be asking you about the motivation.
One guess is that there might be cases where nx would be less than the actual array length.
Who is “the user”? The programmer (i.e. you) should always ensure that nx is .le. the size of the actual array argument.
The sizes do not need to match exactly. nx should be .le. the size of the actual array. The test nx == size(x) is meaningless with the explicit shape declaration – it will always return true because that is how explicit shape arrays work. With the assumed shape declaration of the dummy argument, then size(x) will be the size of the actual argument, which of course has nothing to do with nx except for what the calling program sets its value to be.
I am trying to get LFortran to compile a 3rd party code, which uses this. So I am trying to learn about how this feature works, as I have not encountered this before.
Thanks for your other comments. Based on what you wrote, is the above code equivalent to:
program test_module_dim
implicit none
integer :: nx
nx = 2
call f(nx, [1, 2])
nx = 3
call f(nx, [1, 2, 3])
contains
subroutine f(n, x)
integer, intent(in) :: n, x(n)
print *, size(x)
print *, x
end subroutine
end program
In other words, does it behave as if you passed the global variable explicitly as an argument?
Yes, it is an explicit shape array, regardless of where n (or nx) comes from.
But now that you say that it is legacy code, I can give some ideas about why it might have been done that way. F77 and earlier did not have runtime memory allocation. Everything was done statically. So a common practice was to dimension arrays some maximum value, and then just reference the first nx elements within the subroutines, where nx is determined somehow at runtime. Instead of passing the dimension value as an argument to every subroutine, it might be placed in a common block. Then each subroutine could access the value from that common block, and they would all work consistently on just the nx-element subset of all the arrays.
Presumably, your legacy code converted the common block to a module somewhere along the line, and the static arrays were replaced with automatic or allocatable arrays, so you end up with kind of a mishmash of coding conventions.
Awesome, thanks @RonShepard for the explanation. For some reason I thought the n must come from function arguments, but now I can see that it can come up from anywhere.
It looks like it is evaluated when the function is called and that is the size of the array for the duration of the function, and even if n changes later, that does not modify the array size.
I am still occasionally doing such things, i.e. placing some dimensions in a module instead of passing them repeatedly to all subroutines of the module, when the dimension is fixed for a whole run of the program. There is some init routine that among other things sets the dimensions.
I see – you load the dimension from an input file, so it can’t be represented by integer, paremeter, it must be just an integer, but it does not change during the run. Here is where I encountered this issue:
@certik, so for you and your LFortran team, given your “vantage point” of influencing Fortranners toward safer and more performant computations using Fortran which will involve concurrent and parallel computing, you can consider - as part of vision, to also
add support for OO in LFortran ASAP,
consider OO more for simple code refactorings that help with concurrent and parallel computing,
support non-extensible derived types in Fortran as an extension since the Fortran standard committee simply drags their feet on this,
with non-extensible derived types, the polymorphic descriptor need not be included in the passed object reference that LFortran may be able to take advantage of with some performance gain.
Thus for the case you show in the original post as a MWE for the SNAP geom code may be refactored as:
module calc_m
type, non_overridable :: calc_t
integer :: nx = 1 ! Component initialization, NO implied SAVE
contains
procedure :: sub
end type
contains
subroutine sub(this, x)
type(calc_t), intent(in) :: this
integer, intent(in) :: x(this%nx)
print *, size(x)
print *, x
end subroutine
end module
use calc_m, only : calc_t
type(calc_t) :: calc
call calc%f( [1,2] ) ! work with nx = 1
calc%nx = 2
call calc%f( [1,2] ) ! work as expected
end
that can really enable thread-safe computations unlike the read-write approach to some global data (your comments based on feedback from other readers here here re: nx being a MODULE entity).
The point is that if no threading or concurent access is needed at this level, the solution with module variables is valid and reasonnable. I would write things differently if threading/concurrent access was needed.
I’m curious, @certik, how from a compiler writer’s perspective
integer, intent(in) :: n, x(n)
is handled differently from
integer, intent(in) :: n, x(*)
I don’t use such explicit-shape or assumed-size passing anymore, but way back when I did, my assumption as a user was that to get bounds checking I needed to use the former (along with compiler flags), but otherwise there was no difference.
Hi @nncarlson, we are still iterating on the exact design. The latest iteration of our design that I like the most so far is not implemented yet, but it is described here: ASR: Logical and Physical Types · Issue #1601 · lfortran/lfortran · GitHub, as you can see, we only got the idea about 2 weeks ago. You can read the details there. Using this new design, the “logical type” of the two arrays above, as well as integer, intent(in) :: n, x(:) would be all equivalent. The “physical type” would contain at least two options: represent by pointer / represent by descriptor; possibly one more for x(*). The frontend makes some default choice, such as x(:) by descriptor and x(n) by pointer. But the optimizer can change it.
Regarding explicit-shape arrays, I started using them more lately. I find the declaration with the explicit dimensions much more readable. Here is a recent example where I started with assumed-shape, but switched to explicit-shape: fastGPT/gpt2.f90 at 4e70c6a0e9f60a1b3c94f18b32c34a6012de1e7b · certik/fastGPT · GitHub. It’s self-documenting what each index is doing, and what dimension each array is using model length parameters.
Exactly. Except that it is not enforced by the compiler. Essentially we are then at the level of Python / NumPy, where people (myself included) also routinely do this, and it’s hard to ensure it is exactly correct, since it is not enforced. It’s a missed opportunity for Fortran. I still like my proposal from a few years ago where your line above could be also optionally written as:
integer, dim :: n, m ! The exact syntax to be determined
real, intent(in) :: a(n, m)
Which is equivalent to just using (:) everywhere, it is called the same way, things are passed in with an array descriptor, etc. But in addition it is self-documenting and also the compiler checks that the array dimensions of all the arrays are compatible.
However, there is a problem: what if we send in arrays that are compatible, but different n_seq, n_seq_x and n_embd than the rest of the code, by a mistake? Then the compiler will not catch this error. However, what if we use the method in this thread? Then the above becomes:
module gpt2
integer :: n_seq, n_seq_x, n_embd
contains
function mha(x, attn_w, attn_b, proj_w, proj_b, n_head, &
use_kv_cache, kv_cache) &
result(y)
real(sp), intent(in) :: x(n_embd,n_seq_x), &
attn_w(3*n_embd,n_embd), attn_b(3*n_embd), &
proj_w(n_embd,n_embd), proj_b(n_embd)
real(sp), intent(inout) :: kv_cache(n_embd,n_seq,2)
...
end function
end module
This has the advantage that we are not sending the module parameters as arguments (which is annoying), that are read from an input file, but otherwise do not change for the duration of the program. Yet the compiler (in principle at least) can check that everything is conforming.
So I guess even in my own code I could use the technique described in this thread.
isn’t it just the legacy interface ? The caller just passes the addresses of the arrays, without any dimension/size information, so how the callee could check anything ?
With an explicit interface, the compiler can still check type, kind, and intent. The explicit shape declarations can still check bounds internally for the declared shape, and the assumed size declarations can still be checked internally for the declared rank. That is not as much as can be checked with other dummy argument declarations, but it isn’t nothing.