Concerns regarding limitations in the current generics proposal

I’ve just started testing the prototype implementation of generics in LFortran. First of all I want to say that I think it’s great that the generics working group is presenting their work openly so that others can test it and provide feedback early in the process!

Also note that I’m just starting to get familiar with the syntax so if anyone thinks I’ve missed something do tell! My goal is to contribute so that Fortran get a quality generics feature, not to critique the working group. I’ll happily change my mind on something if you can show that I’m worried without reason!

I believe I’m seeing some major limitations with the proposal which I fear will severely limit the usefulness of the feature if it goes through as currently suggested.

To demonstrate the shortcomings I will try to implement typical functions where generics are used in other programming languages on a result type similar to the one seen in Rust. The concepts are however not unique to Rust, but commonly seen in most other programming languages with generics support like Java, C++ and functional programming languages like Haskell. The type of functionality I will show here is also commonly used in list- and iterator-like types. See for example map and for_each in Java’s Stream and List.

Result<T, E> is a type that is typically returned from a function. It can either hold a value of type T (upon success) or an error of type E (upon failure). In pure functional programming languages the same concept is often referred to as an Either type. In practice an algebraic sum type is usually used to represent this. Fortran doesn’t have these, but the code below is a reasonable approximation:

module result_mod
    implicit none
    private
    public :: result_tmpl

    requirement R(T, E)
        type :: T; end type
        type :: E; end type
    end requirement

    template result_tmpl(T, E)
        requires R(T, E)

        type :: result_t
            type(T), allocatable :: value
            type(E), allocatable :: error
        contains
            procedure :: is_error
        end type

    contains
        logical pure function is_error(this)
            class(result_t), intent(in) :: this

            is_error = allocated(this%error)
        end function
    end template
end module

Here I’ve also added a convenience function that can help us check if the result was an error or not. I could also have added functions or assignment overloads to construct a successful and a failed result_t, but I’ve omitted this for brevity.

We should be able this like so (note that I’m currently getting an internal error from LFortran on this, but I believe the syntax is correct):

    pure function get_answer(chars) result(res)
        instantiate result_templ(real, integer), only: result_real_t => result_t
        character(len=*), intent(in) :: chars
        type(result_real_t) :: res

        if (chars == 'fourtytwo') then
            res%value = 42.0
        else
            res%error = -1
        end if
    end function

Here I’ve used an integer as an error flag. Ideally the error should be a type extending something like error_t in my error handling library, but that’s a completely separate discussion.

Now let’s try to implement the Rust Result and_then method. This method takes a function as an argument and if the result holds a value it applies this function to the value and returns it as a new result with the function return type as T. It is used for applying infallible functions to the value that the result type might hold. If the function that is going to be applied itself is fallible then map is used. Other programming languages have similar concepts, but might use other names.

Ignoring the generics for a now, we could write it in Fortran like this:

pure function and_then(this, func) result(res)
    abstract interface
        type(U) pure function func_i(value)
            type(T), intent(in) :: value
        end function
    end interface

    class(result_t), intent(in) :: this
    procedure(func_i) :: func
    type(result_U_E_t) :: res ! We'll ignore how to define this for a while

    if (this%is_value()) then
        res%value = func(this%value)
    else
        res%error = this%error
    end if
end function

Here we encounter what in my opinion is the first problem with the generics proposal:

Problem 1: Specification of generics (template) is a separate scope

To implement the function I’ve shown above inside a template we would have to let the template take another type, U:

module result_mod
    implicit none
    private
    public :: result_tmpl

    requirement R(T, E, U)
        type :: T; end type
        type :: E; end type
        type :: U; end type
    end requirement

    template result_tmpl(T, E)
        requires R(T, E)

        type :: result_t
            type(T), allocatable :: value
            type(E), allocatable :: error
        contains
            procedure :: is_error
            procedure :: and_then
        end type

    contains
        logical pure function is_error(this)
            class(result_t), intent(in) :: this

            is_error = allocated(this%error)
        end function


        pure function and_then(this, func) result(res)
            abstract interface
                type(U) pure function func_i(value)
                    type(T), intent(in) :: value
                end function
            end interface

            class(result_t), intent(in) :: this
            procedure(func_i) :: func
            type(result_U_E_t) :: res ! We'll ignore how to define this for a while

            if (this%is_value()) then
                res%value = func(this%value)
            else
                res%error = this%error
            end if
        end function
    end template
end module

This doesn’t work because U is not specific to the result_t type, it is only relevant once we use the and_then function. When we return a result_t from a function we simply do not know what transformation the caller might want to apply to it. For different invocations of the same function it might be desirable to use completely different types U which would not be possible with this design!

What we need in this situation is to have and_then take a generic type U which is independent of the type definition of result_t itself.

Moving on, some might have noticed that the return type of and_then above is somewhat vaguely defined. This brings on to my next problem:

Problem 2: Generics need to be instantiated explicitly

Since the return type of and_then itself is a result we need to instantiate the template for its return type. But will it even be possible to instantiate the same template inside itself? To make it work we would need something like this:

    template result_tmpl(T, E, U)
        requires R(T, E)

        type :: result_t
            type(T), allocatable :: value
            type(E), allocatable :: error
        contains
            procedure :: and_then
        end type

    contains

        pure function and_then(this, func) result(res)
            instantiate result_templ(U, E, X), only: result_U_E_t => result_t
            abstract interface
                type(U) pure function func_i(value)
                    type(T), intent(in) :: value
                end function
            end interface

            class(result_t), intent(in) :: this
            procedure(func_i) :: func
            type(result_U_E_t) :: res ! We'll ignore how to define this for a while

            if (this%is_value()) then
                res%value = func(this%value)
            else
                res%error = this%error
            end if
        end function
    end template

Notice the X in the instantiation statement here. I really don’t know what to put here because it is the return type of and_then for the new result. There’s no way I can determine what that is a this point.

There are other problems with requiring explicit generics instantiation like this as well:

  1. It becomes extremely verbose: We probably want to instantiate a result_templ for each function we make which will be a lot. We will probably end up instantiating a template for the same T and E multiple times throughout a codebase, but might give them different names. This is very bad for code readability.
  2. Type inference becomes impossible: Since Fortran is a statically typed language it will be possible for the compiler to infer the actual types of generic arguments in many situations. This saves a lot of time and makes code much more concise. This is an example of one of the things that makes Rust stand out from C++. In C++ you either have to make your templates look incredibly complex or the caller must specify all template arguments. Rust on the other hand is much better at inferring generic types based on function arguments which makes it way more pleasant to use generics compared to C++.

A possible solution?

I know the generics working group has been thinking way more about this than me, but I wonder if the solution could be quite simple:

  1. Assign generic type parameters to existing scopes like function, subroutine, interface and type (and maybe module?). Let type bound procedures inherit generic types from their parent.
  2. Move the requirement block immediately after the function/subroutine/interface/type declaration. Separate named requirement blocks like the current proposal could also be allowed.
  3. Use angle brackets (< and >) to express generic type parameters right after the name.
  4. Instantiate generic code based on actual arguments. Where generic type parameters cannot be inferred, use angle brackets to specify actual types.

This is pretty much how Java, C++ and Rust does it and I cannot see why it should not work for Fortran. I fear that much of the problems I observe stems from a need to make everything look Fortran-like and unique. If that is the case I would say that it is only natural that languages - spoken languages as well as programming languages - take influence from each other and that is good! Rust inherited a lot of C++ syntax instead of reinventing the wheel. I could also go on about how English historically has been influenced by Old Norse which Norwegian (my first language) is a descendant of and how Norwegian again has been influenced by English in modern times making the circle complete. However, I digress…

With the ideas stated above I believe my result_t example could be expressed like this:

module result_mod
    implicit none
    private
    public :: result_t

    type :: result_t<T, E>
        requirement
            type :: T; end type
            type :: E; end type
        end requirement

        type(T), allocatable :: value
        type(E), allocatable :: error
    contains
        procedure :: is_error
        procedure :: and_then
    end type

contains


    logical pure function is_error(this)
        class(result_t<T, E>), intent(in) :: this

        is_error = allocated(this%error)
    end function


    pure function and_then<U>(this, func) result(res)
        requirement
            type :: U; end type
        end requirement

        abstract interface
            type(U) pure function func_i(value)
                type(T), intent(in) :: value
            end function
        end interface

        class(result_t), intent(in) :: this
        procedure(func_i) :: func
        type(result_t<U, E>) :: res

        if (this%is_value()) then
            res%value = func(this%value)
        else
            res%error = this%error
        end if
    end function
end module

And in use:

    pure function get_answer(chars) result(res)
        character(len=*), intent(in) :: chars
        type(result_t<real, integer>) :: res

        if (chars == 'fourtytwo') then
            res%value = 42.0
        else
            res%error = -1
        end if
    end function
6 Likes

Thanks @plevold for this excellent feedback!

Yes, I am not surprised LFortran failed, it’s still in alpha. But we are fixing things quickly.

I need more time to think about your code. The code at the end seems doable, at least by quick reading I don’t see any showstoppers with the current generics design. Most (if not all) the changes seem surface language level syntax and semantics. The core feature of the generics is indeed modeled by Rust and Haskell (and Go), all of which use “strong concepts”. Be careful with C++, since it uses “weak concepts”. If you want to know the difference, you can see all the details in our comparison document: generics/comparison.md at afcfa01c5e87a00d113036dc8ecb25fda5a91f24 · j3-fortran/generics · GitHub.

Why don’t you work with @everythingfunctional to figure out all the details. And we can have it as an example in LFortran as well.

2 Likes

Glad that my thoughts and experiments can be of help, @certik!

Yes! In my rather lengthy post I somehow forgot to say that the foundation of strong concepts is absolutely the right choice. My comparison to C++ was mostly in terms of syntax.

Another advantage splitting up the template construct is that it makes it possible to apply further restrictions to a type bound procedure than to the type itself. For example we could add a subroutine add_one to my result_t above that requires T to be integer. Silly example of course, but this technique is used extensively in Rust.

1 Like

This is reminiscent of one of the examples I was working with. I haven’t revisited it in a little bit, but I plan to soon. generics/fallible.f90 at main · j3-fortran/generics · GitHub

1 Like

Nice to see a similar example, @everythingfunctional. We’ve even used the same function name!

Your example works around the problem of having to specify a separatetemplate scope (what I referred to as “problem 1” in the original post) by using two of them. However this makes it impossible to make the generic and_then function a type bound procedure.

Further I would like to point out that these kind of operations are typically chained together where multiple and_then or other similar functions are applied to different generic types. Due to the need to explicitly instantiate a template (“problem 2”) there will be an ever growing list of statements like

instantiate and_then_2_tmpl(error_t, integer, string_t, thing_t, combine_errors)

before the actual code to be executed. Over time one would easily get in a situation where some instantiations might not be used due to refactoring, but it’s very difficult to tell just from reading the code.

One might argue that these kind of constructs should not be the core goal of Fortran. I disagree because this way of programming very efficiently safeguards against use of uninitialized memory (using a value when an error occurred) and guards against index out of bounds errors (when used on lists/iterator-like types). I also think it demonstrates well the need for complexity in a generics language feature with very simple examples.

Happy you liked the example. This is the kind of use case I’m hoping to enable as well.

I think you actually don’t want it to be a type-bound procedure, because then you can’t chain them together in a single statement. I.e.

maybe = foo(arg1, arg2)%and_then(bar, arg3)

is not valid (you can’t do object component selection on function results, only on variables), and so actually does need to be written as

maybe = and_then(foo(arg1, arg2), bar, arg3)

I actually agree with this sentiment, and we will explore ways to reduce the verbosity. Our approach to designing templates was to make everything explicit to be sure we enabled all the complex use cases. We will then explore situations where we believe the complete, explicit syntax could be inferred, and specify the rules for how compilers must perform the inference. My hope is that it’s possible and makes sense. It’s on our radar, we just haven’t gotten there yet.

3 Likes

@plevold if you can, please join the Generics subgroup and help them figure out all these issues. The more people working on this, the better.

1 Like

As a Fortran user these are the things that I would like the language to do as far as generics.

  • Generic variable size collections List<T> and dictionaries
  • Iterators with slices, filtering and zipping together
  • User defined types that can have array and collection semantics.

The last one is a big one. For example if you want to design a generic jagged array (array of arrays), or implement a geometric algebra library for dual quaternions (8 element array) with custom algebra operators (addition, product, wedge, …)

I think generics should complement the main purpose of Fortran which is to help implement mathematical formulas and structures in code. Making Fortran a clone of Haskell would be a big mistake IMHO. Peppering the code with obscure language constructs is going to make Fortran less readable.

I want to define my own generic structure and write code that uses existing functions like sum() or dot_product() but customize the behavior for my needs. Currently there is no way to write a function that accepts a slice as an argument. For example a function that is a called like obj % f(1:10). If I want my user types to have array semantics there must be a way to be able to do this. This request is part of the iterators definition for generics.

Fortran is great at writing clean code when dealing with arrays (more than any other language) and generics should extend this advantage to user written structures.

PS. All the required end constructs reminds me of Pascal without the begin keywords. Multiple lines like Type :: U; end type seem superfluous to me, when the declaration that U is a generic type could be done in far more succinct way.

5 Likes

These kinds of semantics are kind of “up next” for us. My current task is to start coming up with motivating use cases for what it would look like and mean for user defined “collections” to be usable with elemental procedures. We think this will be directly related to how we do iterators as well.

2 Likes

If I want to write an ODE solver that works with a user type containing the model information, I expect a requirement that this user type provides things like initial condition state vector and the function for the derivative of the state vector.

Now I want the state vector to have array semantics (so I can do linear algebra with it) which would be used by the ODE solver. Designing a state vector and implementing the model would be the job of the user of this library, which would allow them to focus on modeling tasks.

Currently to do this you would use arrays of fixed numeric type and pointers to a function with specific arguments that cannot accommodate the user-type describing the system model.

1 Like

I think chaining functions is crucial for readability when working with these kind of constructs. Obviously this is not possible in Fortran at the moment, but I hope it will one day. We also need proper lambda functions before this is going to be a preferable way of writing code.

That is however completely besides my point which was that the template scope makes this too limited. I could probably come up with many more examples where it would be desirable to have a type bound procedure be generic over a type U which is independent of the type declaration itself.

Great! I hope you explore solutions to completely eliminate the instantiate statements and rather make the instantiation inline with the concrete type declaration or procedure invocation. I think that is realistically the only way to enable type inference which I believe is crucial to make generic code user friendly. I also think that it’s way easier to verbally describe code when it’s done like this. For example type(result_t<integer, error_t>) :: res can be referred to as “result_t for integer and error_t” whereas I’m not sure how I would refer to type(result_int_err_t) :: res. My best suggestion is “the result_t type which I instantiated on line ## and named result_int_err_t because it was instantiated for integer and error_t”.

@certik that sounds very interesting, but I have no idea what it involves. Maybe you could explain some more?

1 Like

We meet once a week, Mondays at 1 US Eastern Time, to work on papers (proposals) for the standard. We are nearly finished with the syntax, and will subsequently be working on uses cases for additional rank-agnostic features, generic iteration, and application of elemental procedures to supported derived-types (i.e. containers). The goal is to be able to support all possible use cases, even if the simple cases still end up needing to be quite verbose. We will then explore ways of abbreviating certain cases. All are welcome to join us and help with this work. Just reach out directly if you’d like an invite.

1 Like

Great initiative! Timezone differences make that unfeasible for me unfortunately. Are the papers you mention those in the generics repo at the J3 github?

If I understand you correctly, this means that you intend to proceed with the template scope and the instantiate statement?

The papers in that repository are not the final versions voted on by J3, but they’re not far from it. I’m not sure the general public has access to the papers, but you may be able to access them here: J3 Fortran - J3 Meetings

Yes. There are use cases we’d like to support that we were unable to convince ourselves were possible to accomplish with the simpler syntax. But we absolutely do intend to explore possibilities for abbreviated syntax in certain cases.

Ok. I think this is a huge mistake.

It’s good to go forward with some preliminary syntax, so that we have something to implement in a compiler and something concrete to play with.

We absolutely are not done with the syntax, or even semantics or scope. We only achieved the first step. Now we need to iterate. @plevold why don’t you write down some examples how things could work. @everythingfunctional why don’t you meet with @plevold in a better time slot, so that he can be involved?

I can try and accommodate for some interactive discussions. I find we tend to get the most insights out of worked out example use cases, so for the most productive discussions we should decide what examples to discuss and have them prepared before meeting. I could probably make time to meet Monday or Tuesday week after next (Nov 21 or 22) for a meeting. Just let me know.

2 Likes

Honestly I just think that generics should work more like in most other language I can think of that has this feature (Java, C++, Kotlin, Swift, Rust, Go). That is:

  • Generics are created by adding type parameters to existing scopes like type and function
  • Generics are instantiated by providing actual types to the type parameters

Also note that of all the languages mentioned above the only one that mentions “template” at all is C++. The scoping rules are still the same as for the other languages.

This is a solution which, contrary to the currently proposed syntax, enables what I believe is three essential requirements:

  1. A type T generic over a type U should be able to have a type bound procedure that is generic over a type E which is independent of the type declaration
  2. A type T generic over a type U should be able to have a type bound procedure that takes or returns a T that is generic over another type E
  3. If type parameters can be used inferred from actual arguments it should not be necessary to give them explicitly

I believe items 1 and 2 are clearly demonstrated by the example I show in the original post. I’ll explore item 3 using the not-invented-here syntax in some very basic examples below:

Instantiation with no type inference

Example:

! Declaration of generic subroutine 
subroutine print_scalar_sizeof<T>() 
    requires
        type, deferred :: T
    end requires

    type(T) :: tmp

    print *, 'size is', sizeof(tmp) 
end subroutine 

This generic subroutine has no arguments so obviously type inference is impossible. It has to be instantiated with explicit type arguments:

call print_scalar_sizeof<integer>() 
call print_scalar_sizeof<real>() 

Type inference for generic subroutine

Example:

! Declaration of generic subroutine 
subroutine print_sizeof_array<T>(arr) 
    requires
        type, deferred :: T
    end requires

    type(T), intent(in) :: arr(:) 

    print *, 'size is', sizeof(arr) 
end subroutine 

Here, the type can easily be inferred by the compiler, no explicit type argument is required:

integer, allocatable :: arr(:) 
arr = [1, 2, 3]
call print_size(arr)

Partial type inference

Example (same as the one from the LFortran announcement thread):

module add_generic_mod
    implicit none
    private
    public :: add_generic

    requirement R(T, F) 
        type :: T; end type
        function F(x, y) result(z)
            type(T), intent(in) :: x, y
            type(T) :: z
        end function
    end requirement

    function add_generic<T, F>(x, y) result(z)
        requires R(T, F)
        type(T), intent(in) :: x, y
        type(T) :: z
        z = F(x, y)
    end function

contains
    
    real function func_arg_real(x, y) result(z)
        real, intent(in) :: x, y
        z = x + y
    end function 

    integer function func_arg_int(x, y) result(z)
        integer, intent(in) :: x, y
        z = x + y
    end function

    subroutine test_template()
        real :: x, y
        integer :: a, b
        x = 5.1
        y = 7.2
        print *, "The result is ", add_generic<:, func_arg_int>(x, y)
        if (abs(add_real(x, y) - 12.3) > 1e-5) error stop

        a = 5
        b = 9
        print*, "The result is ", add_generic<:, func_arg_real(a, b)
        if (add_integer(a, b) /= 14) error stop
    end subroutine
end module

Here, only the first type argument T can possibly be inferred. One could still allow the compiler to infer this while F is provided. For example, Rust uses _ for this purpose. I used : in the example above because that is possibly(?) more Fortranesque.

Explicit instantiation statements

Explicit instantiation could be useful in some situations, but I think aliases (using in C++, type in Rust, type in Go) would be more useful because it could be used to define a new generic:

type :: generic_t<T, U>
    !... 
end type 

! "alias" as a keyword is chosen without very much at random here
alias still_generic_t<U> = generic_t<integer, U> 
alias concrete_t = generic_t<integer, real>

These are just examples though. They important part is that I strongly believe that the three previously mentioned statements should be added to the list of requirements. If you believe you have a superior syntax to what I’m sketching out here, by all means go ahead with that.

If you want to discuss further I’m happy to join you in an online meeting. 21. nov, at 14:30 US eastern time (that’s UTC -5, right?) could work for me, how is that for you?

3 Likes

Thanks @plevold, very good progress. Let’s summarize the differences.

1

So instead of:

instantiate add_t(real, func_arg_real), only: add_real => add_generic
print *, add_real(x, y)

you propose:

print *, add_generic<:, func_arg_real>(x, y)

2

Instead of:

    template add_t(T, F)
        requires R(T, F)
        private
        public :: add_generic
    contains
        function add_generic(x, y) result(z)
            type(T), intent(in) :: x, y
            type(T) :: z
            z = F(x, y)
        end function
    end template

you propose

    function add_generic<T, F>(x, y) result(z)
        requires R(T, F)
        type(T), intent(in) :: x, y
        type(T) :: z
        z = F(x, y)
    end function

Besides these two syntactic changes, are there any semantic changes?

1 Like