Japanese Subgroup GENERIC proposal

So far the Japanese proposal only applies to procedures.

That said, I am interested in figuring out how to solve the “variant” use case. I think you’ll always need the “container” type, although it could possibly be made more convenient/ergonomic/transparent. What I think you’ll want to have without using inheritance though is a pattern matching mechanism on the other side. My initial thought is something in the direction of:

type, variant :: shape = circle, square
type(shape), allocatable :: my_shapes(:)

my_shapes = [shape :: circle(), square(2.0), circle(3.0)]
do i = 1, size(my_shapes)
  match type (s => my_shapes(i))
  type is (circle)
    print *, "Circle with radius = ", s%radius
  type is (square)
    print *, "Square with width = ", s%width
  end match ! Invalid for this to not be exhaustive
end do

Actually, we could have an intrinsic template to supply the first part. I.e.

instantiate variant(type(circle), type(square)), only: shape => variant_type

With an implementation (kind of) like

template variant(T1, T2)
  type, deferred :: T1, T2
  enumeration type :: variant_types
    enumerator :: alt_1, alt_2
  end enumeration type

  type :: variant
    type(variant_types) :: which_type
    type(T1) :: alt1
    type(T2) :: alt2
  end type
  interface variant
    procedure construct_from_t1
    procedure construct_from_t2
  end interface
contains
  function construct_from_t1(alt1) result(variant_)
    type(T1), intent(in) :: alt1
    type(variant) :: variant_

    variant_%which_type = alt_1
    variant_%alt1 = alt1
  end function

  function construct_from_t2(alt2) result(variant_)
    type(T2), intent(in) :: alt2
    type(variant) :: variant_

    variant_%which_type = alt_2
    variant_%alt2 = alt2
  end function
end template

Then the match type construct gets transformed like:

  select case (my_shapes(i)%which_type)
  case (alt_1)
    associate(s => my_shapes(i)%alt1)
      print *, "Circle with radius = " s%radius
    end associate
  case (alt_2)
    associate(s => my_shapes(i)%alt2)
      print *, "Square with width = ", s%width
    end associate
  end select

Of course if it’s intrinsic, the variant type can be packed much more efficiently, the variant template can be variadic (take however many type arguments as you want), and the transformation of the match type construct can be handled by the compiler based on the type of the selector.

If generics are to be used to create a variant type, it is important to think about ergonomics. One of Rust’s major selling points is the enum type (which is a built-in variant) and language support for pattern matching on enums.

I’ve heard many complaints about C++'s std::visit with regards to ergonomics and runtime/compiletime performance. Here is the rust version of the above C++ variant code:

enum Shape {
    Circle{radius : f32},
    Square{width: f32},
}

fn main() -> () {

    let my_shapes = vec![Shape::Circle{radius:1.0},
                         Shape::Square{width:2.0},
                         Shape::Circle{radius:3.0}];

    for s in my_shapes {
        match s {
            Shape::Circle{radius} => println!("Circle with radius {}", radius),
            Shape::Square{width} => println!("Square with width {}", width),
        }
    }
}

It is no joke when I say people LOVE enums and pattern matching in Rust. And it is well-optimized (even in debug mode!!), which I’ve heard is not the case for C++ compilers and std::visit currently. I haven’t investigated it too much.

Thankfully, Fortran already has type matching with the switch type. We should use that, and try to make it as ergonomic as we can. And it should really be exhaustive. It would be sad if it’s not. People love that about Rust too.

2 Likes

I’m somewhat hesitant of including all types inside a single variable declaration.

The statements like REAL(REAL32,REAL64,REAL128), RANK(0), INTENT(IN) :: X will need to be modified in future if new types are needed by the user.

This would create lot of git-diff noise, and require more lines to be changed.

While I understand that this makes it easier for the compiler to check for type correctness, this also makes the language more verbose, and more resistant to modification.

A more user friendly option maybe to define the types as a generic type, and then let the user instantiate the types inside the module.

In pseudocode …

generic subroutine printPair(x, y)
  type(T) :: x
  type(U) :: y
  print *, "Pair: ", x, y
end subroutine

instantiate printPair(x(REAL32,REAL64,REAL128), y(INT32, INT64))
instantiate printPair(x(REAL32(:)), y(INT32(:, :))

Since the instantiation would be inside the module, it would be easier for the compiler to do the necessary type safety checks.

I think this would make it easier for devs. They just need to instantiate a new generic subroutine by modifying a single line instead of laborious process of going through the whole code, and modifying types line by line.

This also has some limitations that I need to think about more, and would be happy to discuss in future.

PS: On a broader note, I somewhat disagree with Fortran having advanced generics like C++. If the language becomes too complex, it will effectively hurt the portability and performance optimizations for compiler developers. But I support simple generics that basically are a safer replacement of C macros and Fypp preprocessor that we currently need to use.

Please note the BCS meeting announced here yesterday will have a presentation on Generic Programming:

15.40 <--------------------------------- GMT

Generic Programming
John Reid

WG5 has accepted a proposal from the Japanese National Body for extending the generic capability that has been present in Fortran since Fortran 90. The programmer will be able to declare that a procedure is generic and has arguments that may have alternative types, kinds, or ranks. The compiler will generate those explicit versions that are invoked. It will solve a problem that my group has faced for more than 50 years. It began with having to write and maintain versions for both single and double precision and is worse now with more precisions of reals and several lengths of integers.

This proposal will complement a much more ambitious US proposal that has been developed by a J3 subgroup since the 2019 WG5 meeting in Japan.

This talk will explain the Japanese proposal and give a very brief summary of the aims of the US proposal.

Discussion…

16.30

Close

1 Like

Actually, the compiler generates the Cartesian product of all combinations of types, kinds and ranks specified in the generic procedure, whether any of them are actually used or not.

And if the statement is in an external dependency the user has no options. It may reduce some duplication, but does not produce code which is truly generic in the sense of being extensible. This is the main reason that the generics subgroup was hesitant to pursue this design.

Can something like this work, instead of the example shown in the screen shot by @aerosayan

template function has_nan{K,N}(x) result(ans) 
   use :: ieee_arithmetic
   integer, constant :: K, N
   real(K), rank(N), intent(in) :: x
   logical :: ans

   ans = (N == 0 ? ieee_is_nan(x): any(ieee_is_nan(x)))
end function has_nan

and use it like this:

use :: iso_fortran_env, only : wp => real128
real(wp) :: x
! do something
print *, has_nan{K:wp, N:1}(x) ! or has_nan{wp,1}(x)

and something like requires real_kinds(...) can be added to check compiler supported kinds, maybe.

2 Likes

This is certainly in the ballpark. The exact syntax may be slightly different, but the idea is that we want an abbreviated form for single procedure templates, and inline instantiation of single procedure templates.

1 Like

There’s obviously a lot of nuance regarding language design that I do not have experience in, but from a user experience perspective, I still see some limitations of declaring all of the types in the declaration of variables.

At first glance, their method seems useful for small subroutines and functions, but it becomes far more complicated for bigger subroutines and functions.

I simply wish to bring the limitations to everyone’s notice.

Here’s an example code showing partitioning algorithm for quick sort that is created following the method shown by the proposal …

  integer(int32) generic function QuickSortPartition(array, istart, iend) result(ipivot)
    implicit none

    !
    ! Arguments
    !

    integer(real32, int8, int16, real64, int32, int64,  real128), intent(inout) :: array(:)
    integer(int32), intent(in) :: istart, iend

    !
    ! Data
    !

    integer(int32) :: i, j
    integer(int8, int64, real32, real64,  int32, real128) :: pivotVal

    !
    ! Code
    !

    ! Set the pivot value to the last number in array
    pivotVal = array(iend)

    ! Swap elements until all elements left of pivot, are smaller than pivot,
    ! and all elements right of pivot, are bigger than pivot
    i = istart - 1
    do j = istart, iend - 1
      if (array(j) < pivotVal) then
        i = i + 1
        call Swap(array(i), array(j))
      end if
    end do
    call Swap(array(i + 1), array(iend))

    ! Set pivot index and return it
    ipivot = i + 1
  end function

There are few things that I don’t find to be user-friendly …

  • The generic variable declarations for array and pivotVal are extremely verbose: We would need to repeat this verbose variable declaration throughout our code base for most generic functions we need. Fortran is already very verbose, and although it is not a problem for experienced devs, it is a obstacle for new developers who are learning Fortran for the first time.
  • It is hard to spot errors: I intentionally didn’t give a variable type in either array or pivotVal. Please see how much time it requires to manually find it. Maybe the compiler can help with this, but still it would be manually hard to spot the missing variable type. For added complexity, I scrambled the variable types around, in array and pivotVal, because in real world code, it is expected that users will not follow a perfect coding standard, or will inherit code from someone else who didn’t follow any coding standards. So, typing all variable types in the variable declaration makes the code error prone, and hard to debug/fix.
  • It is hard to modify code: If we assume that the missing variable type in the previous statement would be found by the compiler, and to help the user, the compiler would throw an error. This would unfortunately also cause a new branch of unfriendly experience for the user. If the compiler throws errors for missing types, then it also means the compiler will throw errors when we try to add a new type to our generic algorithm. That is, if we want to support int128 for some reason in this algorithm, then we would need to add int128 to all of the generic variables first, or else the compiler would throw errors, and would not let us proceed.

In my humble opinion, to remove all of this unfriendly user experience, the best method would be to reduce the number of places the user has to modify to included new types to their generic algorithm.

Fypp does it better, and quoting code from stdlib, we can see how defining the types in a single place allows the rest of the generic code to be easier to understand, develop, and maintain.

I’m not sure what would be the best syntax to do this in future Fortran standards, but improving user experience should be a primary decision criteria.

#:set INT_TYPES_ALT_NAME = list(zip(INT_TYPES, INT_TYPES, INT_KINDS))
#:set REAL_TYPES_ALT_NAME = list(zip(REAL_TYPES, REAL_TYPES, REAL_KINDS))
#:set STRING_TYPES_ALT_NAME = list(zip(STRING_TYPES, STRING_TYPES, STRING_KINDS))
#:set CHAR_TYPES_ALT_NAME = list(zip(["character(len=*)"], ["character(len=len(array))"], ["char"]))
#:set BITSET_TYPES_ALT_NAME = list(zip(BITSET_TYPES, BITSET_TYPES, BITSET_KINDS))

...

#:for t1, t2, name1 in IRSCB_TYPES_ALT_NAME
        module subroutine ${name1}$_sort_index( array, index, work, iwork, &
            reverse )
!! Version: experimental
!!
!! `${name1}$_sort_index( array, index[, work, iwork, reverse] )` sorts
!! an input `ARRAY` of type `${t1}$`
!! using a hybrid sort based on the `"Rust" sort` algorithm found in `slice.rs`
!! and returns the sorted `ARRAY` and an array `INDEX` of indices in the
!! order that would sort the input `ARRAY` in the desired direction.
            ${t1}$, intent(inout)                    :: array(0:)
            integer(int_size), intent(out)           :: index(0:)
            ${t2}$, intent(out), optional            :: work(0:)
            integer(int_size), intent(out), optional :: iwork(0:)
            logical, intent(in), optional            :: reverse
        end subroutine ${name1}$_sort_index

#:endfor
1 Like

I thnk you meant pivotVal always to have the same type as array. This can be achieved with the declaration
typeof(array) :: pivotVal
or by including pivotVal in the declaration of array. The declaration you give actually breaks a constraint in N2217.

I think this change addresses most of things you found unfriendly.

2 Likes

The compiler might choose to generate all the versions, but it might also choose to generate just those invoked. This is surely in the realm of compiler dependency. I would see a compiler that generates them all as being inefficient.

I have looked carefully through the requirements paper 22-120r5 and I do not see the requirement of being extensible. Have I missed it?

Also missing from 22-120r5, as far as I can see, is the requirement to be able to form a generic procedure, that is, one with a single name that can be invoked with an argument of more than one type, kind, and rank. While it might not be a requirement, is it possible to do it with the current feature?

2 Likes

In fact, you must declare it the way @JohnReid indicates. To do otherwise would create the cartesian product of the types for array and pivotVal, i.e. there would be versions where array and pivotVal do not have the same type.

A TEMPLATE is a parameterized scoping unit

The Japanese design is not parameterized, i.e. there is a not a way for external code to supply different types/kinds/ranks/etc.

Yes.

template tmpl(...)
  ...
  private
  public :: generic_name
  interface generic_name
    procedure tmpl_proc
  end interface
contains
  subroutine tmpl_proc(...)
    ...
  end subroutine
end template

Granted it is more verbose, which is why an abbreviated form is desirable.

2 Likes

This gives me one procedure with both a generic and a specific name. I would like to have one procedure for more than one type/kind/rank. How do I do that?

1 Like
instantiate tmpl(integer, ...)
instantiate tmpl(real, ...)
instantiate tmpl(type(some_unforseen_type), ...)
... ! however many you want

Each instantiation adds a new specific procedure to the generic interface.

2 Likes

@JohnReid welcome! Thanks for joining the debate.

Can dummy arguments also be declared this way?

typeof(dummy1), intent(whatever) :: dummy2

I mention this because this capability is currently missing with class(*) type declarations of dummy arguments. The goal would be to require multiple dummy arguments to be of the same type+kind, not just of the same class, and the compiler could catch type mismatches of actual arguments at compile time, not just at run time.

Yes, that statement looks fine to me and OK in F2013. Can be useful in either of the generics proposals.

Yes, this is valid already. Note that it only takes the declared type of dummy, not the dynamic type. I.e.

subroutine s(x, y)
class(something_extensible) :: x
typeof(x) :: y
end subroutine

does not mean that y will always be the same type as x. Rather in this case y will always be type(something_extensible).

Yes, this is what is currently inadequate. What the programmer sometimes wants is to require that the two arguments are the same type, and he wants the compiler to enforce that requirement at compile time.

The programmer must now do something like the declaration you show, and then use a select type construct within the subroutine to enforce the requirement at run time. That should, of course, remain an option, because sometimes a run time test is all that is possible, but there should be also some kind of way for the compiler to enforce the requirement sooner, at compile time.

If I understand the previous replies correctly, in this generic context

really does enforce identical type+kind constraints on the arguments, whereas in the class() situation it only enforces the less specific class constraint.

This is something that cannot (in the general case) be checked at compile time. I.e.

subroutine foo(a, b)
  class(T) :: a, b
  call bar(a, b)
end subroutine

subroutine bar(a, b)
  class(T) :: a
  typeof(a) :: b
end subroutine

You do not know if a and b have the same dynamic type in the call to bar until run time.

This is just a restatement of what we are talking about. If the same-type-constraint could be enforced in the dummy argument declarations in subroutine foo(), then the compiler could enforce at compile time the same-type actual argument constraints in the call bar(a,b) statement.

I do agree that there are cases that can be tested only at run time, but this is something that could be done at compile time in many of the common programming situations, if only there were a way to enforce the same-type-constraint at the time of declaration.

Am I correct with the previous interpretation of the discussion comments that this same-type declaration does exist in the proposed generic case?