Forcing keyword argument (Hacky solutions welcome)

Consider the following API for a function called chomp, which removes trailing characters from a string:

use stdlib_strings, only : chomp
implicit none
! valid usage:
print'(a)', chomp("hello", set="lo")  ! "he"
print'(a)', chomp("hello", substring="lo")  ! "hel"
! invalid usage:
print'(a)', chomp("hello", "lo")  ! invalid with one positional argument
print'(a)', chomp("hello", "lo", "lo")  ! invalid with two positional arguments
print'(a)', chomp("hello", set="lo", substring="lo")  ! invalid with two keyword arguments
end

A set of character works like in the intrinsics scan and verify, while a substring like in the intrinsic index, instead of deciding for one of those for chomp, let’s have both. An interface like this would provide the above specified API, but would also allow the invalid usage.

interface chomp
  pure function chomp(string, set, substring) result(chomped_string)
    character(len=*), intent(in) :: string
    character(len=*), intent(in), optional :: set
    character(len=*), intent(in), optional :: substring
    character(len=:), allocatable :: chomped_string
  end function chomp
end interface chomp

With this setup we have to do runtime checking whether both optional arguments are present or not and make use of an error stop. Also the order of set and substring is arbitrary and would allow at least one of them as positional argument.

That’s as far as we can get with Fortran, or is it?


So here is the challenge: Can we design a number of function under the generic interface chomp such that the invalid usage always results in a compile time error?

Whether or not this would actually be useful in this context is another question. I’m interested if we are able by some mean to hack around this language limitation with nothing but standard Fortran. Consider it a programming puzzle.


For some context, here is the patch for the actual chomp implementation (without any hacks):

Not sure if this is really a hack, but what about introducing two derived types?

type :: set_type
character(len=:), allocatable :: set
end type
type :: substring_type
character(len=:), allocatable :: substring
end type

The keyword arguments can be replaced with two constructor functions set() and substr(). Calling the function would look like:

res = chomp("hello", set("lo")) ! "he"
res = chomp("hello", substring("lo") ! "hel"

The downside is you need to import three functions instead of one. The functions could also be replaced with operators .set. and .substring..

You “stole” :slight_smile: my solution:

module chomp_mod
implicit none
integer, parameter :: nlen = 1000
type, public :: set
   character (len=nlen) :: str
end type set
type, public :: substring
   character (len=nlen) :: str
end type substring
interface chomp
   module procedure :: chomp_set,chomp_substring
end interface chomp
contains
function chomp_set(text,xset) result(chomped_string)
character (len=*), intent(in) :: text
type(set)        , intent(in) :: xset
character(len=:) , allocatable :: chomped_string
chomped_string = trim(text) // "_chomp_set" ! replace by actual code
end function chomp_set
!
function chomp_substring(text,xsubstring) result(chomped_string)
character (len=*), intent(in)  :: text
type(substring)  , intent(in)  :: xsubstring
character(len=:) , allocatable :: chomped_string
chomped_string = trim(text) // "_chomp_substring" ! replace by actual code
end function chomp_substring
end module chomp_mod

program main
use chomp_mod
print*,trim(chomp("foo",set("bar")))
print*,trim(chomp("foo",substring("bar")))
end program main

gives output

 foo_chomp_set
 foo_chomp_substring
1 Like

Next thing I can think of is to make chomp a derived type:

module chomp_mod
  implicit none
  type, private :: chomp_type
  contains
    procedure :: set => dt_chomp_set
    procedure :: substring => dt_chomp_substring
  end type

  type(chomp_type), public :: chomp

contains

function dt_chomp_set(self,string,set) result(chomped_string)
  class(chomp_type), intent(in) :: self
  character(len=*), intent(in) :: string
  character(len=*), intent(in) :: set
  character(len=:), allocatable :: chomped_string
  chomped_string = trim(text) // "_chomp_set" ! replace by actual code
end function
function dt_chomp_substring(self,string,substring) result(chomped_string)
  class(chomp_type), intent(in) :: self
  character(len=*), intent(in) :: string
  character(len=*), intent(in) :: substring
  character(len=:), allocatable :: chomped_string
  chomped_string = trim(text) // "_chomp_substring" ! replace by actual code
end function
end module

The allowed function calls are then chomp%set() and chomp%substring().

Another idea would be to shift the keyword arguments into operators as follows:

module chomp_mod

    implicit none

  type, private :: chomp_type
    character(len=:), allocatable :: string
  end type

  interface operator(.set.)
    module procedure chomp_set
  end interface

  interface operator(.substring.)
    module procedure chomp_substring
  end interface
contains

  pure function chomp(string) result(this)
    character(len=*), intent(in) :: string
    type(chomp_type) :: this

    this%string = string
  end function

  pure function chomp_set(string,set) result(res)
    type(chomp_type), intent(in) :: string
    character(len=*), intent(in) :: set
    character(len=:), allocatable :: res
    res = trim(string%string) // set ! replace by actual code
  end function

  pure function chomp_substring(string,substring) result(res)
    type(chomp_type), intent(in) :: string
    character(len=*), intent(in) :: substring
    character(len=:), allocatable :: res
    res = trim(string%string) // substring ! replace by actual code
  end function

end module

program main

  use chomp_mod
  implicit none

  write(*,'(A)') chomp('hello') .set. 'lo'
  write(*,'(A)') chomp('hello') .substring. 'lo'

end program

For a multi-option function, you could allow chaining operators (each operator adds one field and returns a chomp_type instance) until you finally wrap everying in a doit() function which returns back a character string.

It doesn’t make sense to me why “one positional argument” case should be invalid.

Anyways, this all gets into personal style or a very small group consensus that immediately influences stdlib work when the all-too important discussion around design consistency and convenience around APIs can do with a formalized structure and work process but which is rather difficult and can be too time-consuming.

So, for example, I personally prefer a rank-1 CHARACTER dummy argument of default length of unity as the set in my library procedures. And I think this can avoid most of the issues mentioned in the original post. But those putting in the hard yards now toward stdlib may not prefer this.

   ..
   private
   ..
   generic, public :: chomp => chomp_set, chomp_substr
   ..
contains
   function chomp_set( string, set ) result( chomped_str )
      character(len=*), intent(in) :: string
      character(len=1), intent(in) :: set(:)
      character(len=:), allocatable :: chomped_str
      ..
   end function

   function chomp_substr( string, substr ) result( chomped_str )
      character(len=*), intent(in) :: string
      character(len=*), intent(in) :: substr
      character(len=:), allocatable :: chomped_str
      ..
end module
    ..
   ! valid usage:
   print'(a)', chomp( "hello", set=["l", "o"] )  ! "he"
   print'(a)', chomp( "hello", substring="lo" )  ! "hel"
   print'(a)', chomp( "hello", ["l", "o"] )      ! "he"
   print'(a)', chomp( "hello", "lo" )            ! "hel"
   ..
2 Likes

I learn F2003+ by following @FortranFan. This looks like a more concise way of writing, as I did,

interface chomp
   module procedure :: chomp_set,chomp_substr
end interface chomp

Other than working with a Fortran 95 compiler, are there any benefits of doing things the old way?

1 Like

+1 from me for this approach to chomp. I think it’s okay to be inconsistent with the intrinsics if the design is more meaningful.

1 Like

Don’t worry, I didn’t plan to introduce this API in stdlib. I’m just curious if we can create such an API at all with Fortran and just used chomp as example.


Thanks for the feedback, I implemented this API now in the patch, further comments and feedback are welcome over at the pull request.

The closest I was able to come up with is

! SPDX-Identifier: CC0-1.0
module strings_chomp
    implicit none
    private

    public :: chomp

    !> Remove trailing characters in set or substring from string.
    !> If no character set is provided trailing whitespace is removed.
    interface chomp
        module procedure :: chomp_set
        module procedure :: chomp_substring
    end interface chomp

    type :: force_kwargs_hack
    end type force_kwargs_hack

contains

    !> Remove trailing characters in set from string.
    pure function chomp_set(string, force_kwargs, set) result(chomped_string)
        character(len=*), intent(in) :: string
        type(force_kwargs_hack), optional, intent(in) :: force_kwargs
        character(len=*), intent(in) :: set
        character(len=:), allocatable :: chomped_string
        integer :: last

        last = verify(string, set, back=.true.)
        chomped_string = string(1:last)

    end function chomp_set

    !> Remove trailing substrings from string.
    pure function chomp_substring(string, force, kwargs, substring) result(chomped_string)
        character(len=*), intent(in) :: string
        type(force_kwargs_hack), optional, intent(in) :: force, kwargs
        character(len=*), intent(in) :: substring
        character(len=:), allocatable :: chomped_string
        integer :: last, nsub

        last = len(string)
        nsub = len(substring)
        if (nsub > 0) then
            do while(string(last-nsub+1:last) == substring)
                last = last - nsub
            end do
        end if
        chomped_string = string(1:last)

    end function chomp_substring

end module strings_chomp

program test_chomp
    use strings_chomp, only : chomp
    implicit none
    ! valid usage:
    print '(a)', chomp("hello", set="lo")  ! "he"
    print '(a)', chomp("hello", substring="lo")  ! "hel"
    ! invalid usage:
    print '(a)', chomp("hello", "lo")  ! invalid with one positional argument
    print '(a)', chomp("hello", "lo", "lo")  ! invalid with two positional arguments
    print '(a)', chomp("hello", set="lo", substring="lo")  ! invalid with two keyword arguments
end program test_chomp

It exploits some optional private derived types which probably can never be provided. Having a different number of such derived types as dummy arguments changes the function signature sufficiently to allow putting them in the same generic interface.

❯ gfortran invalid.f90
invalid.f90:61:15:

   61 |     print '(a)', chomp("hello", "lo")  ! invalid with one positional argument
      |                1
Error: There is no specific function for the generic ‘chomp’ at (1)
invalid.f90:62:15:

   62 |     print '(a)', chomp("hello", "lo", "lo")  ! invalid with two positional arguments
      |                1
Error: There is no specific function for the generic ‘chomp’ at (1)
invalid.f90:63:15:

   63 |     print '(a)', chomp("hello", set="lo", substring="lo")  ! invalid with two keyword arguments
      |                1
Error: There is no specific function for the generic ‘chomp’ at (1)
❯ gfortran test.f90 && ./a.out
he
hel
❯ ifort invalid.f90
invalid.f90(61): error #6284: There is no matching specific function for this generic function reference.   [CHOMP]
    print '(a)', chomp("hello", "lo")  ! invalid with one positional argument
-----------------^
invalid.f90(62): error #6284: There is no matching specific function for this generic function reference.   [CHOMP]
    print '(a)', chomp("hello", "lo", "lo")  ! invalid with two positional arguments
-----------------^
invalid.f90(63): error #6284: There is no matching specific function for this generic function reference.   [CHOMP]
    print '(a)', chomp("hello", set="lo", substring="lo")  ! invalid with two keyword arguments
-----------------^
compilation aborted for invalid.f90 (code 1)
❯ ifort test.f90 && ./a.out
he
hel
❯ nvfortran invalid.f90
NVFORTRAN-S-0155-Could not resolve generic procedure chomp (invalid.f90: 61)
NVFORTRAN-S-0155-Could not resolve generic procedure chomp (invalid.f90: 62)
NVFORTRAN-S-0155-Could not resolve generic procedure chomp (invalid.f90: 63)
  0 inform,   0 warnings,   3 severes, 0 fatal for test_chomp
❯ nvfortran test.f90 && ./a.out
/usr/bin/ld: warning: /opt/nvidia/Linux_x86_64/21.1/compilers/lib/nvhpc.ld contains output sections; did you forget -T?
he
hel
❯ nagfor invalid.f90 
NAG Fortran Compiler Release 7.0(Yurakucho) Build 7038
Warning: invalid.f90, line 31: Unused dummy variable FORCE_KWARGS
Warning: invalid.f90, line 50: Unused dummy variable FORCE
Warning: invalid.f90, line 50: Unused dummy variable KWARGS
Error: invalid.f90, line 61: No specific match for reference to generic CHOMP
Error: invalid.f90, line 62: No specific match for reference to generic CHOMP
Error: invalid.f90, line 63: No specific match for reference to generic CHOMP
[NAG Fortran Compiler error termination, 3 errors, 3 warnings]
❯ nagfor test.f90 && ./a.out
NAG Fortran Compiler Release 7.0(Yurakucho) Build 7038
Warning: test.f90, line 31: Unused dummy variable FORCE_KWARGS
Warning: test.f90, line 50: Unused dummy variable FORCE
Warning: test.f90, line 50: Unused dummy variable KWARGS
[NAG Fortran Compiler normal termination, 3 warnings]
he
hel

5 Likes

Nice (hacky) solution! I learned a new pattern here with ‘force_kwargs’. Alas, some downsides are unusual API documentation and incurring the unhelpful ‘no specific interface’ compiler message.

I often want to require keyword arguments and/or, like in this case, have mutually exclusive arguments; I wonder whether there would be support for a new argument attribute that requires the argument to be specified with a keyword. Then perhaps generics resolution could be extended to distinguish between otherwise identical (type,rank,kind) argument interfaces.

2 Likes

You can have a chomp function with three character arguments, string, char, and method. If it is anticipated that chomping a substring is more common than chomping a set of characters, method can be made OPTIONAL, and the user must pass method=“set” to get that functionality. The function interface would be

function chomp(string,char,method) result(chomped_string)
character (len=*), intent(in)            :: string,char
character (len=*), intent(in), optional  :: method
character(len=:) , allocatable           :: chomped_string
end function chomp

NIce, if hacky, solution, but couldn’t chomp_substring be implemented using
index(string, substring, back = .true. ? I admit, I have not studied this possibility in much detail, but it would eliminate the loop :slight_smile:

1 Like

I don’t think we will get around an interative implementation of chomp_substring, the expected behaviour for the substring chomping would be:

print '(a)', chomp("hellooooo", substring="oo")  ! "hello"
print '(a)', chomp("hellohellohello", substring="hello")  ! ""

No, I realised (much) later that you want to remove all trailing substrings - I blame it on a Pavlovian conditioning: I see manipulation of indices into strings and think “it must be possible to do this in a functional fashion”. Enter the silver plated bullet …