Fortran `shlex` library

I’m happy to announce that I’ve released a Modern Fortran version of the shlex simple lexical analyzer, which is based on Python’s shlex library. I’ve implemented this because it’s needed to solve this issue with the fpm MPI metapackage, and there is no other way to address it unless we have a relatively full-featured command line parser for unix-like commands. (this is also what Meson uses for the same purpose).

So there we go, I hope it will be useful for that purpose and more.

fortran-shlex

Modern Fortran port of Python’s shlex shell-like lexer. This package implements the simple shlex lexer, inspired by the Python shlex module, and based on the Golang implementation. The interface comes with two functions, split which parses a command-like string and returns an array of allocatable character strings; and shlex that perform the same, but return a list of type(shlex_token) tokens. Error control is optional, via a boolean success keyword or a token that may return an error string.

Install

Just copy and paste shlex_module.f90 into your project. Alternatively, fortran-shlex can be used as a dependency in your Fortran Package Manager project:

[dependencies]
shlex = { git="https://github.com/perazz/fortran-shlex.git" }

Usage

The split function returns a list of strings, split according to unix shell rules, which support:

  • escaping quotes ("...")
  • non-escaping quotes ('...')
  • line feed, carriage return etc.
use shlex_module

character(len=:), allocatable :: tokens(:)
type(shlex_token) :: error
logical :: success

! Simple usage
tokens = split('my -W"ery" -Llong //Input \n "string"')

! With logical error flag
tokens = split('whatever ',success=success)

! With complete error flag
tokens = split('whatever ',error)
print *, 'error message=',error%string

And the shlex function has the same API, but returns a list of type(shlex_token)s instead of an allocatable character array.

use shlex_module

type(shlex_token), allocatable :: tokens(:)
type(shlex_token) :: error
logical :: success

! Simple usage
tokens = shlex('my -W"ery" -Llong //Input \n "string"')

! With logical error flag
tokens = shlex('whatever ',success=success)

! With complete error flag
tokens = shlex('whatever ',error)
print *, 'error message=',error%string

License

The source code in this repository is Licensed under MIT license (LICENSE-MIT or The MIT License – Open Source Initiative).

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

See also

12 Likes

Not that you necessarily should change this, but there is a new intrinsic subroutine with the name split.

Umh the reason is split is just consistent with the same function in the original shlex library. Is that going to be an issue down the road? As long as they’re both functions, the symbols should not conflict, right?

No real conflict. It just shadows the intrinsic, so you can’t use both in the same scope. Doubt that will be a concern, and there’s ways to work around it anyways.

1 Like

Is this always true?

In the F202X (now F2023) proposals, there were multiple versions of split:

SPLIT (STRING, SET, TOKENS [, SEPARATOR])
SPLIT (STRING, SET, FIRST, LAST)
SPLIT (STRING, SET, POS [, BACK])

which can be implemented in Fortran and exported under a generic interface:

  interface split
    module procedure :: split_tokens, split_first_last, split_pos
  end interface split

As long as the split routines in shlex have arguments which differ in type, kind, and rank, the specific routine needed can be resolved.

One caveat in this case is that the intrinsic split is a subroutine, whereas in fortran-shlex, split is a function. This is prohibited as demonstrated by this example:

module bar
private
public :: random_number
interface random_number
   module procedure random_pair
end interface
contains
   subroutine random_pair(a,b)
      real :: a, b
      call random_number(a)
      call random_number(b)
   end subroutine
end module

module foo
use bar
type :: pair
   real :: a, b
end type
interface random_number
   module procedure random_pair
end interface
contains
   function random_pair()
      type(pair) :: random_pair
      call random_number(random_pair%a,random_pair%b)
   end function
end module

which fails to compile with the following error:

/app/example.f90:16:4:

   16 | use bar
      |    1
Error: In generic interface 'random_number' at (1) procedures must be either all SUBROUTINEs or all FUNCTIONs
Compiler returned: 1

Addendum: the rule above does not apply for the case of intrinsics. I can modify the module bar as follows:

module foo
!use bar
type :: pair
   real :: a, b
end type
interface random_number
   module procedure random_pair
end interface
contains
   function random_pair()
      type(pair) :: random_pair
      intrinsic :: random_number
      call random_number(random_pair%a)
      call random_number(random_pair%b)
   end function
end module

But now, as @everythingfunctional says, the intrinsic subroutine is shadowed.

You can add to the “generic name” of an intrinsic, but with the same constraint that all specifics must be subroutines or all specifics must be functions. I.e.

generic :: int => char_to_int
contains
function char_to_int(...)
...
end function

Otherwise, the user defined specific name shadows the intrinsic generic name. The rules are a bit nuanced and confusing, but usually not truly problematic.

1 Like