A collection of useful string methods (inspired by Python)

I find Python’s string handling capabilities very good. I missed those useful string methods in Fortran, so I decided to implement them. At the moment I have these functions:

public :: &          !# Python equivalents:
      capitalize, &     !# "anna".capitalize() -> "Anna"
      center, &         !# "*".center(3) -> " * "
      count_elems, &    !# "Anna".count("n") -> 2
      endswith, &       !# "01.png".endswith(".png") -> True
      equal_strings, &  !# same length .and. same content
      explode, &        !# list("abc") -> ["a", "b", "c"]
      find, &           !# "Anna".find("n") -> 1 (Python is 0-based)
      isascii, &        !# "Éva".isascii() -> False
      isdigit, &        !# "2026".isdigit() -> True
      is_in, &          !# "prog" in "programming" -> True
      islower, &        !# "anna".islower() -> True
      isspace, &        !# "   \t    \r\n".isspace() -> True
      isupper, &        !# "ANNA".isupper() -> True
      lower, &          !# "aNNa".lower() -> "anna"
      lstrip, &         !# " \t   anna  ".lstrip() -> "anna  "
      removeprefix, &   !# "01.jpg".removeprefix("01") -> ".jpg"
      removesuffix, &   !# "01.jpg".removesuffix(".jpg") -> "01"
      replace, &        !# "cat dog cat".replace("cat", "kitten") -> "kitten dog kitten"
      rev, &            !# "abcd"[::-1] -> "dcba"
      rfind, &          !# "Anna".rfind("n") -> 2 (Python is 0-based)
      rstrip, &         !# "  anna  \n".rstrip() -> "  anna"
      slice, &          !# like in Python: s[1:5:2], or s[10:2:-2]
      split, &          !# "  aa  bb  cc  ".split() -> ["aa", "bb", "cc"]
      startswith, &     !# "01.png".startswith("01") -> True
      strip, &          !# "  \t    aa    \t   \n".strip() -> "aa"
      swapcase, &       !# "Anna".swapcase() -> "aNNA"
      upper, &          !# "Anna".upper() -> "ANNA"
      zfill             !# "7".zfill(3) -> "007"
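
As a point of comparison for the two index-returning entries above (`find` and `rfind`), here is a minimal sketch of how Python's 0-based results relate to Fortran's 1-based `index` intrinsic. The module and function names (`py_index_sketch`, `find0`, `rfind0`) are hypothetical and are not taken from jstring.f90, whose actual return convention may differ.

```fortran
module py_index_sketch
   implicit none
   private
   public :: find0, rfind0   ! hypothetical names, for illustration only
contains
   ! 0-based position of the first occurrence of sub in s, or -1 if absent.
   ! index() is 1-based and returns 0 when sub is not found.
   pure integer function find0(s, sub) result(pos)
      character(len=*), intent(in) :: s, sub
      pos = index(s, sub) - 1
   end function find0

   ! 0-based position of the last occurrence of sub in s, or -1 if absent.
   pure integer function rfind0(s, sub) result(pos)
      character(len=*), intent(in) :: s, sub
      pos = index(s, sub, back=.true.) - 1
   end function rfind0
end module py_index_sketch

program demo_index
   use py_index_sketch, only: find0, rfind0
   implicit none
   print '(a, i0)', 'find0("Anna", "n")  = ', find0("Anna", "n")    ! 1
   print '(a, i0)', 'rfind0("Anna", "n") = ', rfind0("Anna", "n")   ! 2
   print '(a, i0)', 'find0("Anna", "x")  = ', find0("Anna", "x")    ! -1
end program demo_index
```

Subtracting 1 from `index` also maps its "not found" value of 0 onto Python's -1, so no special case is needed.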

The source code is here: jstring.f90. Test cases are here: test_jstring.F90.

I would like to get some feedback. What else should be added? How could I improve it?

8 Likes

Before trying to reinvent the wheel yet another time, did you check resources such as

?

3 Likes

That’s indeed very useful! To echo @hkvzjal, it may be worth looking at the state of similar work in stdlib and thinking about enhancing the functionality there with your project.

2 Likes

it may be worth looking at the state of similar work in stdlib and thinking about enhancing the functionality there with your project.

Agree. I think this is another example of someone stepping up to fill a need that is not currently (and for the most part never will be) part of the Standard. My only problem is that it’s a perceived duplication of effort. It would be nice if fortran-lang would take the lead in identifying projects that offer duplicate capability across various libraries and provide a forum or place where the developers can cooperate to move the most important functions into stdlib. I’m not against multiple choices. It’s just that nobody has the time anymore to try them all.

2 Likes

Let’s not discourage new Fortran developers. They are so rare that they should be celebrated. I went through your repo, and I see very good things for someone who just started in Fortran. @jabuci, you went for a functional approach rather than OO, which is very fortranic so to speak. I also appreciate the use of private/public, which makes your module more explicit. And the cherry on top: you added unit tests.

Now, it is true that a string module is probably what has been reinvented the most. To add to the previous list, check:

You can probably find some more since most projects have a string module to some extent.

If you compare with stdlib, you can probably find some material in your repo that is worth a PR. If you do so, I encourage you to test different implementations and go for the fastest, as was done for to_lower/to_upper.

Keep up the good work!

15 Likes

Thanks for the feedback. @davidpfister: thanks for your kind words.

Some explanation: I do this partially as an exercise to learn Fortran. On the other hand, since I come from Python, I’d like to have a library that mimics Python’s string methods as closely as possible. So far, I’ve had a lot of fun writing these functions. And I find modern Fortran a very nice and readable language.

Thanks for the useful links, I’ll study them. I had found stdlib but it doesn’t contain too many functions: strip, chomp, starts_with, ends_with, slice, find, replace_all, padl, padr, count, zfill, join, to_string. I wanted more: capitalize, center, islower, lower, split, etc. It was the moment when I decided to make my own string library. My goal is to implement the most useful string methods from Python. Once the library stabilizes, I would be happy to try to contribute to stdlib.

5 Likes

A few editorial suggestions, mostly regarding “advertising” your project …

Always add the topic keyword “fortran-package-manager” to github sites supporting fpm

On your github site you want to add the keyword

fortran-package-manager 

to your topics in your “About” section in the upper right region. When
you click on that you will see that your package was added to the
ring of other fpm-enabled github sites.

Fortran Wiki

You want to add your package to the Fortran Wiki page listing
libraries:
Libraries in Fortran Wiki

under the section labeled “String Manipulation”.

“fpm run --example ‘*’” currently fails

It looks like you pushed out an example ahead of committing a change?
At the moment, stdin is not defined in the jsys module but is used in
an example:

[  0%]                      jconv.f90
[  5%]                      jconv.f90  done.
[  5%]            example_jstring.f90
[ 11%]            example_jstring.f90  done.
[ 11%]               example_jsys.f90
<ERROR> Compilation failed for object " example_example_jsys.f90.o "
[ 16%]               example_jsys.f90  done.

example/example_jsys.f90:2:18:

    2 |    use jsys, only: stdin, stdout, stderr, argc, argv
      |                  1
Error: Symbol ‘stdin’ referenced at (1) not found in module ‘jsys’
compilation terminated due to -fmax-errors=1.
<ERROR> stopping due to failed compilation
STOP 1
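
One way a module could provide the names the example imports is to re-export the standard unit numbers from the intrinsic `iso_fortran_env` module. This is only a sketch, under the assumption that `stdin`, `stdout` and `stderr` are meant to be integer unit numbers. The module name `jsys_sketch` is hypothetical, and the real jsys module may define these differently.

```fortran
module jsys_sketch
   ! Hypothetical stand-in for jsys: exposes the standard units under short names.
   use, intrinsic :: iso_fortran_env, only: input_unit, output_unit, error_unit
   implicit none
   private
   public :: stdin, stdout, stderr
   integer, parameter :: stdin  = input_unit
   integer, parameter :: stdout = output_unit
   integer, parameter :: stderr = error_unit
end module jsys_sketch

program demo_units
   use jsys_sketch, only: stdin, stdout, stderr
   implicit none
   write (stdout, '(a)') "written to standard output"
   write (stderr, '(a)') "written to standard error"
   write (stdout, '(a, i0)') "unit number used for standard input: ", stdin
end program demo_units
```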

Unicode support?

You might consider supporting UTF-8 encoded data or ISO-10646 encoding
of Unicode characters. In that case the new (they are still WIP also)
modules found at

might be useful examples. They are intended to mimic M_strings, taking into consideration some lessons learned.

Note that some procedures would barely require changing, but
that some become much more complicated. A to_upper can be done in a few lines just
supporting ASCII-7, but can require a table with thousands of entries if supporting
Unicode, for example.
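
To make the "few lines" claim concrete, here is a minimal ASCII-only sketch. The names `ascii_case_sketch` and `to_upper_ascii` are hypothetical; this is not the implementation from jstring, M_strings or stdlib. It only handles the 26 letters a-z, which is exactly the limitation described above.

```fortran
module ascii_case_sketch
   implicit none
   private
   public :: to_upper_ascii   ! hypothetical name, for illustration only
contains
   ! Uppercase the ASCII letters a-z; every other character is copied unchanged.
   pure function to_upper_ascii(s) result(t)
      character(len=*), intent(in) :: s
      character(len=len(s)) :: t
      ! Distance between lowercase and uppercase letters in the ASCII table (32)
      integer, parameter :: shift = iachar("a") - iachar("A")
      integer :: i, code
      t = s
      do i = 1, len(s)
         code = iachar(s(i:i))
         if (code >= iachar("a") .and. code <= iachar("z")) then
            t(i:i) = achar(code - shift)
         end if
      end do
   end function to_upper_ascii
end module ascii_case_sketch

program demo_upper
   use ascii_case_sketch, only: to_upper_ascii
   implicit none
   print '(a)', to_upper_ascii("Anna 2026")   ! ANNA 2026
end program demo_upper
```

Accented letters such as "é" fall outside the a-z range and pass through untouched, which is where a Unicode-aware version would need its case-mapping tables.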

CI/CD support?

You might want to explore using the GitHub CI/CD functionality so that an
“fpm test” is automatically run when you change your repository.

Regex support?

Almost every string function can be written as a regular expression substitution, so you might
want to consider REGEX support. There are some example libraries mentioned in the Fortran Wiki and @belivaskys’ lists to get an idea of what the current state is.

User and Developer documentation

One advantage of being Python-compatible is that there is already documentation available for the procedures, but you might want to provide perhaps an ASCII text description of the routine functions in the test program.

Using an auto-document-generator such as ford or doxygen can be a useful learning experience that will likely be useful in your future Fortran-centric projects as well.

3 Likes

This is fantastic! Very happy that you’re getting into Fortran and doing what we’ve all done at some point: trying to bring feature X from language Y into Fortran. I suggest you try std::vector; that’s a rabbit hole alright.

You can now try to contribute back to stdlib. I also concur that you should incorporate a CI/CD pipeline. We have a very nice conda-driven one (gha3mi/setup-fortran-conda on GitHub), and there is the one used by stdlib; they’re both very nice.

If you’re feeling adventurous, I recommend watching this keynote by @rouson, “Fortran is all you need”. There are lots of things there that can inspire you further!

1 Like

I thought that that forum already existed and that it was this one. Would you propose that another forum is created for stdlib?

You are right, @davidpfister! @jabuci, please accept my apologies for the harsh comment. My piece of advice: when you make an announcement, it would be nice if you included some acknowledgement of your own state of knowledge/research about what already exists. We have all at some point rewritten some of those solutions because they are not provided by the intrinsics, and the fact that you are learning and sharing your experience is commendable. Acknowledging what came before you really does change how others read your post, even if you are rewriting things to learn, just because, or for whatever other reason.

This forum is very friendly and you will receive good help as you have seen so far.

Keep up the good work!

5 Likes

The OP’s list is a bit more general, but for pattern matching and searching I’d recommend Fortran implementers compare their code to the C code in this github repository:

SMART: String Matching Algorithms Research Tool

@jabuci this is great. Yes, I think Fortran can be very similar and as easy to use as Python, including interactivity (LFortran), etc. If you want, work with the stdlib maintainers to get these additions there.

1 Like