Fortran Monthly Call: July 2020

It’s nearly time for our next monthly call, which will be held in the week of July 13-17. Please mark your availability in the following Doodle poll:

The final time slot will be selected and announced on the morning (European time) of Monday, July 13, so please complete the poll before then.

If there are specific issues you would like to discuss, please reply to this message.
As with the June call, we will record this call and make it available online for those who are unable to attend.

All the best,

Laurence


Thank you Laurence. I suggest two items to be included in the discussion:

  • Code organization / structure of stdlib, background in this thread.
  • How can we move forward with strings in stdlib? They are needed for the development of other things, such as file system utilities and fpm. Originally suggested by @MarDie.

I like the string suggestion. In fact, most of the necessary string handling routines already exist but are scattered across several projects (likely with incompatible licenses).

I think in the stdlib strings issue we agreed to have both functions/subroutines that operate on intrinsic fixed-length character strings and a string type, perhaps based on the iso_varying_string proposal.

It would also be great to learn why common things like integer-to-string conversions (and vice versa) have not been standardized or provided as an extension module.
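For reference, integer-to-string conversion and back is possible today with internal I/O; the issue is that everyone ends up writing the same small wrappers. A minimal sketch, assuming the hypothetical names int_to_str and str_to_int (not the API of stdlib or of any existing library):

```fortran
! Minimal sketch of integer <-> string conversion using internal I/O.
! Module and procedure names are hypothetical, not from stdlib or M_strings.
module string_conv_sketch
  implicit none
  private
  public :: int_to_str, str_to_int
contains
  ! Integer -> string, with no leading or trailing blanks.
  function int_to_str(i) result(s)
    integer, intent(in) :: i
    character(len=:), allocatable :: s
    character(len=32) :: buf        ! wide enough for any default integer
    write(buf, '(I0)') i            ! I0 = minimal field width
    s = trim(buf)
  end function int_to_str

  ! String -> integer; ierr /= 0 means the text was not a valid integer.
  subroutine str_to_int(str, i, ierr)
    character(len=*), intent(in) :: str
    integer, intent(out) :: i, ierr
    read(str, *, iostat=ierr) i     ! list-directed read of one value
    if (ierr /= 0) i = 0            ! give i a defined value on failure
  end subroutine str_to_int
end module string_conv_sketch
```

Reporting errors through an iostat-style argument is one design choice; stopping on bad input, or returning an optional status, would be others.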


(!) IMPORTANT - CHANGE OF SCHEDULE - Please update your calendars:

Further to my previous post, our next monthly call will be on Thursday, July 16 (not Friday, July 17) at 7pm BST.

11:00 - 12:00 PT (California)
14:00 - 15:00 EDT (New York)
19:00 - 20:00 BST (London)
20:00 - 21:00 CEST (Central Europe)

As Milan has done previously, I will post a reminder one hour before the meeting starts.
Please continue to use this thread for topics you would like to discuss at the meeting.

All the best,

Laurence


Laurence Kedward is inviting you to a scheduled Zoom meeting.

Topic: Fortran Monthly Call: July 2020
Time: Jul 16, 2020 07:00 PM London

Join Zoom Meeting

Meeting ID: 936 7756 6475
One tap mobile
+442080806591,93677566475# United Kingdom
+442080806592,93677566475# United Kingdom

Dial by your location
+44 208 080 6591 United Kingdom
+44 208 080 6592 United Kingdom
+44 330 088 5830 United Kingdom
+44 131 460 1196 United Kingdom
+44 203 481 5237 United Kingdom
+44 203 481 5240 United Kingdom
+44 203 901 7895 United Kingdom
Meeting ID: 936 7756 6475
Find your local number: https://zoom.us/u/adidSBZBSb

Join by SIP
93677566475@zoomcrc.com

Join by H.323
162.255.37.11 (US West)
162.255.36.11 (US East)
115.114.131.7 (India Mumbai)
115.114.115.7 (India Hyderabad)
213.19.144.110 (EMEA)
103.122.166.55 (Australia)
209.9.211.110 (Hong Kong SAR)
64.211.144.160 (Brazil)
69.174.57.160 (Canada)
207.226.132.110 (Japan)
Meeting ID: 936 7756 6475

Some fuel for discussing strings:

Perhaps we could build a poll asking about the importance of different classes of string routines?

Some major questions concern the relative importance of:

  • OOP versus procedural interfaces or both
  • ANSI character set versus Unicode
  • ISO_VARYING_STRING or other extended types versus the intrinsic CHARACTER type

I find basic string functions for splitting strings on delimiters, case conversion, converting between numeric values and strings, and handling whitespace the most essential.
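As an illustration of the first item, splitting on delimiters can be built from the intrinsics SCAN and VERIFY. A minimal sketch, assuming a hypothetical split function that skips empty tokens and returns blank-padded fixed-length elements (one of several possible conventions):

```fortran
! Hypothetical split(): tokens separated by any character in delims.
! Consecutive delimiters are treated as one (empty tokens are skipped).
! Each token is blank-padded to len(s); apply trim() when using it.
module split_sketch
  implicit none
  private
  public :: split
contains
  function split(s, delims) result(tokens)
    character(len=*), intent(in) :: s, delims
    character(len=len(s)), allocatable :: tokens(:)
    integer :: pos, first, last

    allocate(tokens(0))
    pos = 1
    do while (pos <= len(s))
      ! VERIFY: offset of the first non-delimiter (0 if none are left).
      first = verify(s(pos:), delims)
      if (first == 0) exit
      first = pos + first - 1
      ! SCAN: offset of the next delimiter after the token start.
      last = scan(s(first:), delims)
      if (last == 0) then
        last = len(s)
      else
        last = first + last - 2
      end if
      ! The type-spec lets elements of different lengths be combined.
      tokens = [character(len=len(s)) :: tokens, s(first:last)]
      pos = last + 2
    end do
  end function split
end module split_sketch
```

Returning fixed-length elements keeps the sketch short; a string type would avoid the padding, which is one of the design questions listed above.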

Some example stand-alone public-domain modules that could fuel discussion are:

  • M_msg: general routine for converting scalar values to strings
  • M_strings: basic string functions in both procedural and OOP style
  • M_calculator: expression parsing
  • M_change: basic regular expressions

Related procedures that have dependencies can be found in

GPF (General Purpose Fortran)

  • M_path - OOP interface for a GNU/Linux or Unix (i.e. POSIX) pathname
  • M_bre - Basic Regular Expressions
  • M_regex - Fortran interface to POSIX 1003.2 regular expression library using ISO_C_BINDING

  To a lesser extent (need better examples here):

    • M_list - maintain simple lists

MISCELLANEOUS

Is parsing command line arguments part of this discussion? After all, it is largely the act of parsing a “command” string.

RELATED MAN PAGES

The stand-alone modules referenced above have their own documentation, but I think the following
routines, as described in the GPF man pages, are pertinent and also cover almost all of the stand-alone modules:

  • path (3) - [M_path] OOP interface for a GNU Linux or Unix pathname

  • splitpath (3) - [M_io] split a Unix pathname into components

  • M_list (3) - [M_list] maintain simple lists

  • insert (3) - [M_list] insert entry into a string array at specified position

  • locate (3) - [M_list] finds the index where a string is found or should be in a sorted array

  • remove (3) - [M_list] remove entry from an allocatable array at specified position

  • replace (3) - [M_list] replace entry in a string array at specified position

  • amatch (3) - [M_match] look for pattern matching regular expression; returns its location

  • match (3) - [M_match] find match anywhere on line

  • omatch (3) - [M_match] try to match a single pattern at pat(j)

  • M_regex (3) - [M_regex] Fortran interface to POSIX 1003.2 regular expression library using ISO_C_BINDING.

  • regcomp (3) - [M_regex] Compile a regular expression into a regex object

  • regerror (3) - [M_regex] maps a non-zero errcode from either regcomp(3) or regexec(3) to a human-readable, printable message.

  • regexec (3) - [M_regex] Execute a compiled regex against a string

  • regfree (3) - [M_regex] Release storage used by the internal form of the RE (Regular Expression)

  • regmatch (3) - [M_regex] return selected substring defined by the MATCHES(2,:) array

  • regsub (3) - [M_regex] perform regex substitutions

  • describe (3) - [M_strings] returns a string describing the name of a single character

  • msg (3) - [M_strings] converts any standard scalar type to a string

  • rotate13 (3) - [M_strings] apply trivial ROT13 encryption to a string

  • c2s (3) - [M_strings:ARRAY] convert C string pointer to Fortran character string

  • s2c (3) - [M_strings:ARRAY] convert character variable to array of characters with last element set to null

  • switch (3) - [M_strings:ARRAY] converts between CHARACTER scalar and array of single characters

  • base (3) - [M_strings:BASE] convert whole number string in base [2-36] to string in alternate base [2-36]

  • codebase (3) - [M_strings:BASE] convert whole number in base 10 to string in base [2-36]

  • decodebase (3) - [M_strings:BASE] convert whole number string in base [2-36] to base 10 number

  • lower (3) - [M_strings:CASE] changes a string to lowercase over specified range

  • upper (3) - [M_strings:CASE] changes a string to uppercase

  • upper_quoted (3) - [M_strings:CASE] elemental function converts string to uppercase, skipping strings quoted per Fortran syntax rules

  • isalnum (3) - [M_strings:COMPARE] test membership in subsets of ASCII set

  • matchw (3) - [M_strings:COMPARE] compare given string for match to pattern which may contain wildcard characters

  • change (3) - [M_strings:EDITING] change old string to new string with a directive like a line editor

  • join (3) - [M_strings:EDITING] append CHARACTER variable array into a single CHARACTER variable with specified separator

  • modif (3) - [M_strings:EDITING] emulate the MODIFY command from the line editor XEDIT

  • replace (3) - [M_strings:EDITING] function globally replaces one substring for another in string

  • reverse (3) - [M_strings:EDITING] Return a string reversed

  • substitute (3) - [M_strings:EDITING] subroutine globally substitutes one substring for another in string

  • transliterate (3) - [M_strings:EDITING] replace characters from old set with new set

  • M_strings (3) - [M_strings:INTRO] Fortran string module

  • len_white (3) - [M_strings:LENGTH] get length of string trimmed of whitespace.

  • lenset (3) - [M_strings:LENGTH] return string trimmed or padded to specified length

  • merge_str (3) - [M_strings:LENGTH] pads strings to same length and then calls MERGE(3f)

  • stretch (3) - [M_strings:LENGTH] return string padded to at least specified length

  • expand (3) - [M_strings:NONALPHA] expand C-like escape sequences

  • noesc (3) - [M_strings:NONALPHA] convert non-printable characters to a space.

  • notabs (3) - [M_strings:NONALPHA] expand tab characters

  • visible (3) - [M_strings:NONALPHA] expand a string to control and meta-control representations

  • getvals (3) - [M_strings:NUMERIC] read arbitrary number of REAL values from a character variable up to size of VALUES() array

  • isnumber (3) - [M_strings:NUMERIC] determine if a string represents a number

  • listout (3) - [M_strings:NUMERIC] expand a list of numbers where negative numbers denote range ends (1 -10 means 1 thru 10)

  • s2v (3) - [M_strings:NUMERIC] function returns doubleprecision numeric value from a string

  • s2vs (3) - [M_strings:NUMERIC] given a string representing numbers return a numeric array

  • string_to_value (3) - [M_strings:NUMERIC] subroutine returns numeric value from string

  • string_to_values (3) - [M_strings:NUMERIC] read a string representing numbers into a numeric array

  • v2s (3) - [M_strings:NUMERIC] return numeric string from a numeric value

  • value_to_string (3) - [M_strings:NUMERIC] return numeric string from a numeric value

  • quote (3) - [M_strings:QUOTES] add quotes to string as if written with list-directed input

  • unquote (3) - [M_strings:QUOTES] remove quotes from string as if read with list-directed input

  • chomp (3) - [M_strings:TOKENS] Tokenize a string, consuming it one token per call

  • delim (3) - [M_strings:TOKENS] parse a string and store tokens into an array

  • fmt (3) - [M_strings:TOKENS] convert text to a paragraph

  • split (3) - [M_strings:TOKENS] parse string into an array using specified delimiters

  • strtok (3) - [M_strings:TOKENS] Tokenize a string

  • adjustc (3) - [M_strings:WHITESPACE] center text

  • compact (3) - [M_strings:WHITESPACE] converts contiguous whitespace to a single character (or nothing)

  • crop (3) - [M_strings:WHITESPACE] trim leading blanks and trailing blanks from a string

  • indent (3) - [M_strings:WHITESPACE] count number of leading spaces in a string

  • nospace (3) - [M_strings:WHITESPACE] remove all whitespace from input string
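To give a feel for how small some of these routines are, here is a minimal sketch in the spirit of codebase (3) above, converting a non-negative integer to a string in base 2 to 36. The name to_base and the choice to return an empty string on invalid input are assumptions for illustration, not the M_strings interface:

```fortran
! Sketch in the spirit of codebase (3): integer -> string in base 2..36.
! The name and the empty-string-on-error convention are assumptions.
module base_sketch
  implicit none
  private
  public :: to_base
  character(len=36), parameter :: digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
contains
  function to_base(n, base) result(s)
    integer, intent(in) :: n, base
    character(len=:), allocatable :: s
    integer :: q, d
    s = ""
    if (base < 2 .or. base > 36 .or. n < 0) return   ! invalid input
    if (n == 0) then
      s = "0"
      return
    end if
    q = n
    do while (q > 0)
      d = mod(q, base) + 1           ! 1-based index into digits
      s = digits(d:d) // s           ! prepend the least significant digit
      q = q / base
    end do
  end function to_base
end module base_sketch
```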

There are also the intrinsic routines and operators such as “//” to keep in mind:

  • achar (3) - [FORTRAN:INTRINSIC:CHARACTER] returns a character in a specified position in the ASCII collating sequence
  • adjustl (3) - [FORTRAN:INTRINSIC:CHARACTER] Left adjust a string
  • adjustr (3) - [FORTRAN:INTRINSIC:CHARACTER] Right adjust a string
  • char (3) - [FORTRAN:INTRINSIC:CHARACTER] Character conversion function
  • iachar (3) - [FORTRAN:INTRINSIC:CHARACTER] Code in ASCII collating sequence
  • ichar (3) - [FORTRAN:INTRINSIC:CHARACTER] Character-to-integer conversion function
  • index (3) - [FORTRAN:INTRINSIC:CHARACTER] Position of a substring within a string
  • len (3) - [FORTRAN:INTRINSIC:CHARACTER] Length of a character entity
  • len_trim (3) - [FORTRAN:INTRINSIC:CHARACTER] Length of a character entity without trailing blank characters
  • lge (3) - [FORTRAN:INTRINSIC:CHARACTER] Lexical greater than or equal
  • lgt (3) - [FORTRAN:INTRINSIC:CHARACTER] Lexical greater than
  • lle (3) - [FORTRAN:INTRINSIC:CHARACTER] Lexical less than or equal
  • llt (3) - [FORTRAN:INTRINSIC:CHARACTER] Lexical less than
  • new_line (3) - [FORTRAN:INTRINSIC:CHARACTER] New line character
  • repeat (3) - [FORTRAN:INTRINSIC:CHARACTER] Repeated string concatenation
  • scan (3) - [FORTRAN:INTRINSIC:CHARACTER] Scan a string for the presence of a set of characters
  • trim (3) - [FORTRAN:INTRINSIC:CHARACTER] Remove trailing blank characters of a string
  • verify (3) - [FORTRAN:INTRINSIC:CHARACTER] Scan a string for the absence of a set of characters
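Several of the routines listed earlier are thin layers over these intrinsics. A sketch, with hypothetical names, of crop-like and indent-like behavior and of counting substrings with INDEX:

```fortran
! Sketch: a few listed behaviors expressed with intrinsics only.
! All names are hypothetical, chosen for illustration.
module intrinsic_sketch
  implicit none
  private
  public :: crop_blanks, leading_blanks, count_substr
contains
  ! Trim leading and trailing blanks (similar in spirit to crop (3)).
  pure function crop_blanks(s) result(r)
    character(len=*), intent(in) :: s
    character(len=:), allocatable :: r
    r = trim(adjustl(s))
  end function crop_blanks

  ! Count leading blanks (similar in spirit to indent (3)).
  pure function leading_blanks(s) result(n)
    character(len=*), intent(in) :: s
    integer :: n
    n = verify(s, " ") - 1          ! VERIFY gives the first non-blank position
    if (n < 0) n = len(s)           ! VERIFY returns 0 if s is all blanks
  end function leading_blanks

  ! Count non-overlapping occurrences of sub in s using INDEX.
  pure function count_substr(s, sub) result(n)
    character(len=*), intent(in) :: s, sub
    integer :: n, pos, k
    n = 0
    if (len(sub) == 0) return
    pos = 1
    do while (pos <= len(s))
      k = index(s(pos:), sub)
      if (k == 0) exit
      n = n + 1
      pos = pos + k - 1 + len(sub)  ! resume just after this match
    end do
  end function count_substr
end module intrinsic_sketch
```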

So I think a big discussion point should be the current state of the art and what is already out there
(I gave my sample above; I am sure there are others). More importantly, should stdlib be the first place to try to determine which string functions are needed?

I personally think that for most interfaces like this we should ask people to post their solutions, individually or jointly, as GitHub packages that the community can try out and contribute to. That is more typical of most language ecosystems: put a bunch of libraries out there listed as “packages” (implying some vetting and sponsorship exists for them), let people try them, and then, if and when a single package dominates, incorporate it into the standard. This is especially true when so much prior art exists for a topic. Trying to do it all in stdlib has some of the very same problems (VERY ironically) as the standards committee path; for example, it takes too much consensus from too few people. A “package” approach seems to work much faster in other languages. I bet I have seen a dozen Python command line parsers, but they seem to have whittled down to about three in common use. Yet if someone came up with a better one, it could still be made available as a package and develop a following without being stopped in its tracks by not being “standard”.

Just how do you set up a poll anyway?

Please join our call and let’s discuss it. Our fpm effort will enable precisely that approach of having multiple competing packages. So far, we have only put in things for which there was unanimous consensus, and only into experimental, meaning we can still remove them if they turn out not to be a good API.

For strings there might not be that many approaches out there, and we might be able to agree on a common functionality as a community. If we cannot agree, then having competing packages would be the way to go.

Unfortunately, I have a scheduling conflict that prevents me from joining the meeting, which is why I threw together (hopefully not too hastily) that rather long list. I wish this posting tool had a vim(1) mode, or at least a vi(1) mode, for writing posts. I find it painful to use so far, but am getting used to it.

I have been trying fpm and think that it, together with a centralized listing of packages, could be a game changer, although I hit some roadblocks regarding support for C wrappers and having to put each executable in a separate subdirectory (I keep meaning to send my list of experiences to the fpm(1) site but have not had time).

I will be very interested in the results of the meeting and am looking forward to the first things that come out of it.

In my opinion, there is no reason Fortran itself should not be able to meet and surpass the goals of Julia, but without efforts like this succeeding I am not sure that will happen.

Yes, Julia has been my inspiration to help start all these efforts. I think we will ultimately be successful with turning Fortran around.

I have been sifting through the comp.lang.fortran forum reading the previous discussions on string handling. Just on the topic of case conversion I found all of the following threads:

The question also appears on other forums:

Examples of the same case conversion function are given in at least 3 books:

  • Akin, E. (2003). Object-oriented programming via Fortran 90/95 (Vol. 1). Cambridge University Press. (page 301)
  • Ray, S. (2019). Fortran 2018 with Parallel Programming. CRC Press. (page 79)
  • Hahn, B. (1994). Fortran 90 for scientists and engineers. Elsevier. (page 168)

Not to mention all the Fortran libraries that have rolled their own versions:

I’m sure we could find several more copies of the same routine in the list of popular open source projects. This is in addition to all the string packages already listed on the Fortran stdlib github issue for strings.

Digging back further I found that two character intrinsic case conversion functions were proposed for standardization in 1988 by the Dutch Fortran group (Leo ter Haar) in paper N289 (see under WG5 Documents). From the minutes of the September 1988 WG5 meeting in Paris (N353) we read:

The Dutch group have proposed the addition of two character intrinsic functions for the conversion of a character entity to upper-case or lower-case characters respectively. These functions are often provided by vendors but in a non-standard, hence, non-portable, way.

It was noted that this functionality can be provided by a more general TRANSLATE using labels which is being worked on by X3J3.

An informal report by Miles Ellis from the same meeting appears to indicate the proposed functions would be called UC and LC.

Agreeing on a name for these two routines among the numerous choices:

  • ucase, lcase
  • upcase, downcase
  • upstr, lostr
  • uppercase, lowercase
  • uc, lc
  • lower, upper (Python)
  • upcase, downcase (Ruby)
  • toLower, toLowerInPlace, toUpper, toUpperInPlace, asLowerCase, asUpperCase (D)
  • lower, upper (MATLAB)
  • uppercase, lowercase (Julia)
  • to_upper, to_lower (C++ using Boost)
  • to_uppercase, to_ascii_uppercase, to_lowercase, to_ascii_lowercase, make_ascii_uppercase, make_ascii_lowercase (Rust)

is already a challenge. Once you bring in implementation differences, special behaviors, etc., what seemed like an easy-to-write function can become a tough problem.
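To make the point concrete, even the simplest ASCII-only version involves choices: here characters outside a-z/A-Z pass through unchanged, there is no locale or Unicode handling, and the functions are ELEMENTAL so they also work on arrays. A minimal sketch using the hypothetical names to_upper/to_lower (just one pick from the list above):

```fortran
! ASCII-only case conversion; non-letters are returned unchanged.
! Names are hypothetical; locale and Unicode handling are out of scope.
module case_sketch
  implicit none
  private
  public :: to_upper, to_lower
  integer, parameter :: shift = iachar('a') - iachar('A')   ! 32 in ASCII
contains
  elemental function to_upper(s) result(r)
    character(len=*), intent(in) :: s
    character(len=len(s)) :: r
    integer :: i, c
    r = s
    do i = 1, len(s)
      c = iachar(s(i:i))
      if (c >= iachar('a') .and. c <= iachar('z')) r(i:i) = achar(c - shift)
    end do
  end function to_upper

  elemental function to_lower(s) result(r)
    character(len=*), intent(in) :: s
    character(len=len(s)) :: r
    integer :: i, c
    r = s
    do i = 1, len(s)
      c = iachar(s(i:i))
      if (c >= iachar('A') .and. c <= iachar('Z')) r(i:i) = achar(c + shift)
    end do
  end function to_lower
end module case_sketch
```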

In the same 1988 Paris meeting report, I read that the German DIN group had already proposed a data type for a “character string of varying length”. By then, this type had already been available in the Siemens Fortran compiler for ten years!

In a 1984 article, “Has Fortran a future?”, Michael Metcalf suggested the following string type (using parameterized derived types!):

MODULE String_type
  TYPE String(Maxlen)
     INTEGER::Length
     CHARACTER(LEN=Maxlen)::String_data
  END TYPE String
END MODULE String_type
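As a side note, that 1984 sketch predates the parameterized derived types that Fortran 2003 eventually added. In modern syntax the LEN type parameter has to be declared inside the type. A hedged rendering in current syntax, keeping the component names from the sketch (this is an illustration, not Metcalf's code, and compiler support for PDTs still varies):

```fortran
! Metcalf's 1984 idea rewritten with Fortran 2003 PDT syntax.
module string_type_sketch
  implicit none
  type :: string(maxlen)
    integer, len :: maxlen                  ! capacity, fixed per object
    integer :: length                       ! number of characters in use
    character(len=maxlen) :: string_data    ! storage, blank-padded
  end type string
end module string_type_sketch
```

Note that the capacity is still fixed when the object is declared, which is part of why later proposals moved toward allocatable storage.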

36 years later, we have iso_varying_string, which never made it into the core language (probably best to leave the “iso” out now).

The best we can do as the user community is to agree on a useful solution ourselves and try to stick with it until it gains enough acceptance to become a de facto standard. The Boost C++ libraries also work this way.


IMO the key purpose of discussing strings for stdlib right now is the functionality needed to build other things: file system utilities, I/O, fpm, etc. This may or may not be part of stdlib, although I think that something as basic and commonly used as strings should be part of it.

Ideally we will have strings utilities with a commonly used API as part of stdlib, and a dozen or more strings libraries available via fpm.

For right now, I would like us to at least discuss which strings functionality we need in order to build other things.

  1. Click on [+ New Topic] in the upper right corner on the main Discourse page
  2. In the new topic dialog, there is a little gear icon on the right. Click on the gear icon.
  3. Click on Build Poll
  4. You can have multiple polls in a single topic.

Thanks @ivanpribec for this great research.

As Milan said, I think fpm will allow us to have dozens of string libraries and other libraries. We can experiment with things easily.

However, I still think it’s good to agree on the API as part of stdlib, and I think we actually can; it would go into stdlib_experimental. My idea is that it would stay there while fpm matures and we can create libraries easily. Then, say in a year or two, it would become clear which subset everybody agrees we should have, and we can move it from experimental to main in stdlib.

So my approach is to be active and try to get something into experimental soon, so that we move beyond just discussing it to actually having code that we can try using, which will move the conversation forward. But we should be very conservative when moving it to main, because we want to make sure that we “standardize” things that people actually want and agree with.

P.S. The other approach, which I am really hoping will happen, is that fpm will allow people to create competing libraries for strings and other functionality; in a Darwinian fashion some of these will become popular, and we can then consider standardizing some of the popular APIs, similar to how Python accepted argparse into its standard library. It will take a year or two for this to happen, which is why I want to be very careful before moving anything from experimental to main in stdlib, and would rather wait a bit.


I will attend the call and could present my ideas on file system operations. Not sure whether it makes sense to discuss the more general concept of strings before or after that.


@ivanpribec, this is an excellent compilation of “prior art” in this area.

Here are a couple of other links that caught my attention on comp.lang.fortran:

It would be useful if any design of such a type also kept in mind the needs expressed there.

All,

My own opinion and vision for the Fortran language remains a new intrinsic and inextensible type with a short name, preferably named string or perhaps string_t, that is designed with the Pareto Principle in mind to meet most of the needs of general coding right “out of the box”:

   string, allocatable :: words(:)
   string :: s
   words = [ "Hello", "World!" ]
   s = words(2)(1:5)
   print *, s       !<--  prints "World"
   ..

Using as a guide the equivalent types in other languages popular in scientific and technical computing, such as C++ std::string, the Java String class, Microsoft .NET StringBuilder, and Swift String (all of which appear to judiciously limit the scope of the facility in their type rather than try to pack in everything of interest), can help with eventual standardization in Fortran.
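Until such an intrinsic type exists, a common stand-in is a derived type wrapping a deferred-length allocatable CHARACTER component. A minimal sketch of the example above in current Fortran, assuming a hypothetical type string_t with only a defined assignment from CHARACTER (far smaller than iso_varying_string):

```fortran
! Hypothetical stand-in for a proposed intrinsic string type, written with
! a deferred-length allocatable component (standard since Fortran 2003).
module string_t_sketch
  implicit none
  private
  public :: string_t, assignment(=)

  type :: string_t
    character(len=:), allocatable :: s   ! holds exactly the stored text
  end type string_t

  ! Lets a scalar string_t be assigned directly from a CHARACTER value.
  interface assignment(=)
    module procedure assign_from_char
  end interface assignment(=)
contains
  subroutine assign_from_char(lhs, rhs)
    type(string_t), intent(out) :: lhs
    character(len=*), intent(in) :: rhs
    lhs%s = rhs                          ! (re)allocated to len(rhs)
  end subroutine assign_from_char
end module string_t_sketch
```

With this, `words = [string_t("Hello"), string_t("World!")]` works, and `s = string_t(words(2)%s(1:5))` yields "World". The detour through the `%s` component is exactly the kind of friction an intrinsic type would remove.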


I would also like an intrinsic type, to get maximum efficiency from compilers (ultimately). As for assignment, I think the right-hand side of this line

words = [ "Hello", "World!" ]

is probably evaluated independently of the left-hand side, resulting in an error. We may be able to use [ string("Hello"), string("World!") ] or [ string :: "Hello", "World!" ], but if we had something like an “s-string” or “v-string” syntax, similar to the “f-string” in Python etc. (here, to make a literal of the new string type), it might offer additional functionality (e.g., string interpolation?).

words = [ s"Hello", s"World!" ] !! s-string
words = [ v"Hello", v"World!" ] !! v-string (v for “variable-length”)
words = [ v"Hello", "World!" ]
!! the 1st element determines the type of array, so the 2nd character string is
!! auto-converted to v-string

s"foo = {foo}" !! string interpolation
v"baa = $baa / $(baz * 10)" !! string interpolation

But I guess this kind of thing needs compiler support…
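For the intrinsic CHARACTER type, the standard indeed requires all elements of an array constructor without a type-spec to have the same length, so [ "Hello", "World!" ] (lengths 5 and 6) is invalid regardless of the left-hand side. A short sketch of the Fortran 2003 workaround, a type-spec inside the constructor (the function name is hypothetical):

```fortran
module ctor_sketch
  implicit none
contains
  ! The type-spec "character(len=6) ::" makes the constructor valid:
  ! each element is converted to length 6, so "Hello" gains one trailing blank.
  function greeting_words() result(words)
    character(len=:), allocatable :: words(:)
    words = [character(len=6) :: "Hello", "World!"]
  end function greeting_words
end module ctor_sketch
```

The padding is the price of the intrinsic type: every element shares one length, which is precisely what a varying-length string type would avoid.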


From what I’ve seen when dealing with strings and file system objects/command line arguments in Python, the two are closely tied, and how they behave depends on the kind of string involved (bytes-like or decoded). This seems tricky to get right, as some blog posts covering the issue show:

So, given Fortran’s high-level focus, I would assume that we want something like Python 3/Julia strings (e.g. UTF-8 by default, automatically decoded from the local encoding), but given Python’s history, it seems to me that a somewhat holistic approach is indeed needed.


Quoting from the minutes of the Fortran Experts Group meeting in Vienna, 14-17 June 1982:

Character Data Type

Discussion:

Meissner: What we should have had is STRING.

Pollicini: You can do what you want with arrays of characters. Why have two different character types - fixed and varying?

Snoek: Only one should go in core. Why put fixed length character in core?
Ans: It is more efficient.

Metcalf: Fortran 77 character type is a mistake, but we should make it better by extension (character type with a varying attribute). We should not have two separate types.

Meissner: The issue is what happens across subprogram boundaries.


I’d like to suggest a couple of items for discussion today relating to fpm:

  • Initial experiences & feedback: it’s great to see some people trying out fpm with their projects, we should review these in order to guide the next steps. Specifically what works well, what does not and any limitations/show-stoppers;

  • Next steps:

    • which features should be prioritised next?
    • what is the milestone for starting re-implementation in Fortran?

I agree. I will argue for starting the Fortran implementation as soon as possible, so that the community can start submitting PRs.
