Fortran Monthly Call: July 2020

lkedward · July 7, 2020, 3:10pm

It’s approaching time for our next monthly call, which will be in the week of July 13-17; please see the following doodle poll to mark your availability:

The final time slot will be selected and announced on the morning (European time) of Monday July 13; please complete the poll before then.

If there are specific issues you would like to discuss, please reply to this message.
As with the June call, we will record this call and make it available online for those who are unable to attend.

All the best,

Laurence

milancurcic · July 9, 2020, 4:05pm

Thank you Laurence. I suggest two items to be included in the discussion:

Code organization / structure of stdlib, background in this thread.
How can we move forward with strings in stdlib, needed for the development of other things like file system utilities and fpm. Originally suggested by @MarDie.

ivanpribec · July 10, 2020, 9:30am

I like the string suggestion. In fact most of the necessary string handling routines already exist but are scattered throughout several projects (likely with incompatible licenses).

I think in the string issue in stdlib we agreed to have both functions/subroutines which operate on the intrinsic fixed-length character strings, and a string type, (perhaps) based upon the iso_varying_string proposal.

It would be great to also learn the reasons why common things like integer to string conversions and vice-versa have not been standardized or provided as an extension module.
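For reference, this is roughly what each project ends up writing by hand today. It is only a minimal sketch using internal I/O; the module name `demo_int_str` and the procedures `int_to_str` and `str_to_int` are hypothetical names chosen for illustration, not part of stdlib or any existing library.

```fortran
module demo_int_str
  implicit none
  private
  public :: int_to_str, str_to_int
contains

  !> Convert a default integer to a string with no leading or trailing blanks.
  function int_to_str(i) result(s)
    integer, intent(in) :: i
    character(len=:), allocatable :: s
    character(len=32) :: buffer   ! wide enough for any default integer

    ! The i0 edit descriptor writes the minimal number of digits (plus sign).
    write (buffer, '(i0)') i
    s = trim(buffer)
  end function int_to_str

  !> Parse a default integer from a string; stat is non-zero on failure.
  subroutine str_to_int(s, i, stat)
    character(len=*), intent(in) :: s
    integer, intent(out) :: i
    integer, intent(out) :: stat

    i = 0
    ! List-directed read from an internal file (the string itself).
    read (s, *, iostat=stat) i
  end subroutine str_to_int

end module demo_int_str
```

With this sketch, `int_to_str(-7)` yields `'-7'`, and `call str_to_int('abc', i, stat)` returns a non-zero `stat`. List-directed input is lenient about what it accepts, so a library version would probably want stricter validation.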

lkedward · July 13, 2020, 2:39pm

(!) IMPORTANT - CHANGE OF SCHEDULE - Please update your calendars:

Further to my previous post, our next monthly call will be Thursday, July 16 ~~Friday, July 17~~ at 7pm BST.

11:00 - 12:00 PT (California)
14:00 - 15:00 EDT (New York)
19:00 - 20:00 BST (London)
20:00 - 21:00 CEST (Central Europe)

As Milan has done previously, I will put a reminder one hour before the meeting starts.
Please continue to use this thread for topics you would like to discuss at the meeting.

All the best,

Laurence

Laurence Kedward is inviting you to a scheduled Zoom meeting.

Topic: Fortran Monthly Call: July 2020
Time: Jul 16, 2020 07:00 PM London

Join Zoom Meeting

Meeting ID: 936 7756 6475
One tap mobile
+442080806591,93677566475# United Kingdom
+442080806592,93677566475# United Kingdom

Dial by your location
+44 208 080 6591 United Kingdom
+44 208 080 6592 United Kingdom
+44 330 088 5830 United Kingdom
+44 131 460 1196 United Kingdom
+44 203 481 5237 United Kingdom
+44 203 481 5240 United Kingdom
+44 203 901 7895 United Kingdom
Meeting ID: 936 7756 6475
Find your local number: https://zoom.us/u/adidSBZBSb

Join by SIP
93677566475@zoomcrc.com

Join by H.323
162.255.37.11 (US West)
162.255.36.11 (US East)
115.114.131.7 (India Mumbai)
115.114.115.7 (India Hyderabad)
213.19.144.110 (EMEA)
103.122.166.55 (Australia)
209.9.211.110 (Hong Kong SAR)
64.211.144.160 (Brazil)
69.174.57.160 (Canada)
207.226.132.110 (Japan)
Meeting ID: 936 7756 6475

urbanjost · July 14, 2020, 12:53am

Some fuel for discussing strings:

Perhaps build a poll asking the importance of different classes of string routines?

Some major questions are the importance of

OOP versus procedural interfaces or both
ANSI character set versus Unicode
ISO_VARYING_STRING or other extended types versus the intrinsic CHARACTER type

I find having basic string functions for splitting strings on delimiters, case conversion, converting between numeric values and strings, and handling white-space the most essential.
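To make the "splitting on delimiters" case concrete, here is a minimal sketch built only on the intrinsic `verify` and `scan`. The module name `demo_split` and the interface of `split` are assumptions made for this example and do not match M_strings or any other library mentioned here. It treats runs of delimiters as a single separator and returns the tokens blank-padded to the length of the longest one.

```fortran
module demo_split
  implicit none
  private
  public :: split
contains

  !> Split str on any character in delims; tokens are padded to equal length.
  subroutine split(str, delims, tokens)
    character(len=*), intent(in) :: str
    character(len=*), intent(in) :: delims
    character(len=:), allocatable, intent(out) :: tokens(:)
    integer :: pass, pos, first, last, n, maxlen

    maxlen = 0
    ! Pass 1 counts tokens and finds the longest; pass 2 stores them.
    do pass = 1, 2
      n = 0
      pos = 1
      do
        ! Skip delimiters; verify returns 0 when only delimiters remain.
        first = verify(str(pos:), delims)
        if (first == 0) exit
        first = pos + first - 1
        ! Locate the delimiter that ends this token, if any.
        last = scan(str(first:), delims)
        if (last == 0) then
          last = len(str)
        else
          last = first + last - 2
        end if
        n = n + 1
        if (pass == 1) then
          maxlen = max(maxlen, last - first + 1)
        else
          tokens(n) = str(first:last)
        end if
        pos = last + 1
        if (pos > len(str)) exit
      end do
      if (pass == 1) allocate (character(len=maxlen) :: tokens(n))
    end do
  end subroutine split

end module demo_split
```

For example, `call split('a,b;;cc', ',;', tokens)` gives three tokens of length 2: `'a '`, `'b '` and `'cc'`. The padding is exactly the kind of compromise that a varying-length string type would avoid.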

Some example stand-alone modules that are licensed as public domain that could fuel discussion are

M_msg # general routine for scalar values to string
M_strings # basic string functions in a procedure and OOP style
M_calculator # expression parsing
M_change # basic regular expressions

Related procedures that have dependencies are in

GPF (General Purpose Fortran)

  * M_path      - OOP interface for a GNU/Linux or Unix (ie. "Posix") pathname
  * M_bre       - Basic Regular Expressions
  * M_regex     - Fortran interface to POSIX 1003.2 regular expression library using ISO_C_BINDING.

  To a lesser extent (need better examples here):
  
     * M_list      - maintain simple lists

MISCELLANEOUS

Is parsing command line arguments part of this discussion? It is largely the act of parsing a “command” string.

RELATED MAN PAGES

The stand-alone modules referenced above have their own documentation, but I think the following routines as described in GPF manpages are pertinent and also cover almost all the stand-alone modules:

path (3) - [M_path] OOP interface for a GNU Linux or Unix pathname
splitpath (3) - [M_io] split a Unix pathname into components
M_list (3) - [M_list] maintain simple lists
insert (3) - [M_list] insert entry into a string array at specified position
locate (3) - [M_list] finds the index where a string is found or should be in a sorted array
remove (3) - [M_list] remove entry from an allocatable array at specified position
replace (3) - [M_list] replace entry in a string array at specified position
amatch (3) - [M_match] - look for pattern matching regular expression; returns its location
match (3) - [M_match] find match anywhere on line
omatch (3) - [M_match] try to match a single pattern at pat(j)
M_regex (3) - [M_regex] Fortran interface to POSIX 1003.2 regular expression library using ISO_C_BINDING.
regcomp (3) - [M_regex] Compile a regular expression into a regex object
regerror (3) - [M_regex] maps a non-zero errcode from either regcomp(3) or regexec(3) to a human-readable, printable message.
regexec (3) - [M_regex] Execute a compiled regex against a string
regfree (3) - [M_regex] Release storage used by the internal form of the RE (Regular Expression)
regmatch (3) - [M_regex] return selected substring defined by the MATCHES(2, array
regsub (3) - [M_regex] perform regex substitutions
describe (3) - [M_strings] returns a string describing the name of a single character
msg (3) - [M_strings] converts any standard scalar type to a string
rotate13 (3) - [M_strings] apply trivial ROT13 encryption to a string
c2s (3) - [M_strings:ARRAY] convert C string pointer to Fortran character string
s2c (3) - [M_strings:ARRAY] convert character variable to array of characters with last element set to null
switch (3) - [M_strings:ARRAY] converts between CHARACTER scalar and array of single characters
base (3) - [M_strings:BASE] convert whole number string in base [2-36] to string in alternate base [2-36]
codebase (3) - [M_strings:BASE] convert whole number in base 10 to string in base [2-36]
decodebase (3) - [M_strings:BASE] convert whole number string in base [2-36] to base 10 number
lower (3) - [M_strings:CASE] changes a string to lowercase over specified range
upper (3) - [M_strings:CASE] changes a string to uppercase
upper_quoted (3) - [M_strings:CASE] elemental function converts string to miniscule skipping strings quoted per Fortran syntax rules
isalnum (3) - [M_strings:COMPARE] test membership in subsets of ASCII set
matchw (3) - [M_strings:COMPARE] compare given string for match to pattern which may contain wildcard characters
change (3) - [M_strings:EDITING] change old string to new string with a directive like a line editor
join (3) - [M_strings:EDITING] append CHARACTER variable array into a single CHARACTER variable with specified separator
modif (3) - [M_strings:EDITING] emulate the MODIFY command from the line editor XEDIT
replace (3) - [M_strings:EDITING] function globally replaces one substring for another in string
reverse (3) - [M_strings:EDITING] Return a string reversed
substitute (3) - [M_strings:EDITING] subroutine globally substitutes one substring for another in string
transliterate (3) - [M_strings:EDITING] replace characters from old set with new set
M_strings (3) - [M_strings:INTRO] Fortran string module
len_white (3) - [M_strings:LENGTH] get length of string trimmed of whitespace.
lenset (3) - [M_strings:LENGTH] return string trimmed or padded to specified length
merge_str (3) - [M_strings:LENGTH] pads strings to same length and then calls MERGE(3f)
stretch (3) - [M_strings:LENGTH] return string padded to at least specified length
expand (3) - [M_strings:NONALPHA] expand C-like escape sequences
noesc (3) - [M_strings:NONALPHA] convert non-printable characters to a space.
notabs (3) - [M_strings:NONALPHA] expand tab characters
visible (3) - [M_strings:NONALPHA] expand a string to control and meta-control representations
getvals (3) - [M_strings:NUMERIC] read arbitrary number of REAL values from a character variable up to size of VALUES() array
isnumber (3) - [M_strings:NUMERIC] determine if a string represents a number
listout (3) - [M_strings:NUMERIC] expand a list of numbers where negative numbers denote range ends (1 -10 means 1 thru 10)
s2v (3) - [M_strings:NUMERIC] function returns doubleprecision numeric value from a string
s2vs (3) - [M_strings:NUMERIC] given a string representing numbers return a numeric array
string_to_value (3) - [M_strings:NUMERIC] subroutine returns numeric value from string
string_to_values (3) - [M_strings:NUMERIC] read a string representing numbers into a numeric array
v2s (3) - [M_strings:NUMERIC] return numeric string from a numeric value
value_to_string (3) - [M_strings:NUMERIC] return numeric string from a numeric value
quote (3) - [M_strings:QUOTES] add quotes to string as if written with list-directed input
unquote (3) - [M_strings:QUOTES] remove quotes from string as if read with list-directed input
chomp (3) - [M_strings:TOKENS] Tokenize a string, consuming it one token per call
delim (3) - [M_strings:TOKENS] parse a string and store tokens into an array
fmt (3) - [M_strings:TOKENS] convert text to a paragraph
split (3) - [M_strings:TOKENS] parse string into an array using specified delimiters
strtok (3) - [M_strings:TOKENS] Tokenize a string
adjustc (3) - [M_strings:WHITESPACE] center text
compact (3) - [M_strings:WHITESPACE] converts contiguous whitespace to a single character (or nothing)
crop (3) - [M_strings:WHITESPACE] trim leading blanks and trailing blanks from a string
indent (3) - [M_strings:WHITESPACE] count number of leading spaces in a string
nospace (3) - [M_strings:WHITESPACE] remove all whitespace from input string

There are also the intrinsic routines and operators such as “//” to keep in mind:

achar (3) - [FORTRAN:INTRINSIC:CHARACTER] returns a character in a specified position in the ASCII collating sequence
adjustl (3) - [FORTRAN:INTRINSIC:CHARACTER] Left adjust a string
adjustr (3) - [FORTRAN:INTRINSIC:CHARACTER] Right adjust a string
char (3) - [FORTRAN:INTRINSIC:CHARACTER] Character conversion function
iachar (3) - [FORTRAN:INTRINSIC:CHARACTER] Code in ASCII collating sequence
ichar (3) - [FORTRAN:INTRINSIC:CHARACTER] Character-to-integer conversion function
index (3) - [FORTRAN:INTRINSIC:CHARACTER] Position of a substring within a string
len (3) - [FORTRAN:INTRINSIC:CHARACTER] Length of a character entity
len_trim (3) - [FORTRAN:INTRINSIC:CHARACTER] Length of a character entity without trailing blank characters
lge (3) - [FORTRAN:INTRINSIC:CHARACTER] Lexical greater than or equal
lgt (3) - [FORTRAN:INTRINSIC:CHARACTER] Lexical greater than
lle (3) - [FORTRAN:INTRINSIC:CHARACTER] Lexical less than or equal
llt (3) - [FORTRAN:INTRINSIC:CHARACTER] Lexical less than
new_line (3) - [FORTRAN:INTRINSIC:CHARACTER] New line character
repeat (3) - [FORTRAN:INTRINSIC:CHARACTER] Repeated string concatenation
scan (3) - [FORTRAN:INTRINSIC:CHARACTER] Scan a string for the presence of a set of characters
trim (3) - [FORTRAN:INTRINSIC:CHARACTER] Remove trailing blank characters of a string
verify (3) - [FORTRAN:INTRINSIC:CHARACTER] Scan a string for the absence of a set of characters

urbanjost · July 14, 2020, 1:15am

So I think a big discussion point should be looking at the current state of the art and what is out there
(I gave my sample above, I am sure there are others) and more importantly should stdlib be the first place to try to determine what string functions are needed?

I personally think most interfaces like this should ask people to post their solutions individually or jointly as github packages that the community can try out and donate to. That is more typical of most language ecosystems. Put a bunch of libraries out there listed as “packages” (implying some vetting and sponsorship exists for them) and let people try them and then if and when a single package dominates incorporate it into the standard. This is especially true when so much existing art exists for a topic. Trying to do it all in stdlib has some of the very same problems (VERY ironically) as the standards committee path has: it takes too much consensus from too few people, for example. A “package” approach seems to work much faster with other languages. I bet I have seen a dozen Python command line parsers, but it seems to have whittled down to just about three in common use. But if someone came up with a better one it would still be able to be available as a package and develop a following without being stopped in its tracks by not being “standard”.

Just how do you set up a poll anyway?

certik · July 14, 2020, 3:16am

Please join our call, let’s discuss it. Our fpm effort will allow precisely that approach to have multiple competing packages. So far, we only put in things for which there was a unanimous consensus. And only in experimental, meaning we can still remove it if it turns out it’s not a good API.

For strings there might not be that many approaches out there, and we might be able to agree on a common functionality as a community. If we cannot agree, then having competing packages would be the way to go.

urbanjost · July 14, 2020, 4:24am

Unfortunately I have a scheduling conflict that prevents me from joining the meeting; which is why I threw together (hopefully not too hastily) that rather long list. I wish this posting tool had a vim(1) mode or at least a vi(1) mode for making posts. I find it painful to use so far, but am getting used to it.

I have been trying fpm and think it and a centralized listing of packages could be a game changer; although I hit some roadblocks regarding support of C wrappers and having to have each executable in a separate subdirectory (I keep meaning to send in my list of experiences on the fpm(1) site but have not had time).

I will be very interested in the meeting’s results, and am looking forward to some early results coming out of it.

There is no reason Fortran itself should not be able to meet and surpass the goals of Julia in my opinion; but without efforts like this succeeding I am not sure that will happen.

certik · July 14, 2020, 4:48am

Yes, Julia has been my inspiration to help start all these efforts. I think we will ultimately be successful with turning Fortran around.

ivanpribec · July 14, 2020, 10:56am

I have been sifting through the comp.lang.fortran forum reading the previous discussions on string handling. Just on the topic of case conversion I found all of the following threads:

case insensitive string comparison (30/09/2013)
Case conversion without the loopiness (14/01/2009)
Character case conversion (24/10/2007)
what does this do? UPSHIFT(I:I) = CHAR (ICHAR (STRING(I:I) ) - 40B) (18/05/2006)
Lowercase (18/06/2005)
How to convert uppercase to lowercase in fortran (15/03/2005)
uppercase (12/03/2003)
Character Functions (15/01/2002)
string → upper case using function to_upper (19/12/2001)
String Handling Module (21/04/2001)
uppercasing a string (16/05/2000)
Upper-case conversion in one statement? (07/05/1999)
matching character expressions, ignoring chase (18/06/1998)
Novice question: Simplest way to convert string to all Uppor or Lower Case characters? (06/05/1999)
Charaters (18/07/1998)
case conversion (23/03/1993)

The question also appears on other forums:

[StackOverflow] How can I write a to_upper() or to_lower() function in F90? (25 May 2012)
[Intel Forum] scan string for uper/lower case (04-24-2018)
[Computer programming forum] How to convert uppercase to lowercase in fortran (02 Sep 2007)

Examples of the same case conversion function are given in at least 3 books:

Akin, E. (2003). Object-oriented programming via Fortran 90/95 (Vol. 1). Cambridge University Press. (page 301)
Ray, S. (2019). Fortran 2018 with Parallel Programming . CRC Press. (page 79)
Hahn, B. (1994). Fortran 90 for scientists and engineers . Elsevier. (page 168)

Not to mention all the Fortran libraries which have rolled their own versions:

I’m sure we could find several more copies of the same routine in the list of popular open source projects. This is in addition to all the string packages already listed on the Fortran stdlib github issue for strings.

Digging back further I found that two character intrinsic case conversion functions were proposed for standardization in 1988 by the Dutch Fortran group (Leo ter Haar) in paper N289 (see under WG5 Documents). From the minutes of the September 1988 WG5 meeting in Paris (N353) we read:

The Dutch group have proposed the addition of two character intrinsic functions for the conversion of a character entity to upper-case or lower-case characters respectively. These functions are often provided by vendors but in a non-standard, hence, non-portable, way.

It was noted that this functionality can be provided by a more general TRANSLATE using labels which is being worked on by X3J3.

An informal report by Miles Ellis from the same meeting appears to indicate the proposed functions would be called UC and LC.

Agreeing on a name for these two routines among the numerous choices:

ucase,lcase
upcase,downcase
upstr,lostr
uppercase,lowercase
uc,lc
lower , upper (Python)
upcase , downcase (Ruby)
toLower , toLowerInPlace , toUpper , toUpperInPlace ,asLowerCase , asUpperCase (D)
lower , upper (MATLAB)
uppercase , lowercase (Julia)
to_upper , to_lower ( C++ using Boost)
to_uppercase , to_ascii_uppercase , to_lowercase , to_ascii_lowercase , make_ascii_uppercase , make_ascii_lowercase (Rust)

is already a challenge. Once you bring in implementation differences, special behaviors, etc., what seemed like an easy-to-write function can become a tough problem.
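For concreteness, the routine that keeps getting rewritten usually looks something like the following. This is a sketch only: the names `to_upper` and `to_lower` are just one pick from the list above, the module name `demo_case` is made up, and the code handles the 26 ASCII letters and nothing else (no locale, no Unicode), which is one of the "implementation differences" in question.

```fortran
module demo_case
  implicit none
  private
  public :: to_upper, to_lower

  ! Distance between 'a' and 'A' in the ASCII collating sequence.
  integer, parameter :: shift = iachar('a') - iachar('A')

contains

  !> Return a copy of s with ASCII letters a-z converted to A-Z.
  pure function to_upper(s) result(t)
    character(len=*), intent(in) :: s
    character(len=len(s)) :: t
    integer :: i

    do i = 1, len(s)
      if (s(i:i) >= 'a' .and. s(i:i) <= 'z') then
        t(i:i) = achar(iachar(s(i:i)) - shift)
      else
        t(i:i) = s(i:i)   ! digits, punctuation, blanks are left alone
      end if
    end do
  end function to_upper

  !> Return a copy of s with ASCII letters A-Z converted to a-z.
  pure function to_lower(s) result(t)
    character(len=*), intent(in) :: s
    character(len=len(s)) :: t
    integer :: i

    do i = 1, len(s)
      if (s(i:i) >= 'A' .and. s(i:i) <= 'Z') then
        t(i:i) = achar(iachar(s(i:i)) + shift)
      else
        t(i:i) = s(i:i)
      end if
    end do
  end function to_lower

end module demo_case
```

The comparisons `s(i:i) >= 'a'` rely on the processor's collating sequence being ASCII-like, which holds on common platforms; a stricter version could use `lge`/`lle` instead. Whether such functions should be `pure`, `elemental`, or also offered as in-place subroutines is the sort of detail the naming list above hints at.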

In the same 1988 Paris meeting report I read the German DIN group already proposed a data type for a “character string of varying length”. This type had by then already been available in the Siemens Fortran compiler for ten years!

In a 1984 article, Has Fortran a future?, Michael Metcalf suggested the following string type (using parametrized derived types!):

MODULE String_type
  TYPE String(Maxlen)
     INTEGER::Length
     CHARACTER(LEN=Maxlen)::String_data
  END TYPE String
END MODULE String_type

36 years later, we have the iso_varying_string which did not get accepted into the standard (probably best to leave the “iso” out now).

The best we can do as the user community is to agree on a useful solution ourselves and try and stick with it, until it gains enough acceptance to become de facto standard. The Boost C++ libraries also work this way.

milancurcic · July 14, 2020, 3:48pm

IMO the key purpose of discussing strings for stdlib right now is in the context of functionality needed to make other things–file system utilities, I/O, fpm, etc. This may or may not be part of stdlib, although I think that something as basic and commonly used as strings should be part of it.

Ideally we will have strings utilities with a commonly used API as part of stdlib, and a dozen or more strings libraries available via fpm.

For right now I would like us to at least discuss what is the strings functionality that we need to build other things.

milancurcic · July 14, 2020, 3:51pm

Click on [+ New Topic] in the upper right corner on the main Discourse page
In the new topic dialog, there is a little gear icon on the right. Click on the gear icon.
Click on Build Poll
You can have multiple polls in a single topic.

certik · July 14, 2020, 3:57pm

Thanks @ivanpribec for this great research.

As Milan said, I think fpm will allow us to have dozens of string libraries and other libraries. We can experiment with things easily.

However, I still think it’s good to agree on the API as part of stdlib, and I think we actually can, and it would go into the stdlib_experimental. My idea is that it would stay there while fpm matures and we can create libraries easily, and say in a year or two, it would become clear what the subset is that everybody agrees we should have, and we can then move with it from experimental to main in stdlib.

So my approach is to be active and try to get something into experimental soon, so that we move beyond just discussing it to actually having code that we can try using, which will move the conversation forward. But be very conservative when moving it to main, because we want to make sure that we “standardize” things that people actually want and agree with.

P.S. The other approach that I am really hoping would happen is that fpm would allow people to create competing libraries for strings and other functionality, and in a Darwinian fashion some of these will become popular, and we can then think to standardize some of the popular API. Similar to how Python accepted argparse into the standard library, as an example. It will take a year or two for this to happen, so for this reason I want to be very careful before moving anything from experimental to main in stdlib, and rather wait a bit.

MarDie · July 14, 2020, 10:31pm

I will attend the call and could present my ideas on file system operations. Not sure whether it makes sense to discuss the more general concept of strings before or after that.

FortranFan · July 15, 2020, 1:16pm

@ivanpribec, this is an excellent compilation of “prior art” in this area.

Here’re a couple of other links that had caught my attention on comp.lang.fortran:

Yet another Fortran string library (11 May 2016)
Fortran 2003 variant of ISO_VARYING_STRING (31 March 2016)

It will be useful if any design of such a type keeps also in mind the needs expressed here.

All,

My own opinion and vision for the Fortran language remains a new intrinsic and inextensible type with a short name, preferably named string or perhaps string_t, that is designed with the Pareto Principle in mind to meet most of the needs of general coding right “out of the box”:

   string, allocatable :: words(:)
   string :: s
   words = [ "Hello", "World!" ]
   s = words(2)(1:5)
   print *, s       !<--  prints "World"
   ..

Using equivalent types in certain other languages popular in scientific and technical computing such as C++ std::string, Java string class, Microsoft .NET StringBuilder, Swift String, etc. - all of which do appear to judiciously limit the scope of the contained facility in their type rather than try to pack everything of interest - as a guide can help with eventual standardization in Fortran.

septc · July 15, 2020, 2:04pm

I also wish for an intrinsic type to get maximum efficiency from compilers (ultimately)… As for assignment, I think the right-hand side of this line

words = [ "Hello", "World!" ]

is probably evaluated independently from the left-hand side, so resulting in an error. We may be able to use [ string("Hello"), string("World!") ] or [ string :: "Hello", "World!" ], but I guess if we have something like a “s-string” or “v-string” syntax similar to “f-string” in Python etc (here, to make a literal of the new string type), it may offer additional functionalities (e.g., string interpolation??).

words = [ s"Hello", s"World!" ] !! s-string
words = [ v"Hello", v"World!" ] !! v-string (v for “variable-length”)
words = [ v"Hello", "World!" ]
!! the 1st element determines the type of array, so the 2nd character string is
!! auto-converted to v-string

s"foo = {foo}" !! string interpolation
v"baa = $baa / $(baz * 10)" !! string interpolation

But I guess this kind of thing needs compiler support…
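For comparison, here is what can already be written in current Fortran without new syntax. This is a sketch under assumptions: `string_t` is a hypothetical user-defined wrapper (not the proposed intrinsic type), and the module and function names are made up for the example. It shows the two existing workarounds for an array constructor whose elements have different lengths: a type-spec that pads every element, and structure constructors of a derived type.

```fortran
module demo_string_type
  implicit none
  private
  public :: string_t, padded_words, wrapped_words

  !> Hypothetical wrapper around a deferred-length character component.
  type :: string_t
    character(len=:), allocatable :: chars
  end type string_t

contains

  !> Intrinsic CHARACTER: a type-spec is required, and "Hello" gets padded.
  function padded_words() result(words)
    character(len=6) :: words(2)

    ! Without "character(len=6) ::" the differing lengths are not allowed.
    words = [character(len=6) :: 'Hello', 'World!']
  end function padded_words

  !> Derived type: each element keeps its own length.
  function wrapped_words() result(words)
    type(string_t) :: words(2)

    ! Every element is the same type, so no type-spec is needed.
    words = [string_t('Hello'), string_t('World!')]
  end function wrapped_words

end module demo_string_type
```

In the first case `len(words(1))` is 6 (five letters plus a trailing blank); in the second `len(words(1)%chars)` is 5. The explicit `string_t(...)` around every literal is the verbosity that an s-string style literal would remove.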

tiziano.mueller · July 16, 2020, 9:08am

From what I’ve seen when dealing with strings and filesystem objects/command line arguments in Python, they are somewhat tied, depending on what kind of string you are talking about (byte-like or decoded). It seems to be tricky to get this right, as some blog posts covering this issue show:

Python 3.2 Painful History of the Filesystem Encoding - Victor Stinner blog
A command line argument is raw binary data. It comes with limitations and needs interpretation. – Jan-Philip Gehrcke, PhD

So, from Fortran’s high-level focus I would assume that we want something like Python 3/Julia type of strings (e.g. UTF-8 by default, automatically decoded from local encoding), but given Python’s history, it seems to me that a somewhat holistic approach is needed indeed.

ivanpribec · July 16, 2020, 2:51pm

Quoting from Minutes of the Fortran Experts Group Meeting at Vienna, 14-17 June 1982:

Character Data Type

Discussion:

Meissner: What we should have had is STRING.

Pollicini: You can do what you want with arrays of characters. Why have two different character types - fixed and varying?

Snoek: Only one should go in core. Why put fixed length character in core?
Ans: It is more efficient.

Metcalf: Fortran 77 character type is a mistake, but we should make it better by extension (character type with a varying attribute). We should not have two separate types.

Meissner: The issue is what happens across subprogram boundaries.

lkedward · July 16, 2020, 2:57pm

I’d like to suggest a couple of items for discussion today relating to fpm:

Initial experiences & feedback: it’s great to see some people trying out fpm with their projects, we should review these in order to guide the next steps. Specifically what works well, what does not and any limitations/show-stoppers;
Next steps:
- which features should be prioritised next?
- what is the milestone for starting re-implementation in Fortran?

certik · July 16, 2020, 3:17pm

I agree. I will argue for starting the Fortran implementation as soon as possible, so that the community can start submitting PRs.
