Thanks. Yes, that is indeed a very limited set of cases.
Fortranners can check back 20 years later: it will be the exact same situation as today, unless the Community has really rallied behind initiatives such as `LFortran` and endeavored to implement facilities in the Fortran language entirely independently of WG5 and J3, relegating ISO/IEC standard development to merely an afterthought that simply and “officially” documents what is already standardized in actual practice through vast community adoption, as is the case with almost all tools of the trade and associated methods and procedures in most industries.
Fortran 204Y may be up for discussion and 6 to 9 “wise guys”, the most influential of whom on WG5 and J3 do not engage directly with the community and stay in glass houses instead, will deem that basic facilities such as `STRING` and `BITS` types are not needed, even though all other modern languages, coming up from scratch at a faster pace than the Fortran committees can work out the font aspects in the PDF document, offer them to their practitioners right from the get-go as basic features those practitioners must have.
But not Fortran!! Keep on writing your own derived type variants around `character(len=:), allocatable :: chars` and `logical(LOGICAL_NN) :: bitdat` and what-not and “proudly” insert `implicit none` everywhere!!
Seriously, what nonsense is the language development (or the lack thereof) up to?!
@certik wrote above: “Why don’t you come up with some good Fortran syntax for this? We can easily create a prototype for this, since we already did the hard work of making it working in the middle end and backends.”
This seems like a good way to make progress independently from J3. A complete implementation with all of the details worked out should be much easier to standardise. And even if it doesn’t go into the standard, it could become a commonly-implemented extension. There is no shortage of those in Fortran.
So all we need is a good specification of a new intrinsic STRING type that fits in with the rest of the language.
Blaming J3 is much easier.
That’s right. I think that’s the way to do it: a well designed extension, that many people here are asking for and would be happy to use. I am happy to supervise the implementation. But I need people here to help on the syntax and examples side and help design it and ensure that our implementation is working correctly (by testing it, reporting bugs, etc.).
I am relatively new to the Fortran standardization effort. The standard itself makes it a bit challenging to understand the consequences of potential changes.
- It is long. 2023 FDIS is 688 pages.
- It is dense.
- It has a very long history, and semantics discussions (recorded in the J3 papers) can be hard to come by or to analyze.
- There are sections where we define somewhat formally the expected processor or user code behavior, but mostly the tool of choice is English prose.
- I am unaware of tools for assessing the impacts of changes (beyond `grep` on the LaTeX source, and the gray matter of the committee members).
This week I learned a bit about how the ECMAScript (JavaScript) community updates their standard about every year. The current standard is 840 pages.
- An early goal of the standard was for it to be machine-readable, to support tooling for the standard developers and compiler implementers.
- The standard can be translated into a formal model of expected behavior.
- That formal model can be used as input to other tools (e.g., standard checkers, test generators, program analysis tools for JavaScript programs).
- They have analysis tools that can evaluate program behavior, such as the behavior of their test suites, against multiple versions of the standard.
- They have integrated these modeling tools into the standard's continuous integration process. Failure to produce a model is usually a bug in the standard.
(Note, they don’t have ISO to deal with, either.)
It is really impressive.
The above seems to be the gist of this whole thread. It would be great to know why it is so.
As I have explained above, I will stress a few aspects of the desired facility, including its semantics, before diving deep into the syntax. But say, for the sake of discussion, the new type is named `string`. Then
1. This new type shall be an intrinsic one, and the standard semantics of intrinsic types shall apply to `string`, starting with it not being an extensible type.
2. This intrinsic type shall thus be declarable in any scope where other intrinsic types such as integer can be declared. Thus no `USE` statement shall be needed to import the type into a scope.
3. The `string` type shall be as though it has a private component of the intrinsic type `character`.
4. The processor shall support at least two KINDs of this `string` type: one KIND as though the component mentioned in 3 above is of default character; the second KIND as though the character component is of the ISO 10646 set.
5. A convenient means to define a variable of this type using character literal constants shall be available.
6. A convenient means to construct an array of `string` type with character strings of the same or different lengths shall be available, possibly like so:

       string :: pets(3)
       pets = [ string :: "dog", "pony", "turtle" ]

7. The same means to access sections of character data as applicable to the character intrinsic type, including with arrays of character type, shall be available to string, e.g.,

       string :: language
       language = "Fortran"
       print *, language(1:3) ! outputs "For"

       string :: pets(3)
       pets = [ string :: "dog", "pony", "turtle" ]
       print *, pets(1)(1:2) ! outputs "do"

8. Methods to operate on the character data of string data shall also be available as though they are type-bound. The list of methods shall be identified based on feedback from the Community and reviewed and developed in a workflow similar to Fortran `stdlib`. However, the list shall include a method named `insert` to introduce a string of provided character(s) at the specified position `pos`, e.g.,

       string dilemma
       dilemma = "to be the question"
       call dilemma%insert( pos=7, chars="or not to be is " )
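For anyone who wants to experiment before any compiler support exists, the `insert` semantics described above can be approximated in standard Fortran today. What follows is only a sketch under stated assumptions: `string_t`, its `buffer` component, and the `get` accessor are hypothetical names invented here (they are not from the proposal, from `stdlib`, or from LFortran), and clamping an out-of-range `pos` is a choice made for the example, not a proposed rule. The inserted text carries a trailing blank so the words do not run together.

```fortran
module string_sketch
   implicit none
   private
   public :: string_t

   ! Hypothetical stand-in for the proposed intrinsic type: a wrapper
   ! around a private deferred-length character component.
   type :: string_t
      private
      character(len=:), allocatable :: buffer
   contains
      procedure :: insert
      procedure :: get
      procedure, private :: assign_chars
      generic :: assignment(=) => assign_chars
   end type string_t

contains

   ! Defined assignment from a character expression.
   subroutine assign_chars(self, rhs)
      class(string_t), intent(inout) :: self
      character(len=*), intent(in) :: rhs
      self%buffer = rhs
   end subroutine assign_chars

   ! Insert chars so that its first character lands at position pos.
   subroutine insert(self, pos, chars)
      class(string_t), intent(inout) :: self
      integer, intent(in) :: pos
      character(len=*), intent(in) :: chars
      integer :: p
      if (.not. allocated(self%buffer)) self%buffer = ''
      ! Assumption: out-of-range positions are clamped (prepend or append).
      p = max(1, min(pos, len(self%buffer) + 1))
      self%buffer = self%buffer(:p-1) // chars // self%buffer(p:)
   end subroutine insert

   ! Return the stored characters (empty if never assigned).
   function get(self) result(res)
      class(string_t), intent(in) :: self
      character(len=:), allocatable :: res
      if (allocated(self%buffer)) then
         res = self%buffer
      else
         res = ''
      end if
   end function get

end module string_sketch

program string_sketch_demo
   use string_sketch, only: string_t
   implicit none
   type(string_t) :: dilemma

   dilemma = "to be the question"
   call dilemma%insert(pos=7, chars="or not to be is ")
   print '(a)', dilemma%get()   ! to be or not to be is the question
end program string_sketch_demo
```

A wrapper like this needs a `use` statement and cannot offer `dilemma(1:3)` substring syntax, which is exactly the gap an intrinsic type would close.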
I have a longer list of additional requirements based on use cases involving library solutions that have been consumed for such a type and which I can provide over time.
However, I just wanted to get the ball rolling and have you and the readers review the above 8 items, to see how they are received, what the comments and feedback are, and what you think of the feasibility of implementation in `LFortran`.
Thanks,
I agree 100% with all but point 8. Personally I’m not a huge fan of the type bound procedure syntax because it looks extremely similar to accessing a field of a derived type, and is thus confusing. Others may feel differently, so I’m not married to the idea, but I think leaving string intrinsics as subroutines accessed like normal would be better. I do not dispute that such routines need to exist along with the other aspects outlined for an intrinsic string type.
As an aside I have no opinion on the ISO kind character component, mostly because I have no idea what that gives me. Does it enable some Unicode values or something?
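On the Unicode question: the existing `character` type already has an optional ISO 10646 kind, and the second `string` KIND would presumably mirror it. A minimal sketch, assuming a compiler that supports that kind (gfortran does; `selected_char_kind` returns -1 where it is unsupported, in which case this will not compile). The pattern follows the usage shown in gfortran's documentation for `SELECTED_CHAR_KIND`; the variable names are made up for illustration.

```fortran
program iso10646_kind_demo
   use, intrinsic :: iso_fortran_env, only: output_unit
   implicit none
   ! Kind number of the ISO 10646 (UCS-4) character set on this processor.
   integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
   character(kind=ucs4, len=:), allocatable :: text

   ! Code point 955 (U+03BB) is the Greek small letter lambda, which the
   ! default character kind is not guaranteed to be able to hold.
   text = ucs4_'lambda: ' // char(955, kind=ucs4)

   ! Ask the runtime to encode output on this unit as UTF-8.
   open (output_unit, encoding='UTF-8')
   write (output_unit, '(a)') text
   write (output_unit, '(a,i0)') 'length in characters: ', len(text)
end program iso10646_kind_demo
```

So yes: the ISO 10646 kind is what lets a character entity hold code points outside the default set, with length counted in characters rather than bytes.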
What is the advantage of this compared to the normal character syntax:
dilemma = dilemma(1:6) // "or not to be is " // dilemma(7:)
Me neither. But beyond personal preferences, the consistency with the rest of the language matters:
- all Fortran intrinsics are classical functions/subroutines, with few exceptions (e.g. `%re` and `%im`, but these ones can be viewed as components)
- a `string` type would be a kind of extension of the `character` type, hence the same functions/subroutines should apply (when applicable), with the same syntax.
I can think of situations where, instead of having to count characters and find the position of a substring where new text is to be inserted, as in
string dilemma
dilemma = "to be the question"
call dilemma%insert( pos=7, chars="or not to be is " )
it would be more convenient to be able to write
string dilemma
dilemma = "to be the question"
call dilemma%insert( after="be ", chars="or not to be is " )
I disagree entirely with the “more convenient” aspect.
But a generic interface to the `INSERT` procedure allowing different options (modes), 1) at some specified position, 2) “after” something, etc., is worth strong consideration for the new `STRING` type design, which might follow a community-driven workflow similar to `stdlib`.
Now, however, option 1 involving something like a `POS` argument is my first preference for this `INSERT` procedure. It is based on many use cases I have reviewed across quite a few codebases.
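Both modes can coexist behind one generic name, since the `pos` and `after` arguments differ in type and in keyword. The sketch below uses plain subroutines on deferred-length `character` variables rather than a new type; the names `insert_at_pos` and `insert_after` are hypothetical, as is the choice to leave the string unchanged when `after` is not found and to clamp out-of-range positions.

```fortran
module insert_sketch
   implicit none
   private
   public :: insert

   ! One generic name, two specifics resolved by argument type or keyword.
   interface insert
      module procedure insert_at_pos
      module procedure insert_after
   end interface insert

contains

   ! Mode 1: insert chars so that it starts at position pos.
   subroutine insert_at_pos(str, pos, chars)
      character(len=:), allocatable, intent(inout) :: str
      integer, intent(in) :: pos
      character(len=*), intent(in) :: chars
      integer :: p
      if (.not. allocated(str)) str = ''
      ! Assumption: out-of-range positions are clamped.
      p = max(1, min(pos, len(str) + 1))
      str = str(:p-1) // chars // str(p:)
   end subroutine insert_at_pos

   ! Mode 2: insert chars right after the first occurrence of after.
   subroutine insert_after(str, after, chars)
      character(len=:), allocatable, intent(inout) :: str
      character(len=*), intent(in) :: after
      character(len=*), intent(in) :: chars
      integer :: idx
      if (.not. allocated(str)) str = ''
      idx = index(str, after)
      ! Assumption: if the marker is absent, str is left unchanged.
      if (idx > 0) call insert_at_pos(str, idx + len(after), chars)
   end subroutine insert_after

end module insert_sketch

program insert_demo
   use insert_sketch, only: insert
   implicit none
   character(len=:), allocatable :: a, b

   a = "to be the question"
   b = a
   call insert(a, pos=7, chars="or not to be is ")
   call insert(b, after="be ", chars="or not to be is ")
   print '(a)', a
   print '(a)', b
   print '(l1)', a == b   ! T: both modes give the same result here
end program insert_demo
```

The `after` mode is just a search followed by the `pos` mode, so supporting it costs little once positional insertion exists.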
Could this be done with a replace() function instead?
I have edited my previous post on desired semantics to state, “Methods to operate on the character data of string data shall also be available as though they are type-bound.” Note the “also”.
There are a couple of reasons, mostly intended to serve my colleagues who tend to be polyglot, much younger, and for whom Fortran is often an nth (n > 3) programming language in terms of their learning order and further down in their preference. But for many of the apps they work on, Fortran should rise up to be their lingua franca again in the not too distant future; that is the vision, anyway.
- they are most familiar with the notion of a class for a string type, as opposed to the “raw” `char` type, that has methods which operate on the data of an instance of the class. In other words, the type-bound aspect is intuitive to them.
- there already are library solutions in Fortran that some of them use which are similar to the `string_type` of Fortran `stdlib` and which have type-bound procedures. A desire for a similar structure with TBPs in the eventual intrinsic `string` type has been expressed to me by several of them.
Outside of the traditional Fortran parlance, many coders are quite savvy about OO and know well the differences between data fields and methods. I think the risk of any confusion with a string intrinsic type and TBPs is quite low among the future generation of Fortranners, who come to Fortran with a strong background in these other paradigms.
A new topic should be opened to discuss the specifications of a `string` type…
FD Admin(s): if it’s not too much effort, as suggested by @PierU, can you please move the discussions on the specifications of the `STRING` type to a new thread, perhaps starting with @certik’s leading post here?
Based on your experience and knowledge of `LFortran` compiler development, do the two things mentioned above (a convenient array constructor for the `STRING` type and an accessor for substring references similar to the `CHARACTER` type) look feasible to you technically in `LFortran`, or do you think they are too challenging to implement in `LFortran`?
This is important to understand because this is the crux of the matter. The rest of the syntax is not too difficult: as @jeremie.vandenplas pointed out and you noted, the `string_type` in Fortran `stdlib` is a ready go-by for an `LFortran` prototype to consider.
Since this is within the discussion of new Fortran features, what does everyone think of the following capability?
string, save :: pets(3) = [ string :: "dog", "pony", "turtle" ]
string :: x(3), y(3)
x = [ string :: "do", "po", "tu" ]
y = pets(:)(1:2) ! should be the same as y = x
This kind of array assignment does not work with arrays of allocatable components because the underlying memory is not accessed with regular strides. But with the string type, the underlying memory would never be expected to have regular strides anyway, so it seems like this functionality could be allowed for this new intrinsic data type, even if it is not allowed for integer, real, logical, or character allocatable components.
That type of functionality could definitely come in handy. What would happen if you had instead used `y = pets(:)(1:4)`, noting that `pets(1)` is only `len` 3?
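For comparison, here is how the same expression behaves today with a fixed-length `character` array, where every element is blank-padded to a common length. This is only a reference point in standard Fortran, not a statement of how a variable-length `string` should behave; the program name and format strings are made up for the example.

```fortran
program substring_section_demo
   implicit none
   character(len=6) :: pets(3)
   character(len=2) :: y(3)

   ! Shorter elements are blank-padded to the declared length of 6.
   pets = [ character(len=6) :: "dog", "pony", "turtle" ]

   ! A substring range applied to a whole array section is already
   ! valid for character arrays of fixed length.
   y = pets(:)(1:2)
   print '(3("[",a,"]"))', y              ! [do][po][tu]

   ! Here (1:4) is in range for every element only because of padding.
   print '(3("[",a,"]"))', pets(:)(1:4)   ! [dog ][pony][turt]
end program substring_section_demo
```

With fixed-length elements, `pets(1)(1:4)` simply picks up a padding blank. For a `string` whose first element really has length 3, the specification would have to choose between padding, truncating to the available characters, or treating the reference as out of bounds.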