The proposed feature could be defined to work any way we want it to. It seems that the most useful way to define it would be to truncate it to the actual length. I would also say that whatever is done in the array case should be consistent with the scalar case.
Yes, that’s exactly the idea. Fortran really needs inspired compiler implementors who make things happen, so that the Community can define a feature to work any way it wants, as opposed to constant naysayers.
I think answering your question for sure is as much work as just implementing it, which we should do, since there seems to be a lot of interest in this feature. If we allowed lists instead of arrays, the answer is that it’s not complicated, because we already have it (except for the syntax, but that’s not difficult).
For arrays, I am not sure whether there is a complication with allocatable array elements, since that’s how we currently represent a string internally.
I often find inspiration in Python and NumPy. It looks like the default strings in NumPy are the same as in Fortran, that is, they are fixed length:
import numpy as np
# Create a 2D array of strings; NumPy infers a fixed-length dtype ('<U7')
# from the longest element, 'Welcome'
string_array = np.array([['Hello', 'World', '!'],
                         ['I', 'am', 'Python'],
                         ['Welcome', 'to', 'NumPy']])
# Print the string array
print(string_array)
print(string_array[0, 2])
print(string_array[1, 2])
print(repr(string_array))
# Assigning a longer value silently truncates it to 7 characters ('abcdefg')
string_array[0, 2] = "abcdefghijkl"
print(repr(string_array))
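For comparison, here is a sketch of the same behaviour in Fortran. The element length of 7 matches the `<U7` dtype that NumPy infers above, because its longest entry is 'Welcome'. The module and function names are illustrative only. As in NumPy, assigning a longer value to an array element truncates it. The rule is the one that applies to a fixed-length scalar, so the array case stays consistent with the scalar case.

```fortran
! Illustrative module: today's fixed-length CHARACTER semantics.
module fixed_length_demo
  implicit none
  private
  public :: assign_scalar, assign_element

contains

  ! Assign a value to a fixed-length CHARACTER(len=7) scalar.
  ! Intrinsic assignment truncates on the right if the value is longer
  ! and pads with blanks if it is shorter.
  function assign_scalar(value) result(s)
    character(len=*), intent(in) :: value
    character(len=7) :: s
    s = value
  end function assign_scalar

  ! Same rule for one element of a fixed-length CHARACTER array,
  ! mirroring string_array[0, 2] = "abcdefghijkl" in the NumPy example.
  function assign_element(value) result(row)
    character(len=*), intent(in) :: value
    character(len=7) :: row(3)
    row = [character(len=7) :: 'Hello', 'World', '!']
    row(3) = value
  end function assign_element

end module fixed_length_demo
```

The sketch only shows today's fixed-length semantics. A string array with per-element lengths would not need to truncate at all.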
Personally, I would just like lists in Fortran in general. It would be cool if we could have lists of any type, maybe declared ‘list(real) :: x’ to indicate that x is a list of scalar real values. I imagine the normal list functions could be replicated with a derived type having allocatable components of each intrinsic type, with type-bound procedures, and probably overloading all the operators as well to get it working. That would be a big pain, and since it wouldn’t come from the compiler, each developer setting up a system like that would more than likely do it a bit differently.
Also, I’ve mentioned this elsewhere but will say it again: I believe the string type should replace all use cases for character variables with len > 1. Having one (ONE) string type, or one way to represent a string, is a lot easier to teach and keep track of than something like Rust with 6-8 different string types… str, String, CStr, …
This is not to say that we should not have this particular feature in Fortran, but I want to warn against the “me too” criterion. Just because NumPy or some other language/library has a feature does not mean that Fortran HAS TO have the same feature.
@certik, I believe stdlib string_type and stringlist_type were both influenced by Python strings.
As far as I understand, most of the code/options proposed in this thread for strings could be implemented with stdlib string_type and stringlist_type. Therefore, I am not sure I understand your last comment. Also, trying to copy all features of other languages might not be needed, as most of the community might not be interested in all these features. So, would it be possible to implement an intrinsic string in LFortran that would actually rely on string_type and stringlist_type, such that the following code:
program example
  implicit none
  stringlist_type :: first_stringlist, second_stringlist
  string, allocatable :: stringarray(:)
  first_stringlist = first_stringlist//"Element No. one"
  stringarray = [string :: "Element No. three", "Element No. four"]
  second_stringlist = first_stringlist//stringarray
end program
would be translated inside LFortran to:
program example
  use stdlib_stringlist_type, only: stringlist_type, operator(//)
  use stdlib_string_type, only: string_type
  implicit none
  type(stringlist_type) :: first_stringlist, second_stringlist
  type(string_type), allocatable :: stringarray(:)
  first_stringlist = first_stringlist//"Element No. one"
  stringarray = [string_type("Element No. three"), string_type("Element No. four")]
  second_stringlist = first_stringlist//stringarray
end program
I believe most of the concerns were considered (solved?) by stdlib. More specific concerns, like the substring-of-an-array syntax stringarray(:)(1:3), cannot be solved by a library, of course, but could be at the compiler level (I guess). And, of course, other concerns will appear during development.
PS: I know nothing about compiler development, so this post might be wrong.
In Python a list can have elements of different types. When the elements of the list have a specified type, as in your list(real) :: x, how is that supposed to differ from real, allocatable :: x(:)? Is it supposed to behave like vector<float> of C++ with a fast append operation?
I do believe that a list containing only a single type should be reproducible using allocatable arrays, as I indicated originally. The main difference would be that if the compiler implemented the concept of lists, their behavior and associated methods could be uniform and transferable across programs written by different developers.
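The `vector<float>` question can be made concrete. With a plain `real, allocatable :: x(:)`, appending via `x = [x, v]` allocates a new array and copies every element on each call. A `std::vector`-style list instead keeps spare capacity and reallocates only when it runs out, which makes appends amortized O(1). Below is a minimal sketch in current Fortran, assuming capacity doubling. The names `real_list`, `append`, `length` and `items` are hypothetical, not an existing library API.

```fortran
! Hypothetical growable list of reals, similar in spirit to std::vector<float>.
module real_list_mod
  implicit none
  private
  public :: real_list

  type :: real_list
    real, allocatable :: buf(:)   ! storage, usually larger than needed
    integer :: n = 0              ! number of elements actually in use
  contains
    procedure :: append
    procedure :: length
    procedure :: items
  end type real_list

contains

  subroutine append(self, value)
    class(real_list), intent(inout) :: self
    real, intent(in) :: value
    real, allocatable :: tmp(:)
    if (.not. allocated(self%buf)) allocate(self%buf(4))
    if (self%n == size(self%buf)) then
      ! Out of capacity: double it and move the old data across once.
      allocate(tmp(2 * size(self%buf)))
      tmp(1:self%n) = self%buf(1:self%n)
      call move_alloc(tmp, self%buf)
    end if
    self%n = self%n + 1
    self%buf(self%n) = value
  end subroutine append

  integer function length(self)
    class(real_list), intent(in) :: self
    length = self%n
  end function length

  ! Return a copy of the stored values as an ordinary allocatable array.
  function items(self) result(x)
    class(real_list), intent(in) :: self
    real, allocatable :: x(:)
    if (allocated(self%buf)) then
      x = self%buf(1:self%n)
    else
      allocate(x(0))
    end if
  end function items

end module real_list_mod
```

Each developer currently writes a container like this slightly differently, which is the portability point made above.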
It would also be cool to have a generic list that could hold truly anything. I believe even that could be implemented in current Modern Fortran, but it would definitely be unique to each programmer writing such functionality. I suspect performance would also be pretty bad, because I don’t know how you would accept items of any type without using class(*) and a ton of select type constructs. It would probably end up pretty gross…
My limited understanding of how to use class(*):
module mymod
  implicit none
  private
  public :: do_work
contains
  subroutine do_work(arg)
    class(*), intent(in) :: arg
    select type (arg)
    type is (real)
      write(*,*) 'arg: ', arg
    type is (integer)
      write(*,*) 'arg: ', arg
    type is (character(len=*))
      write(*,*) 'arg: ', arg
    end select
  end subroutine do_work
end module mymod
program main
  use mymod, only: do_work
  implicit none
  call do_work(5)
  call do_work(5.0)
  call do_work('I am entering 5.')
end program main
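Extending that example, current Fortran can already hold values of any type in a list. Each element is wrapped in a derived type with a `class(*), allocatable` component. Below is a minimal sketch with hypothetical names (`any_list`, `any_item`, `append`, `length`, `get`). Growth moves the existing values with `move_alloc` instead of deep-copying them, but the slot array is still rebuilt on every append. Every retrieval also needs a `select type`, which is the clumsiness described above.

```fortran
! Hypothetical heterogeneous list built on unlimited polymorphism.
module any_list_mod
  implicit none
  private
  public :: any_list, any_item

  ! Wrapper so that each slot can hold a value of any type.
  type :: any_item
    class(*), allocatable :: value
  end type any_item

  type :: any_list
    type(any_item), allocatable :: items(:)
  contains
    procedure :: append
    procedure :: length
    procedure :: get
  end type any_list

contains

  subroutine append(self, value)
    class(any_list), intent(inout) :: self
    class(*), intent(in) :: value
    type(any_item), allocatable :: tmp(:)
    integer :: k, n
    n = self%length()
    allocate(tmp(n + 1))
    ! Move (not deep-copy) the existing values into the larger slot array.
    do k = 1, n
      call move_alloc(self%items(k)%value, tmp(k)%value)
    end do
    ! Store a copy of the new value, keeping its dynamic type.
    allocate(tmp(n + 1)%value, source=value)
    call move_alloc(tmp, self%items)
  end subroutine append

  integer function length(self)
    class(any_list), intent(in) :: self
    length = 0
    if (allocated(self%items)) length = size(self%items)
  end function length

  ! Copy element i into v; the caller then uses SELECT TYPE on v.
  subroutine get(self, i, v)
    class(any_list), intent(in) :: self
    integer, intent(in) :: i
    class(*), allocatable, intent(out) :: v
    allocate(v, source=self%items(i)%value)
  end subroutine get

end module any_list_mod
```

For example, after `call xs%append(5)` and `call xs%append(5.0)`, `call xs%get(2, v)` returns a `class(*)` copy whose dynamic type is `real`.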
Please note that, again and again with “container” types, whether a basic STRING, BITS, or generic lists, dictionaries, etc., it comes down to addressing two current deficiencies in the Fortran language standard:
Constructors
Accessors
Both must be addressed for a practitioner of Fortran to consume such facilities productively and efficiently.
From a language point of view, that is the larger aspect for compiler implementors (say, LFortran): developing reusable solutions that can make the life of Fortranners far more effective and fun.
Thus my question to you above, regarding the STRING type and a constructor for it as shown with the pets example, along with a substring reference, is intended as a small step that might eventually lead to a big leap for Fortran.
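Both deficiencies can be worked around today, if verbosely. Below is a minimal sketch with hypothetical names (`string_t`, `string_from_int`, `substr`). A generic interface with the same name as the type adds a constructor alongside the default structure constructor. An ordinary function stands in for a substring accessor.

```fortran
! Hypothetical string type with a user-defined constructor and accessor.
module string_ctor_demo
  implicit none
  private
  public :: string_t, substr

  type :: string_t
    character(len=:), allocatable :: s
  end type string_t

  ! Overloading the type name adds constructors beyond the default
  ! structure constructor string_t('...'), here one taking an integer.
  interface string_t
    module procedure string_from_int
  end interface string_t

contains

  function string_from_int(i) result(str)
    integer, intent(in) :: i
    type(string_t) :: str
    character(len=32) :: buf
    write(buf, '(i0)') i
    str%s = trim(buf)
  end function string_from_int

  ! Accessor: return characters i..j, like a substring reference.
  function substr(str, i, j) result(sub)
    type(string_t), intent(in) :: str
    integer, intent(in) :: i, j
    character(len=:), allocatable :: sub
    sub = str%s(i:j)
  end function substr

end module string_ctor_demo
```

With this module, `string_t('thermocouple')` still uses the default structure constructor, while `string_t(42)` resolves to `string_from_int`.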
I agree. I meant it mostly the other way: if NumPy didn’t even allow it (as I initially thought), I was going to propose that maybe Fortran also doesn’t need it.
When I think of the functionality of a list, I think of features like the ability to append or insert new elements or to delete existing elements. Those are operations that are more difficult or inefficient to do with an array. As to how the list is implemented, there are many options, such as linked lists, binary search trees, hash tables, and so on.
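Here is why insertion and deletion are costly with arrays. In current Fortran, each can be written in one line with an array constructor on an allocatable array. However, every call builds a new array and copies the remaining elements, so each operation is O(n). A linked list can insert or delete in O(1) once the position is known. Below is a minimal sketch with hypothetical `insert_at` and `delete_at` routines, using integer elements for brevity.

```fortran
! Hypothetical list-style edits on a plain allocatable array.
module array_edit_mod
  implicit none
  private
  public :: insert_at, delete_at

contains

  ! Insert value at position pos (1 <= pos <= size(x) + 1).
  ! The array constructor builds a new array, so every call copies all
  ! existing elements: O(n) per insertion.
  subroutine insert_at(x, pos, value)
    integer, allocatable, intent(inout) :: x(:)
    integer, intent(in) :: pos, value
    x = [x(:pos - 1), value, x(pos:)]
  end subroutine insert_at

  ! Delete the element at position pos (1 <= pos <= size(x)); also O(n).
  subroutine delete_at(x, pos)
    integer, allocatable, intent(inout) :: x(:)
    integer, intent(in) :: pos
    x = [x(:pos - 1), x(pos + 1:)]
  end subroutine delete_at

end module array_edit_mod
```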
@certik, honestly and frankly, I am taken aback and highly disappointed by the line of thinking behind “if NumPy didn’t even allow it …, I was going to propose that maybe Fortran also doesn’t need it.”
There are close similarities between your mindset here and what the Fortran committee(s) do with positions such as the one in this paper:
One might as well defer all evolution of the language to the “wise” minds on the J3 and WG5 committees, then sit back and wait decades and decades for work on DO CONCURRENT, coarrays and whatnot, which in practice prove so difficult to get any benefit from. Meanwhile, things that are basic facilities in the languages used so successfully and promptly in scientific, technical and numerical computing, starting with Python and C++, keep being ignored: easy and convenient facilities, starting with strings. You don’t need NumPy or any such thing in Python to get an array of jagged strings; it is effectively built in, as you’d know better than anyone else: see
So why then try to relitigate the use cases of Fortran practitioners, and even contemplate add-on packages such as NumPy to make the case that “maybe Fortran does not need it”? That notion should be unacceptable, particularly considering everything that has been communicated repeatedly on this forum and elsewhere by a wide group of practitioners; see the paper by @ClivePage from nearly 10 years ago. It’s as if you noticed something in NumPy and instantly tried to second-guess and overrule all the needs expressed by others. This is precisely the issue with J3 and WG5 that we are trying to avoid; cue the paper above.
A fundamental tenet for anyone positioned to provide tools and solutions for society, usually thanks to their service and expertise, should be to heed the needs of those willing to use those tools and solutions. That is what helps society greatly. “Necessity is the mother of invention”, as the clichéd proverb goes.
In the case of Fortran, see the response by @jacobwilliams: the need for an intrinsic STRING type is essentially a bare necessity now. Please understand that neither you nor anyone else should try to second-guess this now. As I stated above, this is exactly why there is a need to look beyond J3 and WG5: they have failed repeatedly, and they will never deliver on these bare necessities for Fortranners. This is why the Community now needs to look elsewhere, and that is where LFortran and you come in. Your line of thinking is therefore a major worry; it falls into the “Et tu” category.
@FortranFan the question in my mind is whether lists of strings are enough, or whether we also need arrays of strings.
What I would like to avoid is just adding random things into Fortran just because we can. Rather, I would like to have a well thought out set of features that play well together.
When character strings a and b are allocatable, then I think the assignment
a = b
is allowed only when b is allocated. However, if x and y are of a derived type with an allocatable character component a, then the assignment
x = y
is allowed even when the components are not allocated. If y%a is not allocated, then after the assignment x%a is also not allocated; otherwise, x%a is allocated (or reallocated) on assignment. I’m not sure why there is a difference, but I think that description is correct.
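Here is a small sketch of the two behaviours, assuming a standard-conforming compiler. The names `holder`, `copy_holder` and `assign_like_component` are hypothetical. `copy_holder` relies on intrinsic assignment of the derived type. `assign_like_component` shows the explicit `allocated` check that a plain variable needs to get the same component-style semantics.

```fortran
! Illustrative comparison of allocatable CHARACTER assignment rules.
module alloc_assign_demo
  implicit none
  private
  public :: holder, copy_holder, assign_like_component

  type :: holder
    character(len=:), allocatable :: a
  end type holder

contains

  ! Intrinsic assignment of the derived type: if src%a is unallocated,
  ! dst%a ends up unallocated too (no error); otherwise dst%a is
  ! (re)allocated to the same length and the value is copied.
  subroutine copy_holder(dst, src)
    type(holder), intent(inout) :: dst
    type(holder), intent(in) :: src
    dst = src
  end subroutine copy_holder

  ! For plain allocatable variables, `a = b` requires b to be allocated,
  ! so mimicking the component behaviour needs an explicit check.
  subroutine assign_like_component(a, b)
    character(len=:), allocatable, intent(inout) :: a
    character(len=:), allocatable, intent(in) :: b
    if (allocated(b)) then
      a = b                  ! (re)allocates a to len(b) and copies
    else if (allocated(a)) then
      deallocate(a)          ! unallocated source -> unallocated target
    end if
  end subroutine assign_like_component

end module alloc_assign_demo
```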
Which behavior should be adopted for a new string data type?
My preference would be the same as for the CHARACTER type. With the string type, the user ideally needn’t be bothered with its internals (that’s for the processor to take care of as it sees fit), and thus aspects like “%a” will hopefully be of no relevance.
Now, given the importance of interoperability with C in the Fortran standard, I would recommend that the string type eventually offer something like a c_str procedure (also as a type-bound procedure), so a coder can write s%c_str() to fetch the equivalent C char array for the string content, rather than having to use s%a.
I could see using a type-bound procedure to convert the Fortran string into a null-terminated C string, but to my eye, if you just want the Fortran character value, then variable%s (or something equivalent) seems more straightforward, and less typing, than something like variable%function(). The C string conversion requires a memory allocation and a copy, while the variable%s syntax is a direct reference to the value, so there could be performance side effects too.
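Below is a minimal sketch of such a conversion, using a hypothetical free function `to_c_str`. It could equally be bound to a string type as `c_str`. The sketch assumes `c_char` is the default character kind, as it is with common compilers such as gfortran. It also makes the cost visible: each call allocates `len(s) + 1` characters and copies the data.

```fortran
! Hypothetical conversion from a Fortran CHARACTER value to a C string.
module c_str_demo
  use, intrinsic :: iso_c_binding, only: c_char, c_null_char
  implicit none
  private
  public :: to_c_str

contains

  ! Return a null-terminated copy of s, suitable for passing to a C
  ! function expecting `const char *`. This is a new allocation of
  ! len(s) + 1 characters plus a full copy, unlike a direct s reference.
  function to_c_str(s) result(cs)
    character(len=*), intent(in) :: s
    character(len=:, kind=c_char), allocatable :: cs
    cs = s // c_null_char
  end function to_c_str

end module c_str_demo
```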
Please take a look at the simple working example upthread in this comment.
My hope with the new intrinsic string type is that the variable reference itself will yield the Fortran CHARACTER value for coders, so that further indirection such as %s will not be necessary, e.g., with the code shown above:
..
parts(2) = "thermocouple"
..
print *, parts(2) !<-- outputs 'thermocouple'
..
character(len=:), allocatable :: part
..
part = parts(2) !<-- Fortran CHARACTER object `part` gets defined as 'thermocouple'
..
I am assuming that is some kind of defined assignment, or something equivalent. In that case, this still requires a memory allocation step and a new copy of the data, whereas a parts(2)%s reference would refer directly to the original data. How do you propose to access the original data in a way that allows a compiler to avoid that extra effort? Maybe a function that returns a character pointer? That still seems a roundabout way to do something like parts(2)%s.
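Below is a minimal sketch of that trade-off, with a hypothetical `string_t` standing in for the proposed type. `part = parts(2)` can be supported today through a defined assignment, but that routine has to allocate `part` and copy the characters. By contrast, `parts(2)%s`, or a substring such as `parts(2)%s(1:6)`, refers to the stored data directly. Returning an empty string for an unset element is an arbitrary choice made here for illustration.

```fortran
! Hypothetical stand-in for an intrinsic string type.
module string_assign_demo
  implicit none
  private
  public :: string_t, assignment(=)

  type :: string_t
    character(len=:), allocatable :: s
  end type string_t

  interface assignment(=)
    module procedure char_from_string
  end interface

contains

  ! Defined assignment `part = parts(2)`: lhs is (re)allocated and the
  ! characters are copied, which is the extra work being discussed.
  subroutine char_from_string(lhs, rhs)
    character(len=:), allocatable, intent(out) :: lhs
    type(string_t), intent(in) :: rhs
    if (allocated(rhs%s)) then
      lhs = rhs%s
    else
      lhs = ''   ! illustrative choice for an unset string
    end if
  end subroutine char_from_string

end module string_assign_demo
```

In this sketch, modifying `part` after `part = parts(2)` leaves `parts(2)%s` unchanged, because the defined assignment produced an independent copy.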