How do i allocate an array of strings?

Because there is absolutely no need to impose that on the practitioner.

Come on, a most basic consideration of language design shall be a consideration for the practitioner.

It is not just the “extra 6 characters”. It is 6*N where N is the number of times the facility will be practiced. In the case of strings in modern codebases toward computing in any domain, even scientific and technical computing, N practically approaches infinity.

To then impose such needless verbosity will be beyond the pale of a disservice on the practitioners .

This argument can be made just as well the other way. If string is an intrinsic derived type, then why shouldn’t a declaration of string look like other derived type declarations in the language, user and intrinsic?

I’m not sure of all the details involved in making string a distinct type by itself, but I can see that that choice could complicate the programmer’s job. Every routine with one or more character arguments would need to somehow be generalized to accept also the new string type argument. Say you start with a subroutine with 4 character arguments, the programmer would then need to write the code and make a generic interface that encompasses all 2**4=16 combinations where zero, one, or more of the arguments are replaced with the new string type. In contrast, if the intrinsic derived type approach is taken, then the argument a can be used for a string argument, or the argument a%s can be used for a character argument, and the compiler can keep track with TKR matching at compile time when a programmer makes a mistake.

This latter situation is how things work now with a user defined string_type. The current problem/limitation is portability and interoperability. That could be solved just with an intrinsic derived type. I personally do not care about typing six extra characters. However, I would like to be able to initialize variables and to define parameters of this type, something that is not allowed now (because fortran does not allow allocatable variables to be initialized or used within a constant).

I have an example where type() is not required in mind: polymorphic/typed allocation. Intrinsic and derived types are treated the same way in this case:

    class(*), allocatable :: x

    allocate(integer :: x)
    allocate(real :: x)
    allocate(character(len=123) :: x)
    allocate(my_type :: x)
! etc...

If someone knows the story behind why type(my_type) was not requested here, it would be informative to know.

1 Like

Well, the answer lies exactly in the part of my message that you have removed from my quote: because

I would strongly hope that no one would seriously think of introducing a string type in such way (I would find it quite illogical and unreasonable). string should be character(:), allocatable with extra features, making it backwards compatible. That’s what simple logic would dictate.

I don’t mind too much as well…as long as I get something in return for the extra effort and verbosity. It doesn’t seem to be the case here. On the contrary, it would suggest the idea that Fortran is an old, patched-up language.

A different question I have is (apologies for my lack of proper understanding in this matter and code error):
As per John Reid’s ‘The New Features of Fortran 2023’ :

If a deferred-length allocatable variable is defined by
intrinsic assignment:
character(:), allocatable :: quotation
:
quotation = “How now, brown cow?”
it is allocated by the processor to the correct length.

Cannot the above be done here as well to make things simple?

character(:),allocatable :: option(:) = [character(:) ::  'strings','with',...]

and maybe

character(:),parameter :: option(*) = [character(:) ::  'strings','with',...]

No, because the items in an array must be of identical size. A level of indirection is fundamentally required.

It kind of depends on your definition of “modern”, and your definition of “first class”. I.e. many people would consider C++ a “modern” language, but it’s not “first class” there, it’s in the standard library, i.e. std::string. And in all “modern” languages I know of it’s pretty clear that the intrinsic string type is just a convenience to not have to explicitly deal with the fact it is a pointer to a char[].

This is probably a valid criticism, and shortening the name of the type to string would be reasonable.

Why should any type be different than integer? Why did we need type() at all? Maybe that part was a mistake? But we are where we are, and if you want consistency, type(integer) is already valid.

The number of characters in a syntax term or identifier is not what you typically want to minimize. Typing speed is not the bottleneck for software development. The goal is to improve clarity. Make the above argument in any other context and you end up advocating for code that looks like it was written during the great vowel shortage of the 1970s. I.e. tmp = grd * expc / cffc. It’s pretty universally accepted that you should have clearer variable names.

That said, if you wanted to make the argument that the extra syntax is just extraneous noise, I’m open to being convinced.

Ok, I’ll define more specifically “first class”: for me it’s “very low barrier to access the feature”. In your C++ example, it’s very easy and “varying strings” are almost immediately available. I wish the same was available in Fortran with a comparable effort.

Because type() marks clearly that’s a user-defined data type: it’s helpful. Not a mistake at all, in my book.

Apart from the fact that I wouldn’t be exactly thrilled to clutter my code with type(integer) declarations, as explained above, the distinction between basic and user-defined, “bespoke” data types it’s useful for code readability and maintenance.

Strictly speaking, the varying_string type is not an intrinsic type of the language but rather a “standardized user-defined” one

Yes, but in modern Fortran (in my opinion) it should be intrinsic because it’s essentially a very basic data type that is extremely common.

2 Likes

How about an attribute called jagged then, which can signal to the compiler that the allocatable array can have element of different size/length. This way backward compatibility will be maintained and the above example should be workable as well,
like :

character(:), allocatable, jagged :: ...

Can something like this work in fortran?

1 Like

That would be true, if we didn’t have intrinsic derived types. The type() is required for declaring variables of of type(event_type), type(lock_type), etc. They are derived types, but not user-defined.

Arrays of intrinsic types are inherently contiguous, and so is their data. Arrays of different length strings inherently cannot have all of their data in contiguous memory. I think it’s important to signify this for performance reasons.

I’d have to think a bit harder about it, but it’s certainly an interesting idea.

1 Like

I suppose those have been introduced F2008+. I never used any of them and I believe that it’s been a mistake to introduce such inconsistency. I hope this mistake won’t be repeated and magnified for something very common like “varying strings”.

Don’t we have the contiguous attribute for enforcing contiguous memory, whereas by default it’s the compiler (the “processor”) that can decide freely what to do? :thinking:

Regardless, I wouldn’t follow such low-level-of-abstraction, C-like principle for the syntax. One of the bright ideas and guiding principles of Fortran is that the underlying implementation is not dictated by the syntax, giving compiler writers enough freedom to optimise as they see fit. I wouldn’t break such long and successful tradition.

Although the standard does not explicitely define the implementations, a Fortran array is internally described by the address of the first element, the size in each dimension, the strides between elements each dimension,… Meaning that the internal representation has a regular shape. And I think that this is actually implicitely enforced by the storage association rules.

Accessing the i-th element of a 1D array just requires a simple address calculation: base_address + (i-1)*stride*type_storage_size. Elements of unequal sizes would completely break that, and this would require a kind of linked list structure, which is a completely different animal. Extensions to 2D, 3D, etc, would not be really simple, not to mention the difficulty to replace one element by a larger element (would require reallocations). I can’t see how this could be backward compatible with classical arrays, this would have to be a fully new data structure in the language.

2 Likes

If you are talking about C/C++, then I would say that not even fixed-length strings are really first-class data types. F77 does have fixed character strings as first class data types. A statement like

a(3:6) = b(2:3) // c(4:5)

demonstrates assignment, substring addressing, and concatenation as part of the syntax of the language. Of course, other tasks are done with intrinsic functions, len(), len_trim(), index(), scan(), and so on. But if you look at the C/C++ model, characters are really just a synonym for a type of integer, character strings are arrays of those integers, and all operations are done with functions, even simple things like assignment, and nothing is really part of the language syntax.

The current discussion is about variable-length strings, which were not part of f77.

I’m not aware of anyone that considers C a modern language.

Yes, but I think maybe it doesn’t mean what you think it means. It means only that the elements of the array must be contiguous, I.e. that:

It dose not mean that all their data must reside in that contiguous section. I.e.

type :: t
  integer, allocatable :: a(:)
end type

subroutine s(ts)
  type(t), contiguous :: ts(:)
  type(t) :: tmp
  integer :: itmp

  tmp = ts(5) ! the position in memory of this element is well known relative to the start of the array
  itmp = ts(5)%a(4) ! this definitely does not reside in the contiguous section of memory of the `ts` array
end subroutine

For procedure arguments that are assumed shape without the contiguous attribute, it can’t be assumed that the actual argument is in fact contiguous. It could be a strided slice, or even a vector subscripted array (i.e. arr([1, 7, 5, 2])).

It was more about C++, but anyway what is the definition of a “modern langage”? To me, C is a perfectly modern language in its application domain.

I think there’s a confusion here. For a 1D array, contiguous means that the stride is 1. For a stride larger than 1 the array is no longer contiguous, but still it has a regular shape.

That’s really the point, and it can hardly be something else IMO. Just, it can be abstracted by the languague up to a point.

1 Like

For simple cases it could probably, but it would be somehow misleading.

The key point is that in Fortran the character concept is quite different from the C (and related language) concept: for instance character(5) is definitely not an array of character(1) but it can be viewed a type in itself. character(6) is another type, different from character(5).

character is essentially a parametrized type (the length being the parameter). character(:) is the defered-length version.

In an array, you can’t mix different types: you can’t mix integer and real elements, and similarly you can’t mix character(5) and character(6) elements. And no, you can’t do that either in Python for instance :slight_smile: : AFAIK Python has no arrays, it has lists, which is different, and you can mix different types in a list. In Python, true arrays are provided by Numpy, and you cannot mix types there.

When building an array of character(:) the compiler has consequently to decide the length of the elements for the whole array. You may say that in:

character(:), allocatable :: a(:)
a = [character(:):: "this", "is ", "a", "bear"] 

the compiler could easily browse the individual elements, determine that the longest one has 4 characters, and decide to build a character(4) array. But there are two caveats:

  • this syntax is supposed to work not only for constant elements, but also for expressions, including function results, etc… So, at runtime the program would have first evaluate all the elements to determine their lengths, keep them somewhere in memory, determine the maximum length, build the array using this maximum length, and copy all the evaluated elements into the array.
  • It would give the illusion that the element lengths can be changed afterwards, i.e. that a(4) = "elephant" would produce the desired result. Actually, the compiler would rather cast “elephant” into a character(4) element and after that a(4) would be equal to "elep"
2 Likes

No, this is not possible with CHARACTER intrinsic type given the semantics introduced in the language starting FORTRAN 77.

For the sake of Fortranners, in an ideal world that prioritizes first and foremost the needs of practitioners, the language standard will introduce a new intrinsic type, say with short, simple name as string, that has a good set of facilities. Fortranners can really, really benefit from this.

But the way things stand, the standard committee will not vote in favor of such a new type, not over the course of next 20 years or so :sob: Why? Because the needs of practitioners are not adequate priority. In a given standard committee meeting that only gets about 10-16 hours of crucial plenary discussion, more time can get spent by current voting members discussing the locale of future meetings than striving for ways to bring better experience with the practice of Fortran by programmers in even the scientific and technical computing domains :sob:

The poor, persevering Fortranners have absolutely no options of influencing the committee otherwise in any reasonable timeframe and that’s the real shame in the Fortran ecosystem.