New Release of iso_varying_string

No, I wouldn’t recommend anyone to try that.

My point is precisely that the analog to the instrinsic CHARACTER type is a type with a kind and a length-type parameter, but seeking such a type in a user library causes problems and use cases are not met such as jagged arrays. And trying to kluge one’s way around including with a kind() / len() method will serve few.

The best way around all this is an intrinsic string type, such a type is too damn important to do any efficient scientific or technical computing, especially now with so much heterogeneous streams of data coming to scientists and engineers toward which string processing is crucial.

Even the notion of an instrinsic module, another ISO one or as part of ISO_FORTRAN_ENV or another part of the standard, will shortchange the practitioners of Fortran. Note with such approaches, a conforming Fortran processor is not required to implement any of the functionality. These are all effectively in the optional category.

It looks liike character(:), allocatable does just about everything that is expected of a string type EXCEPT to allow jagged array declarations. I remember when I first saw that feature, it surprised me that they did it that way instead of the jagged way. That was perhaps the only feature of allocatable characters that they got wrong in the specification (other than the inability to initialize them on definition, which is part of a larger problem). Is it possible to change that one feature in some kind of a backward compatible way (say adding another keyword to the declaration)? Or even if it isn’t backwards compatible, is it possible to just change that one feature so that it works the right way?

My workaround for this feature (which is the same as everyone else’s workaround) is to define a derived type

type jagged_char
   character(:), allocatable :: s
end type jagged_char

and then define arrays of this type. That requires %s to reference the member, but it isn’t too bad otherwise, at least for my purposes. Maybe as an alternative, that type could be standardized so that everyone who uses it would have the same type name and the same component name, arrays could be copied with intrinsic assignment, and so on.

This is the other aspect I miss by not having it be a part of the language. I’ve actually proposed allowing a way for user defined functions to appear in constant expressions, which would allow declaring constants of derived types with allocatable or private components.

1 Like

Is there a proposal for this? What are the technical advantages over a derived type?

As far as I know, there have been no recent, separate, and a formal proposal for an intrinsic string type.

The technical advantages with an intrinsic string type include:

  1. A conforming processor shall provide support toward such a type, for it will not be “optional”,

  2. No 'USE` statement and its intricacies to worry about for practitioners, they get to readily consume such a type,

  3. Standard syntax and semantics for type bound components and procedures, the value and benefits with these are immeasurable,

  4. Semantics that are otherwise extremely, extremely difficult to contemplate can be more readily introduced with an intrinsic type, bringing further benefits to the actual practitioners of Fortran. Think MIN, MAX etc. intrinsic functions that are generic with variable number of arguments available since early editions of FORTRAN - only compliers can do “magic” (non-standard stuff under the hood) to make the practitioners avail of the benefits, same principle applies here. And,

  5. Implementations can more easily provide compiler optimizations with such an intrinsic feature.

The cost for getting this into the standard and for implementations to provide is effectively a one-time capital expense. And it’s truly quite low a cost given how close even “quick and dirty” the user library attempts get to the needed use cases. Should the committee and vendors instead guesstimate a high cost for this and struggle with getting right such a type, they need to realize they have problems at multiple fundamental levels and should seriously consider exiting the “Fortran” business altogether.

But for practitioners it will be much greater and continuous lifetime benefits with the use of such an intrinsic type, if it were to become available.

Just a few more comments on an intrinsic string type. I also think it must be part of the language. Wrapping a character(len=:),allocatable in a type is not enough, which has been clear for about 20 years now.

  • Can’t use the (A) format descriptor in read or write statements.
  • Can’t define parameters
  • Can’t use the slice notation (string(1:10) = ...)

Any usable string implementation must have these features. No user-created string library can currently do these, so they are all suboptimal.

An alternative would be to modify the language so that these are all possible with a user type. Then it would technically be possible to create a string library (say in stdlib) that could potentially become a de facto standard. But even that seems suboptimal and more complicated than just adding a string type to the language.

This code will compile and run:

type jagged_char
character(:), allocatable :: s
end type jagged_char
type(jagged_char) :: string
string%s = 'Hello World'
write(*,'(a)') string%s
write(*,'(a)') string%s(1:5)
end

That code does everything except for the parameter, which is prevented because of the need for allocatable initialization. As I said previously, that is a general shortcoming in fortran, not specific to just allocatable characters. Currently it is not possible to initialize any allocatable entity, character, real, integer, logical, complex, or derived type. I think all of those possibilities should be allowed in the language. The programmer does need to write the %s, but that is trivial. The real critical limitation, the missing feature in the language, is the ability to initialize an allocatable entity.

The %s is not trivial. Having to expose that to the user is a profound indication that this is a hack to work around problems with the language.

1 Like

We can agree to disagee on that. I think there is an important nontrivial aspect of this problem, that reaches beyond just allocatable characters, and then there is %s.

I would put this in the same category as z%re and z%im for complex numbers. To me, that is an almost ideal solution to the problem that fortran had with complex numbers dating back to the beginning of the language.

1 Like

To this point, I would add that another shortcoming of strings not being an intrinsic type is the option to use formatted I/O to/from a multi-line string in the same way that it can be done from an external file.

It occurs to me increasingly often that structured data can either be found on an external file or from another means that can be stored in a character variable, but it cannot be read from either source with the same subroutine, am I the only one having this issue?

No :cry:

1 Like

RonShepard said, correctly, that one can’t have parameters of his jagged_char type. Sometimes I use a function with no arguments to avoid other Fortran restrictions on parameters. On trying it with the program below, using gfortran and ifort, I found that one but not all three of the ways one might use a parameter will work. The other two are hiding behind !!.

program stringtest
 implicit none
 type jagged_char
    character(:), allocatable :: s
 end type jagged_char
 type(jagged_char) :: string !! = rope()
 string   = rope()
 write(*,'(a)') string%s
 write(*,'(a)') string%s(1:5)
!!  write(*,'(a)') rope()%s
!!  write(*,'(a)') rope()%s(1:5)
contains
 type(jagged_char) function rope()
   rope%s = 'Hello world'
 end function rope
end program stringtest
1 Like

Sorry to necro-bump this thread, but I unfortunately have bad news about this release. It seems the DTIO causes unforeseen errors to occur in unexpected places at run-time with some compilers, and (luckily) compile time errors with other compilers. I therefore do not recommend making use of this version in other libraries, and future development (if any) will be continuing from the 3* version. If anyone would have time to isolate the issues and submit bug reports I’d appreciate the help. I will try to remember to periodically check the state of the compilers with regards to this feature and hopefully be able to reintroduce it later.

4 Likes

First of all, thank you @everythingfunctional for your efforts concerning varying strings. I realize this thread is some months old, but I see no reason not to “revive” it, so please let me add my two humble bits.

This is a topic I had in mind for years, but always postponed implementing a module for making varying-length strings less of a mess. And since allocatable strings do work as an elegant solution is normal cases, the real problem is arrays of varying-length strings, which is not a top priority (I don’t have to deal with them too often.) However in a recent project of mine I really needed such arrays, and I found this project to be a reliable solution. There are some limitations, but I didn’t expect a “magical” module that will let me use varying-length strings exactly as normal strings.

The first limitation is assignment. As others pointed out already, this won’t work (and in fact I didn’t expect it to work):

type(varying_string), dimension(3) :: str_vector
str_vector=["foo", "foofoo", "yet another foo"]

This works, of course:

type(varying_string), dimension(3) :: str
str_vector(1)="foo"; str_vector(2)="foofoo"; str_vector(3)="yet another foo"

It’s not very elegant, but I can think of ways that will automate this (maybe worth prototyping as an extension?)

Now, I’m not happy with the ISO (as I’m sure many others are.) I think it’s actually half-baked, and it could be much, much better. But I guess it is what it is, so let’s deal with it using extensions.

@everythingfunctional added extensions so that varying-length strings can be used in print, read or write statements. This is a very welcomed addition. In version 4.0.0 print *,str_vector actually works, but since that version is not recommended, I switched to 3.0.4 where print *,str_vector doesn’t work. Using char to convert a varying-length string to a “normal” one works, and it actually allows (a) formats but for the vector str_vector I used as an example above this won’t work:

print "(3a)",char(str_vector)

apparently because char is not elemental. In version 4.0.0 gfortran accepts this but it prints nothing; in version 3.0.4 the compiler issues an error, which is better. So, to print a vector of varying_strings one needs to do something like this

print "('Strings: ',2(a,', '),a)", char(str_vector(1)), char(str_vector(2)), char(str_vector(3))

which is not very elegant (and it actually reminds me the way you print arrays of “strings” in C.) Please correct me if there is another way to do this.
Again, I can think of ways that will automate this for any vector size, and maybe it is worth prototyping.

All the above are for vectors only. Arrays of such kind are obviously harder to prototype but vectors are probably the case most people will need (and there is always reshape for arrays.)

This could be made to “work”, by making the assignment elemental, but it’s worth noting that as written it is not standards conforming anyways, because it should either be

str_vector = ["foo            ", "foofoo         ", "yet another foo"]

or

str_vector = [character(len=15):: "foo", "foofoo", "yet another foo"]

So it doesn’t really do what you want anyway due to language constraints on arrays of characters, irrespective of the varying_string type definition.

I think you could do it like this:

print *, (char(str_vector(i)), i = 1, size(str_vector))

And I believe there is a way to write the format string like you try above to be more general, with something like the : format descriptor, but I don’t quite remember how to do it off the top of my head.

Indeed, it doesn’t really do what I wanted (and I guess others would want as well)… All the above just define a vector of fixed-size (len=15 in this case) varying strings, which begs the question why do they need to be of type(varying_string) in the first place. Unless extra functionality provided by the module is needed (like insert or remove,) they could be just normal strings. In addition, printing them will need to trim them first which, again, could be done with normal strings.
However, trim is an elemental function, so this works just fine:

type(varying_string), dimension(3) :: str_vector
str_vector = ["foo            ", "foofoo         ", "yet another foo"]
str_vector=trim(str_vector)

It’s not very elegant, it doesn’t bypass the constraints on arrays of characters, but it removes the trailing blanks permanently, and that’s something you can’t do with normal strings. I can’t think of a better way to assign values to vectors of varying strings.

Yes, you can, and that’s what I actually do in my programs. It’s just a an implied do, but it’s compact enough. It also needs to declare i which might not otherwise needed but, for lack of a better way, it’s good enough I guess. If only char was elemental…

Really, you’d just do it this way:

str_vector(1) = "foo"
str_vector(2) = "foofoo"
str_vector(3) = "yet another foo"

I agree it’s not as elegant, but it is what it is.

char can’t be elemental, because it would result in an array of characters with different lengths. That said, what I usually do is

use iso_varying_string
use strff, only: join
call put_line(join(str_vector, ", "))