How to create a character array whose each element is just the right length based on its content?

Dear all,

A quick question, I want an array, each element is character type.
But I want the len of each element is just the right size based on its content, instead of a fixed length for all elements. Is there an easy way to achieve this?

See the code below,

    program main
    character(len=:), allocatable :: string(:)
    character(len=:), allocatable :: string_scalar
    allocate(character(len=20) :: string_scalar)
    allocate(string(5),mold=repeat('0', 20))
    string(1) = '123456789'
    write(6,*) 'string(1) and its size are: ', string(1), len(string(1))
    string_scalar='123456789'
    write(6,*) 'string_scalar and its size are: ', string_scalar, len(string_scalar)
    end program main

In this code, string_scalar has the correct size based on its content ‘123456789’, which is 9.
However, string(1) has the fixed len=20 which is set when doing allocate.
Is there a way to let string(1) also have len=9 just like string_scalar?

Thank you very much in advance!

PS,

The inspiration and technique of string_scalar is from @everythingfunctional ,

1 Like

You have to use a separate type, as the length of each character in an array must be the same. Luckily, my implementation of iso_varying_string does just that.

2 Likes

Alternatively, you could use an (possibly allocatable) array of derived type containing an allocatable character component:

program strings
  type pnt
    character(:), allocatable :: str
  end type pnt
  
  type(pnt), allocatable :: strarr(:)
  
  allocate(strarr(10))
  strarr(1)%str = 'abc'
  strarr(2)%str = '0123456789'
  print *, len(strarr(1)%str), len(strarr(2)%str)
end program strings
2 Likes

Can this be done using stdlib? Let’s start recommending these basic operations using stdlib, and if it is not possible, let’s implement it. I think that will ensure that a regular Fortran compiler + stdlib provides fixes for everything.

3 Likes

Yes, string_type in stdlib_string_type can do this. Or you can use Brad’s library, or some other, the solutions are all in the same spirit.

1 Like

Thank you very much! @everythingfunctional @msz59 @certik @milancurcic
I am glad stdlib has this already.

Just a quick follow-up question about the @msz59 solution,
Since

  type pnt
    character(:), allocatable :: str
  end type pnt

only have one stuff, str.
Is it possible to directly define type pnt, so that pnt already means character(:), allocatable ,
so that I can simply use

strarr(1) = 'abc'
strarr(2) = '0123456789'

instead of adding the %str ?
Like do I need to define the ‘=’ symbol such that

strarr(1) = 'abc'

actually means

strarr(1)%str = 'abc'

Or, is the %str stuff inevitable?

1 Like

Yes, both the stdlib string_type and Brad’s iso_varying_string can do this. They provide a custom assignment operator so that you can assign an intrinsic character variable to a string instance. See how they’re defined in stdlib and iso_varying_string.

2 Likes

I was thinking that ASSOCIATE could be used, albeit in a somewhat cumbersome manner as an alternate way to get a shorter syntax and found I was wrong. I admit to (nearly) never using ASSOCIATE except to give short names to values when writing very long formulas; but I was
surprised the following was not legal if a little inelegant. Wondering if anyone had a short explanation as to why this (having an allocatable on the RHS of ASSOCIATE) is not allowed, as it is not apparent to me why it would be:

!!
!! W-A-R-N-I-N-G: example of invalid code
!!
type pnt
   character(len=:),allocatable :: str
   !character(len=20) :: str
end type pnt
type(pnt),allocatable :: string_pnt(:)
character(len=:), allocatable :: string_scalar
character(len=*),parameter :: g='(*(g0,1x))'

   allocate(string_pnt(10))
   ASSOCIATE ( string => string_pnt(:)%str )
   
       string(5) = repeat('0',20)
       string(1) = '123456789'
   
       write(*,g) 'string(1) and its size are: ', string(1), len(string(1))
   
       string_scalar = '123456789'
   
       write(*,g) 'string_scalar and its size are: ', string_scalar, len(string_scalar)
   END ASSOCIATE

end program main

Note that if I used a fixed-length CHARACTER variable(commented out in the TYPE declaration above) the code compiles and runs just fine, but no longer does what the OP wants.

2 Likes

It’s rather difficult to answer “why”, chances are high no definitive source toward the rationale included in the construct association section of the standard can be obtained.

The reason is very likely to keep it “safe” (and simpler) when it comes to “pointers” which Fortran has tried since Fortran 90 when it introduced the POINTERTARGET as a paired attribute to bring out more of an aliasing facility rather than “raw pointers” into the language. The language does not permit a TARGET to be allocated via the POINTER.

It’s kinda the same with selector and the associate name here.

How much longer will it take to realize the use of varying length string types is inherent to scientific and technical computing!!? And for Fortran to provide such a string facility as an intrinsic type!

Look at the calendar: no one now should have to invoke an USE xx statement preceded by all the steps to package manage xx just to consume a string type. Fortranners should simply be able to do

string_t, allocatable :: strings(:)
strings = [ "Hello", "World!" ]
1 Like

Is the idea to take string_t from stdlib (they call it string_type — any idea why the longer name? The shorter one seems easier to type and use) and simply include in a compiler itself?

Yes, that is part of our plan to eventually port some things from stdlib into the language itself, after it gets good use. string_t can be one such candidate. Alternatively we can see if there is a way to extend the native “character” type to allow this usage.

@FortranFan May I ask, what is string_t ?

@CRquantum , my post included a hypothetical code snippet showing a potential new intrinsic type named string_t introduced into the language in a future revision. Such a type is not available currently.

1 Like

I couldn’t agree more. Repeating mantras like "trailing spaces in a text assigned to character variable are not significant any more, indistinguishable from those padded by the processor" to the students in the third decade of 21st century is really embarrassing.

But even if not intrinsic, does anybody know why the iso_varying_string, defined as auxiliary part of F95 standard, have not make it to the primary three versions later? Or, at least, to any of commonly used compilers (surely not to GNU gfortran or Intel ifort)?

1 Like

Yes, I see now it seems ASSOCIATE is treated like COMMON or EQUIVALENCE in that it has to point to fixed defined locations, and I (naively, apparently) thought it’s main purpose was to provide a general aliasing capability. Apparently why it is called ASSOCIATE instead of ALIAS! It would be far more useful for my particular wants if it was a general aliasing command. A preprocessing macro (which I prefer to avoid) would be closer to what I wanted than ASSOCIATE ends up being. Thanks!

1 Like

There was a discussion here. I just reviewed it again and it seems that the preference of the community there is fragmented between <type-name>, type-name>_t, and <type-name>_type, all three being acceptable. I believe only the stdlib_bitsets module uses the first form, and all others use the third form. So, all three being acceptable it’s down to the implementor, reviewers, and maintainers to decide.

If a stdlib derived type is to be adopted into the standard in the future, it would be renamed anyway to <type-name>_type for consistency with existing intrinsic derived types. So, my position as a maintainer is to prefer this form, for readability to newcomers and consistency with the intrinsic types. We can change this, of course, and adopt a different convention, but I haven’t seen a good reason to do that yet.

2 Likes