Array of strings

Hi guys,

In a code that I’m working on, there is an array of strings of varying lengths. The array can be very long and it should be user friendly. It is loaded by another library, so changing the array in my code is not an option.

If I understand correctly, since the 2003 standard it is allowed to construct arrays in the following way (whereas in earlier standards each of the strings had to be padded to the full length, 6 in this case):

program example
integer :: nwords = 3
character(len=6), dimension(3) :: words
words = (/'cat', 'donkey', 'orca'/)
print *, words(2)
end program example

This example compiles and runs correctly with ifort (18.0.6), but with gfortran (8.3.0) it gives me

 words = (/'cat', 'donkey', 'orca'/)
                1
Error: Different CHARACTER lengths (3/6) in array constructor at (1)

The error is exactly the same if I try to compile with gfortran and -std=f2003 option.

However, the code compiles with gfortran when the individual strings are padded:

words = (/'cat   ', 'donkey', 'orca  '/)

Am I right that this is an F95/F03 issue? How would you proceed? Is there a way to force gfortran to compile this? Is it standard at all?

I think gfortran is correct. Every element of a character array must have the same length.

This works ok with gfortran as well

words  = [character(len=6) :: 'cat', 'donkey', 'orca']
print *, words(2)

And it works even if the lengths in […] and in the definition

character(len=12), dimension(3) :: words

are different. So, what’s the meaning of “character(len=6)” in:

 [character(len=6) :: 'cat', 'donkey', 'orca']

? The syntax accepted by ifort should be included in the standard.

The character (len=n) in the constructor says that an array of strings of length n will be created. Thus, with gfortran the output of

print*, [character(len=7) :: 'cat', 'donkey', 'orca']
print*, [character(len=6) :: 'cat', 'donkey', 'orca']
print*, [character(len=5) :: 'cat', 'donkey', 'orca']
end

is

 cat    donkey orca   
 cat   donkeyorca  
 cat  donkeorca

If the n in character(len=n) is too small, some strings will be truncated.
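To avoid counting characters by hand, the length can be taken from the literals themselves. This is only a sketch: the parameter name n and the program name are mine, and it assumes the strings are known at compile time.

```fortran
program padded_constructor
  implicit none
  ! n is an illustrative name: the longest literal sets the element length.
  integer, parameter :: n = max(len('cat'), len('donkey'), len('orca'))
  character(len=n), dimension(3) :: words
  integer :: i

  ! The type-spec pads every element to n characters.
  words = [character(len=n) :: 'cat', 'donkey', 'orca']

  do i = 1, size(words)
    ! trim() drops the blank padding; len_trim() is the unpadded length,
    ! len() is the declared (padded) length.
    print '(a,1x,i0,1x,i0)', trim(words(i)), len_trim(words(i)), len(words(i))
  end do
end program padded_constructor
```

Each line of output shows the trimmed word, its unpadded length, and the common element length of 6.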

I think gfortran is correct in this case. Without either an explicit type declaration in the constructor (@Beliavsky’s example) or all elements having the same length, it is ambiguous what the length of the array elements should be. Intel is doing the nice thing here and assuming the length of the longest one, but the standard doesn’t say that is what should happen, and in fact there are cases where you wouldn’t necessarily be able to tell at compile time what the length of each would be. For example:

num_strings = [to_string(x), to_string(y), to_string(z)]

where

function to_string(a) result(string)
  integer, intent(in) :: a
  character(len=:), allocatable :: string
  ...
end function
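To make the run-time case concrete, here is a minimal sketch. The body of to_string is my assumption (the original is elided above); it uses an internal write, and the buffer size of 32 and element length of 8 are arbitrary choices. The point is that the type-spec in the constructor fixes the element length even though each function result has a different length.

```fortran
program to_string_demo
  implicit none
  character(len=8), dimension(3) :: num_strings

  ! Each to_string result has its own run-time length (1, 2 and 4 here);
  ! the type-spec pads all of them to 8 characters.
  num_strings = [character(len=8) :: to_string(7), to_string(42), to_string(1234)]

  ! Brackets make the padding visible.
  print '(3("[",a,"]"))', num_strings

contains

  ! Hypothetical body: convert an integer to a string of exactly the needed length.
  function to_string(a) result(string)
    integer, intent(in) :: a
    character(len=:), allocatable :: string
    character(len=32) :: buffer

    write (buffer, '(i0)') a
    string = trim(buffer)
  end function to_string

end program to_string_demo
```

Without the `character(len=8) ::` part, the three results have lengths 1, 2 and 4, which is exactly the mixed-length situation gfortran rejects for literals.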

Yes, that’s the behaviour. So, indeed, character (len=n) in the constructor is effectively doing padding or truncation of the specified elements.

I find such ambiguities super frustrating. They are so easy to miss, and a resulting bug may be awfully hard to find.

Thank you all for your responses!

Compiling my code with gfortran -Wall does give

c:\fortran\test>gfortran -Wall xchar.f90
xchar.f90:3:35:

 print*, [character(len=5) :: 'cat', 'donkey', 'orca']
                                   1
Warning: CHARACTER expression at (1) is being truncated (6/5) [-Wcharacter-truncation]

Many Fortranners wish that the elements of an array of character variables did not have to have the same LEN. The base language is unlikely to change, but there are efforts to overcome this limitation. StringiFor, by Stefano Zaghi et al., has the following functionality:

  • low memory consumption: only one deferred length allocatable character member is stored, allowing for efficient memory allocation in array of strings, the elements of which can have different lengths;

This project toward a strings module in the Fortran stdlib will be of help with such coding needs.

Interested readers can follow the blogposts by @Aman and reach out to @Aman and mentors for feedback, comments, etc.

Ultimately I do hope the Fortran standard will include an intrinsic “string” type (imagine for a moment it’s called string_t) that will allow something along the lines of the original post:

   type(string_t) :: words(3)
   ..
   words = [ 'cat', 'donkey', 'orca' ]

As the many community efforts, including the latest stdlib work, show, it’s nearly doable as a user derived type now.
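For illustration only, here is a minimal sketch of what such a user derived type can look like. The names string_sketch, string_t, chars and str are hypothetical and are not the API of stdlib, StringiFor, or any other library; the sketch just wraps a deferred-length allocatable character so that each array element keeps its own length.

```fortran
module string_sketch
  implicit none
  private
  public :: string_t, str

  ! Hypothetical wrapper: one deferred-length character component per element.
  type :: string_t
    character(len=:), allocatable :: chars
  end type string_t

contains

  ! Hypothetical helper that converts a character value to string_t.
  pure function str(text) result(s)
    character(len=*), intent(in) :: text
    type(string_t) :: s

    s%chars = text
  end function str

end module string_sketch

program string_sketch_demo
  use string_sketch, only: string_t, str
  implicit none
  type(string_t) :: words(3)
  integer :: i

  ! All elements have the same type, so the constructor is fine,
  ! while each element stores a different number of characters.
  words = [str('cat'), str('donkey'), str('orca')]

  do i = 1, size(words)
    print '(a,1x,i0)', words(i)%chars, len(words(i)%chars)
  end do
end program string_sketch_demo
```

Note that the conversion function is still needed: a bare ['cat', 'donkey', 'orca'] on the right-hand side remains a mixed-length character constructor, whatever the type on the left.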

But having it as part of the standard, as an intrinsic type for “strings”, could bring immeasurable benefits in ease of use, productivity, and consistency of code for the poor, persevering practitioners of Fortran. Working with strings is a basic aspect of any scientific and technical computing. Such computing is not only about compute performance: the preprocessing and post-processing of data around the computations, where string facilities come into play, are also critical to the overall workflow. I wish the standard bearers could converge on this and commission some work. Alas, that is not the case!

Thanks! I agree with everything you wrote.

The project that you’ve cited sounds super interesting.

I created a “conforming” implementation of the proposed iso_varying_string module (see ISO/IEC 1539-2: 2000). The explanation I was given when I asked why it didn’t make it into the standard was that it was seen as unnecessary given the deferred-length, allocatable character feature that made it in. These examples demonstrate that it clearly isn’t unnecessary, and I’d be in favor of proposing that document again. It is a well designed API, and works very nearly like your example.

words = [var_str('cat'), var_str('donkey'), var_str('orca')]

If it were to be given language feature status it could (I think) be implemented such that the var_str wouldn’t be necessary, and certainly could be more performant.

I agree that Fortran needs an intrinsic “string” type as it seems impossible to enhance the current character type to do what many programmers need. What you can do at present is outlined in a document I put on the fortranwiki earlier this year, and which the original poster might find helpful. (Also please let me know of any mistakes I made in writing this).

http://fortranwiki.org/fortran/files/character_handling_in_Fortran.html

@ClivePage, that’s an excellent Wiki document, great effort - I took a quick look and couldn’t think of anything that could make it better!

Quick question: In the last sentence of “2. Character constants” you say “Constants in source-code can only contain characters which are in the Fortran character set. This is specified in section 6.1 of the Fortran 2018 Standard,…”.

My interpretation of 6.1.6 is that, in practice, this is not required: “6.1.6 Other characters
Additional characters may be representable in the processor, but shall appear only in comments (6.3.2.3, 6.3.3.2), character constants (7.4.4), input/output records (12.2.2), and character string edit descriptors (13.3.2).”

So, in the strict sense: yes, a standards compliant program should only contain characters in the Fortran character set. In practice, I believe more than just ASCII characters between codes 32 and 126 are acceptable.

(I admit that I may be drawing too fine a distinction here.)

I agree, what you say may be arguable. But it’s a bit of a grey area.

Agreed. The whole notion of “processor-dependent” behavior in the standard allows for many areas, with many shades of grey.

🙂

Thank you for the replies and for the discussion. From a user-perspective, I feel that the treatment of strings in Fortran should definitely be improved, either through the standard or through the standard library or in some other way.

To add to my initial post: curiously, gfortran accepts the following version without complaint.

program examplegf
integer :: nwords = 3
character(len=6), dimension(3) :: words
data words/'cat', 'donkey', 'orca'/
print *, words(2)
end program examplegf
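My understanding of why this form is accepted (my reading, not a quote from the standard): in a data statement each constant is given to its own array element as if by ordinary assignment, so a mixed-length array constructor is never formed and the shorter strings are simply padded with blanks. A sketch of the equivalent element-by-element assignment:

```fortran
program elementwise_padding
  implicit none
  character(len=6), dimension(3) :: words

  ! Ordinary scalar assignment pads a shorter string with trailing blanks.
  words(1) = 'cat'
  words(2) = 'donkey'
  words(3) = 'orca'

  ! Brackets make the padding visible: [cat   ][donkey][orca  ]
  print '(3("[",a,"]"))', words
end program elementwise_padding
```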