In a code that I’m working on, there is an array of strings of varying lengths. The array can be very long and it should be user friendly. It is loaded by another library, and I would rather not change this array in my code.
If I understand correctly, since the 2003 standard it is allowed to construct arrays in the following way (whereas in earlier standards each of the strings had to be padded to the full length, 6 in this case):
program example
integer :: nwords = 3
character(len=6), dimension(3) :: words
words = (/'cat', 'donkey', 'orca'/)
print *, words(2)
end program example
This example compiles and runs correctly with ifort (18.0.6), but with gfortran (8.3.0) it gives me
words = (/'cat', 'donkey', 'orca'/)
1
Error: Different CHARACTER lengths (3/6) in array constructor at (1)
The error is exactly the same if I compile with gfortran and the -std=f2003 option.
However, the code compiles with gfortran when the individual strings are padded:
words = (/'cat   ', 'donkey', 'orca  '/)
Am I getting it right that this is an F95/F03 issue? How would you proceed? Is there a way to force gfortran to compile this? Is it standard at all?
I think gfortran is correct in this case. It is ambiguous what the length of the characters in the array should be without either the explicit type declaration in the constructor (@Beliavsky’s example), or each being the same length. Intel is doing the nice thing here and assuming the length of the longest one, but the standard doesn’t say that is what should happen, and in fact there are cases where you wouldn’t necessarily be able to tell at compile time what the length of each would be. I.e.
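A minimal sketch of such a case, written for illustration (the program and variable names are made up). The elements are trim() results, so their lengths are in general only known at run time, for instance if a and b were read from a file. Without a type-spec the standard requires every element of the constructor to have the same length; with a type-spec the element length is fixed and shorter values are blank-padded:

```fortran
program runtime_lengths
  implicit none
  ! a, b and words are hypothetical names used only for this illustration
  character(len=20) :: a, b
  character(len=6), dimension(2) :: words

  a = 'cat'
  b = 'donkey'

  ! trim(a) has length 3 and trim(b) has length 6, but in general these
  ! lengths are only known at run time. Writing [trim(a), trim(b)] would
  ! mix lengths, which is not allowed when no type-spec is given.
  ! The type-spec below fixes the element length to 6:
  words = [character(len=6) :: trim(a), trim(b)]

  print *, words
end program runtime_lengths
```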
c:\fortran\test>gfortran -Wall xchar.f90
xchar.f90:3:35:
print*, [character(len=5) :: 'cat', 'donkey', 'orca']
1
Warning: CHARACTER expression at (1) is being truncated (6/5) [-Wcharacter-truncation]
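For comparison, a sketch of the original program with the type-spec set to the declared length of 6, so that nothing is truncated (with len=5, as in the warning above, 'donkey' is cut to 'donke'). The type-spec form of the array constructor is Fortran 2003 syntax; the program name is made up:

```fortran
program example_typed
  implicit none
  character(len=6), dimension(3) :: words

  ! The type-spec gives every element length 6; 'cat' and 'orca'
  ! are blank-padded, so no manual padding is needed.
  words = [character(len=6) :: 'cat', 'donkey', 'orca']

  print *, words(2)   ! prints donkey
end program example_typed
```

The old-style delimiters work the same way: (/ character(len=6) :: 'cat', 'donkey', 'orca' /).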
Many Fortranners wish that the elements of an array of character variables did not have to have the same LEN. The base language is unlikely to change, but there are efforts to overcome this limitation. StringiFor, by Stefano Zaghi et al., has the following functionality:
low memory consumption: only one deferred length allocatable character member is stored, allowing for efficient memory allocation in array of strings, the elements of which can have different lengths;
This project toward a strings module in the Fortran stdlib will help with such needs in coding.
Interested readers can follow the blogposts by @Aman and reach out to @Aman and mentors for feedback, comments, etc.
Ultimately I do hope the Fortran standard will include an intrinsic “string” type - now imagine for a moment it’s called string_t - that will allow something along the lines of the original post:
As the many community efforts, including the latest stdlib work, show, it’s nearly doable as a user derived type now.
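A rough sketch of what such a user derived type can look like today. This is an illustration only, not the stdlib or StringiFor implementation; the type name string_t is borrowed from the post above and the component name chars is made up. Each element wraps one deferred-length allocatable character, so elements of the same array can have different lengths:

```fortran
program ragged_words
  implicit none

  ! Hypothetical user-defined type, for illustration only
  type :: string_t
    character(len=:), allocatable :: chars
  end type string_t

  type(string_t), dimension(3) :: words
  integer :: i

  ! Assignment (re)allocates each component to the length of the
  ! right-hand side, so no padding takes place.
  words(1)%chars = 'cat'
  words(2)%chars = 'donkey'
  words(3)%chars = 'orca'

  ! Each element reports its own length: 3, 6 and 4
  do i = 1, size(words)
    print '(a,1x,i0)', words(i)%chars, len(words(i)%chars)
  end do
end program ragged_words
```

The price is the extra %chars in every use, which is what an intrinsic type or a library with overloaded operators would hide.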
But having it as part of the standard as an intrinsic type for “strings” could bring immense benefits in ease of use, productivity, and consistency of code for the poor, persevering practitioners of Fortran. Working with strings is a basic aspect of any scientific and technical computing, which is not only about compute performance: the preprocessing and post-processing of data around the computations, where such facilities come into play, are also critical to the overall workflow. I wish the standard bearers could converge on this and commission some work. Alas, that is not the case!
I created a “conforming” implementation of the proposed iso_varying_string module (see ISO/IEC 1539-2: 2000). The explanation I was given when I asked why it didn’t make it into the standard was that it was seen as unnecessary given the deferred-length, allocatable character feature that made it in. These examples demonstrate that it clearly isn’t unnecessary, and I’d be in favor of proposing that document again. It is a well designed API, and works very nearly like your example.
words = [var_str('cat'), var_str('donkey'), var_str('orca')]
If it were to be given language feature status it could (I think) be implemented such that the var_str wouldn’t be necessary, and certainly could be more performant.
I agree that Fortran needs an intrinsic “string” type as it seems impossible to enhance the current character type to do what many programmers need. What you can do at present is outlined in a document I put on the fortranwiki earlier this year, and which the original poster might find helpful. (Also please let me know of any mistakes I made in writing this).
Quick question: In the last sentence of “2. Character constants” you say “Constants in source-code can only contain characters which are in the Fortran character set. This is specified in section 6.1 of the Fortran 2018 Standard,…”.
My interpretation of 6.1.6 is that, in practice, this is not required: “6.1.6 Other characters Additional characters may be representable in the processor, but shall appear only in comments (6.3.2.3, 6.3.3.2), character constants (7.4.4), input/output records (12.2.2), and character string edit descriptors (13.3.2).”
So, in the strict sense: yes, a standards compliant program should only contain characters in the Fortran character set. In practice, I believe more than just ASCII characters between codes 32 and 126 are acceptable.
(I admit I may be drawing too fine a distinction here.)
Thank you for the replies and for the discussion. From a user perspective, I feel that the treatment of strings in Fortran should definitely be improved, either through the standard or through the standard library or in some other way.
To add to my initial post: curiously, gfortran accepts the following version without complaint.
program examplegf
integer :: nwords = 3
character(len=6), dimension(3) :: words
data words/'cat', 'donkey', 'orca'/
print *, words(2)
end program examplegf
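As far as I understand the standard, this is expected rather than a gfortran quirk: each constant in a DATA statement is assigned to its array element following the rules of ordinary assignment, so a constant shorter than the declared length is blank-padded. No array constructor is involved, and the same-length rule does not come into play. A small sketch to check the padding (the program name and output format are made up):

```fortran
program data_padding
  implicit none
  character(len=6), dimension(3) :: words
  integer :: i

  ! Each constant is assigned to one element; short ones are blank-padded
  data words /'cat', 'donkey', 'orca'/

  ! len is always 6; len_trim shows the length without trailing blanks
  do i = 1, size(words)
    print '(3a,i0,a,i0)', '"', words(i), '" len=', len(words(i)), &
                          ' len_trim=', len_trim(words(i))
  end do
end program data_padding
```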