Character length in array constructor

Does the standard require all character elements to be of the same length in an array constructor?

For example is this allowed: ["a", "abc"]. It seems that some compilers will take the longest string and pad the other strings to be of the same length, while some other compilers give an error.

Test program:

program t
call p(["a", "abc"])

contains

subroutine p(x)
character(len=*), intent(in) :: x(:)
print *, x
print *, size(x), len(x(0))
end subroutine

end program

This gives with ifx 2021.11.0:

 a  abc
           2           3

With Flang:

 a  abc
           2           3

With GFortran 15.1:

/app/example.f90:9:24:

    9 | print *, size(x), len(x(0))
      |                        1
Warning: Array reference at (1) is out of bounds (0 < 1) in dimension 1
/app/example.f90:2:12:

    2 | call p(["a", "abc"])
      |            1
Error: Different CHARACTER lengths (1/3) in array constructor at (1)
Compiler returned: 1

With LFortran

semantic error: Different `character` lengths 1 and 3 in array constructor
 --> a.f90:2:14
  |
2 | call p(["a", "abc"])
  |              ^^^^^

Related topic: Character array constructor.

The standard requires it. Some functions as well. I find that particularly irritating in the MERGE() procedure. A rather verbose but standard way to allow the strings to be different lengths is

        [ character(len=10) :: 'a', 'bb','jjjjjjjjjj']

I have seen others create functions to handle that. I have one myself. Initially the standard method was so irritating (and prone to truncation) that I made a procedure that could take up to 20 arguments and return an array; but I have gradually gotten so I can use the standard syntax.

https://urbanjost.github.io/general-purpose-fortran/docs/bundle.3m_strings.html

It is a common extension to do as described (extend to longest length.) Not sure if trailing whitespace always counts or not; particularly if variables are used instead of fixed strings.

From the standard

            NOTE6
            An example of an array constructor that species a length type parameter:
                   [ CHARACTER(LEN=7) :: ’Takata’, ’Tanaka’, ’Hayashi’ ]
            In this constructor, without the type specication, it would have been necessary to specify all of the constants
            with the same character length.
1 Like

Yes unless there is a type-spec. :frowning: See F2023 7.8 para 2 which says
If type-spec is omitted, corresponding length type parameters of the
declared type of each ac-value expression shall have the same value; in
this case, the declared type and type parameters of the array constructor
are those of the ac-value expressions.

1 Like

To see if code is standard-conforming, one should compile with the strict standard option. For your code with ifx one gets

c:\fortran\test>ifx -nologo -stand:f18 xtest_char.f90
xtest_char.f90(2): warning #8208: If type specification is omitted, each ac-value expression  in the array constructor of type CHARACTER must have the same length type parameters.   ['a']
call p(["a", "abc"])
--------^
1 Like

Not efficient, but does the job:

module charray
implicit none

    private
    public :: operator(/)

    interface operator (/)
        module procedure charray1, charray2, charray3, charray4
    end interface 

contains

    function charray1(x,y) result(z)
        character(*), intent(in) :: x, y
        character(:), allocatable :: z(:)
        z = charray4([x],[y])
    end function

    function charray2(x,y) result(z)
        character(*), intent(in) :: x(:), y
        character(:), allocatable :: z(:)
        z = charray4(x,[y])
    end function

    function charray3(x,y) result(z)
        character(*), intent(in) :: x, y(:)
        character(:), allocatable :: z(:)
        z = charray4([x],y)
    end function

    function charray4(x,y) result(z)
        character(*), intent(in) :: x(:), y(:)
        character(:), allocatable :: z(:)
        allocate( character(max(len(x),len(y))) :: z(size(x)+size(y)) )
        z( :size(x))   = x
        z(size(x)+1: ) = y
    end function

end

program foo
use charray
implicit none

    character(:), allocatable :: str(:)

    str = "This" / "is" / "an" / "array" / "of" / "characters"
    print *, size(str), len(str)
    print "((A))", str

end
Program returned: 0
Program stdout

           6          10
This      
is        
an        
array     
of        
characters
5 Likes

Nice. With the function here one can write

words("one two three")

to get an array of 3 character variables, but this relies on their not having embedded spaces.

2 Likes

The function could be easily extended with an optional argument that defines an arbitrary separator:

words("one|two|two and a half|three",sep='|')
1 Like

LLVM flang also gives a warning with --pedantic option (Compiler Explorer)

/app/example.f90:2:14: portability: Character literal in array
constructor without explicit type has different length than
earlier elements
  call p(["a", "abc"])
               ^^^^^
Result:
a  abc
 2 3

while nvfortran gives this result (by probably determining the length from the first element) (Compiler Explorer)

aa
            2            1

I feel the extension by ifx and LLVM Flang is the most convenient, but unfortunately gfortran does not support it (yet?).

1 Like

Thanks everybody for the feedback. This has been very helpful.

I think with --std=f23 compilers should give an error, because the standard currently says all lengths must be equal.

And without it, two good options:

  • determine (at runtime in the most general case) the length of the largest string and use that.
  • give an error if strings are not equal length, like GFortran

Given the feedback above, it seems the standard should be changed to allow unequal lengths, picking the largest.

3 Likes

Curious if any compilers allow the return value candidates be different lengths in MERGE(), and if so do they expand everyone to the same length or return the original lengths?

Does the standard restriction allow for significantly more efficient code? Wondering if there is a good reason for the default? That being said, I certainly prefer the extension described above.

program testit
logical answer
integer i
do i=1,10
   call random_number(r)
   answer=merge(.true.,.false.,r>0.5)

#ifdef STANDARD
   write(*,*)trim(merge("yes","no ",answer)) ! standard
#else
   write(*,*)merge("yes","no",answer)       ! non-standard
#endif

enddo
end program testit

Gfortran and ifx do not, but you can write a function that allows the first two arguments to have different LENs and returns a character variable with the larger LEN:

module m_mod
implicit none
private
public :: mymerge
contains
elemental function mymerge(x, y, tf) result(z)
character (len=*), intent(in) :: x, y
logical          , intent(in) :: tf
character (len=max(len(x), len(y))) :: z
if (tf) then
   z = x
else
   z = y
end if
end function mymerge
end module m_mod

program main
use m_mod, only: mymerge
implicit none
character (len=*), parameter :: fmt_c = "(*(1x,a))"
print fmt_c,mymerge("yes", "no", .true.)
print fmt_c,mymerge("yes", "no", [.true., .false.])
print fmt_c,mymerge(["yes ", "sure", "ok  "], "no", .true.)
print fmt_c,mymerge(["yes ", "sure", "ok  "], "no", [.true.,.false.,.true.])
end program main

output:

 yes
 yes no 
 yes  sure ok  
 yes  no   ok 
1 Like

:100: it’s also a very important quality of life improvement for setting parameter character arrays

1 Like

The merge intrinsic was originally supposed to be a portable replacement for the Cray CVMGx instrinsics. These allowed conditional expressions without breaking basic blocks in codes - especially in loops. Which in turn allowed the early Cray compilers to vectorize loops that were otherwise unvectorizable.

The Cray intrinsic pretty much directly translated into using the Cray ‘vector merge’ instruction. Todays architectures often support similar instructions for doing branchless decision making. (For example on my x86-64 machine, gfortran often uses variants of the cmov instructions.)

As for different character string lengths, I suspect we’ll have to wait until compilers start implementing F2023 conditional expressions.

1 Like

People can upvote the issue Allow character array constructors with variables of different LEN ¡ Issue #301 ¡ j3-fortran/fortran_proposals ¡ GitHub . Peter Klausler of NVIDIA wrote

As complete implementations of the language must already be capable of evaluating every effective item in an array constructor into an expanding pool of dynamic memory in worst case scenarios – think about an array constructor with implied DO loops with dynamic bounds, or items that are function results of dynamic shape – this request shouldn’t require too much extra work in a compiler or its runtime support library. If you could whip up a prototype in an open-source compiler as a demonstration, you might find support for this in other compilers easy to come by.

Unfortunately, the fortran-proposal GitHub doesn’t seem to have any observable influence on the language evolution…

4 Likes

I wish the committee embraced it more.

But at least we have all these ideas at one place, which is still helpful.

1 Like

This is another example:

program p
   character(*), parameter :: a = "abc"
   character(*), parameter :: b = "defgh"
   character(*), parameter :: de(*) = [a,b]
   print *, 'size=',size(de), 'len=',len(de)
end program

that is silently accepted by some compilers though with different behavior:

  • gfortran → warning, final len=3
  • ifort, ifx, flang → silently accepted, final len=5
  • nvfortran → silently accepted, final len=3
  • lFortran → semantic error
2 Likes

g95 by default says

In file xxxchar.f90:4

   character(*), parameter :: de(*) = [a,b]
                                         1
Error: Element in character array constructor at (1) has length 5 instead of 3

A large fraction of existing Fortran code is Fortran 95 or earlier, and it’s a good idea to test it with g95 before moving to later standards.

I think the general idea of the standard approach is that the array can be processed one element at a time with the declared final length already known.

All of the other suggestions in this post requires scanning the array to find the largest element, and if the array is an intermediate in an expression, that would require multiple steps, one to generate and scan, then allocate, and then generate again and store (or some “as if” equivalent of those steps). And this would need to be done for every array expression, even if the elements all were the same length and the programmer knew that ahead of time.

So I think that could be done by the language, but the inherent inefficiency would receive criticism from both application programmers and from compiler writers.

Instead, an alternative approach would be to leave the current efficient approach as is, which shifts the burden onto the programmer for array specification and assignment, and invent a new syntax for the complicated cases that require the additional effort.

5 Likes

The g95 compiler hasn’t been touched in over a decade, and it seems Andy has let the website disappear. If one is looking for Fortran 95 conformance, using gfortran with the -std=f95 option is probably a better choice.