Does the standard require all character elements to be of the same length in an array constructor?
For example is this allowed: ["a", "abc"]. It seems that some compilers will take the longest string and pad the other strings to be of the same length, while some other compilers give an error.
Test program:
program t
call p(["a", "abc"])
contains
subroutine p(x)
character(len=*), intent(in) :: x(:)
print *, x
print *, size(x), len(x(0))
end subroutine
end program
This gives with ifx 2021.11.0:
a abc
2 3
With Flang:
a abc
2 3
With GFortran 15.1:
/app/example.f90:9:24:
9 | print *, size(x), len(x(0))
| 1
Warning: Array reference at (1) is out of bounds (0 < 1) in dimension 1
/app/example.f90:2:12:
2 | call p(["a", "abc"])
| 1
Error: Different CHARACTER lengths (1/3) in array constructor at (1)
Compiler returned: 1
With LFortran
semantic error: Different `character` lengths 1 and 3 in array constructor
--> a.f90:2:14
|
2 | call p(["a", "abc"])
| ^^^^^
The standard requires it. Some functions as well. I find that particularly irritating in the MERGE() procedure. A rather verbose but standard way to allow the strings to be different lengths is
[ character(len=10) :: 'a', 'bb','jjjjjjjjjj']
I have seen others create functions to handle that. I have one myself. Initially the standard method was so irritating (and prone to truncation) that I made a procedure that could take up to 20 arguments and return an array; but I have gradually gotten so I can use the standard syntax.
It is a common extension to do as described (extend to longest length.) Not sure if trailing whitespace always counts or not; particularly if variables are used instead of fixed strings.
From the standard
NOTE6
An example of an array constructor that speciďŹes a length type parameter:
[ CHARACTER(LEN=7) :: âTakataâ, âTanakaâ, âHayashiâ ]
In this constructor, without the type speciďŹcation, it would have been necessary to specify all of the constants
with the same character length.
Yes unless there is a type-spec. See F2023 7.8 para 2 which says
If type-spec is omitted, corresponding length type parameters of the
declared type of each ac-value expression shall have the same value; in
this case, the declared type and type parameters of the array constructor
are those of the ac-value expressions.
To see if code is standard-conforming, one should compile with the strict standard option. For your code with ifx one gets
c:\fortran\test>ifx -nologo -stand:f18 xtest_char.f90
xtest_char.f90(2): warning #8208: If type specification is omitted, each ac-value expression in the array constructor of type CHARACTER must have the same length type parameters. ['a']
call p(["a", "abc"])
--------^
LLVM flang also gives a warning with --pedantic option (Compiler Explorer)
/app/example.f90:2:14: portability: Character literal in array
constructor without explicit type has different length than
earlier elements
call p(["a", "abc"])
^^^^^
Result:
a abc
2 3
while nvfortran gives this result (by probably determining the length from the first element) (Compiler Explorer)
aa
2 1
I feel the extension by ifx and LLVM Flang is the most convenient, but unfortunately gfortran does not support it (yet?).
Curious if any compilers allow the return value candidates be different lengths in MERGE(), and if so do they expand everyone to the same length or return the original lengths?
Does the standard restriction allow for significantly more efficient code? Wondering if there is a good reason for the default? That being said, I certainly prefer the extension described above.
program testit
logical answer
integer i
do i=1,10
call random_number(r)
answer=merge(.true.,.false.,r>0.5)
#ifdef STANDARD
write(*,*)trim(merge("yes","no ",answer)) ! standard
#else
write(*,*)merge("yes","no",answer) ! non-standard
#endif
enddo
end program testit
Gfortran and ifx do not, but you can write a function that allows the first two arguments to have different LENs and returns a character variable with the larger LEN:
module m_mod
implicit none
private
public :: mymerge
contains
elemental function mymerge(x, y, tf) result(z)
character (len=*), intent(in) :: x, y
logical , intent(in) :: tf
character (len=max(len(x), len(y))) :: z
if (tf) then
z = x
else
z = y
end if
end function mymerge
end module m_mod
program main
use m_mod, only: mymerge
implicit none
character (len=*), parameter :: fmt_c = "(*(1x,a))"
print fmt_c,mymerge("yes", "no", .true.)
print fmt_c,mymerge("yes", "no", [.true., .false.])
print fmt_c,mymerge(["yes ", "sure", "ok "], "no", .true.)
print fmt_c,mymerge(["yes ", "sure", "ok "], "no", [.true.,.false.,.true.])
end program main
The merge intrinsic was originally supposed to be a portable replacement for the Cray CVMGx instrinsics. These allowed conditional expressions without breaking basic blocks in codes - especially in loops. Which in turn allowed the early Cray compilers to vectorize loops that were otherwise unvectorizable.
The Cray intrinsic pretty much directly translated into using the Cray âvector mergeâ instruction. Todays architectures often support similar instructions for doing branchless decision making. (For example on my x86-64 machine, gfortran often uses variants of the cmov instructions.)
As for different character string lengths, I suspect weâll have to wait until compilers start implementing F2023 conditional expressions.
As complete implementations of the language must already be capable of evaluating every effective item in an array constructor into an expanding pool of dynamic memory in worst case scenarios â think about an array constructor with implied DO loops with dynamic bounds, or items that are function results of dynamic shape â this request shouldnât require too much extra work in a compiler or its runtime support library. If you could whip up a prototype in an open-source compiler as a demonstration, you might find support for this in other compilers easy to come by.
program p
character(*), parameter :: a = "abc"
character(*), parameter :: b = "defgh"
character(*), parameter :: de(*) = [a,b]
print *, 'size=',size(de), 'len=',len(de)
end program
that is silently accepted by some compilers though with different behavior:
gfortran â warning, final len=3
ifort, ifx, flang â silently accepted, final len=5
I think the general idea of the standard approach is that the array can be processed one element at a time with the declared final length already known.
All of the other suggestions in this post requires scanning the array to find the largest element, and if the array is an intermediate in an expression, that would require multiple steps, one to generate and scan, then allocate, and then generate again and store (or some âas ifâ equivalent of those steps). And this would need to be done for every array expression, even if the elements all were the same length and the programmer knew that ahead of time.
So I think that could be done by the language, but the inherent inefficiency would receive criticism from both application programmers and from compiler writers.
Instead, an alternative approach would be to leave the current efficient approach as is, which shifts the burden onto the programmer for array specification and assignment, and invent a new syntax for the complicated cases that require the additional effort.
The g95 compiler hasnât been touched in over a decade, and it seems Andy has let the website disappear. If one is looking for Fortran 95 conformance, using gfortran with the -std=f95 option is probably a better choice.