Character array constructor

I would like to be able to write character array constructors without having to worry about the lengths, e.g. allowing ['a','bb','ccc'] as an array constructor instead of what I think are the current alternatives:
['a  ','bb ','ccc'] or [character(3)::'a','bb','ccc']. There may, of course, be a good reason, unknown to me, why F2023 does not permit the brief form.

The standard requires that, unless the constructor begins with a type-spec followed by ::, all values in an array constructor have the same type and type parameters. For character arrays, this is rather inconvenient.
Interestingly, Intel’s ifort compiler accepts the brief form, issuing a warning only when run with the “-stand=f18” option. Unfortunately, gfortran reports an error.
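A minimal sketch of the standard-conforming spellings, with the brief form left as a comment (the module and constant names are hypothetical). Per the posts above, ifort accepts the commented-out line as an extension and gfortran rejects it, so only the first two declarations are portable:

```fortran
module brief_form_demo
  implicit none

  ! Portable: the type-spec fixes the element length at 3, and the
  ! shorter values are blank-padded to that length.
  character(len=3), parameter :: with_spec(3) = [character(len=3) :: 'a', 'bb', 'ccc']

  ! Also portable: every literal already has length 3.
  character(len=3), parameter :: padded(3) = ['a  ', 'bb ', 'ccc']

  ! Not standard Fortran (values of lengths 1, 2 and 3):
  ! character(len=3), parameter :: brief(3) = ['a', 'bb', 'ccc']
end module brief_form_demo
```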

TL;DR:

  1. The Fortran standard does not support what CS parlance calls a jagged array.
  2. The array constructor is evaluated independently of its context (e.g., in a var = expr assignment, the RHS expr is defined without regard to var), so the element length has to be specified as part of the construction.

[character(3)::'a','bb','ccc']

Is actually very good IMHO, as you know what it’s doing (putting all your values into character(3) strings).
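Point 2 can be seen by giving a type-spec shorter than the longest value: the constructor produces elements of the length it specifies, and only afterwards is the result converted to the length of whatever receives it. A minimal sketch, with hypothetical names:

```fortran
module independent_ctor_demo
  implicit none
  ! The constructor yields length-2 elements ('a ', 'bb', 'cc'); 'ccc' is
  ! truncated before the wider length-10 constant is involved at all.
  character(len=10), parameter :: wide(3) = [character(len=2) :: 'a', 'bb', 'ccc']
end module independent_ctor_demo
```

So wide(3) ends up as 'cc' followed by eight blanks, not 'ccc'.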

I created “Allow character array constructors with variables of different LEN” (Issue #301, j3-fortran/fortran_proposals on GitHub) for this. The committee has very likely rejected this syntax for a reason, and maybe the issue has already been submitted and closed (I did look for one).

Re: “The committee has very likely rejected this syntax for a reason,” yes, the indications are that the committee did so as part of the work on Fortran 2003, and the reasons are essentially as explained above.

It will be very, very interesting to see if anyone can influence the standard-bearers to do anything further here, because they may share the other opinion expressed here on the current facility, i.e., that it “is actually very good”!!

The fact that at least the Intel compilers accept jagged lengths as an extension suggests that the case for not allowing them is weak (note that only the input is jagged: the result is an array whose elements all have the length of the longest element). Several convenience functions do the same thing. I know a cc() function was discussed here recently; there are other examples like https://urbanjost.github.io/M_strings/bundle.3m_strings.html
and some do not even require the arguments to be strings, converting them to strings instead. These methods usually rely on optional arguments, sometimes class(*) arguments, rather than a generic interface collecting routines with different numbers of arguments; all of them work and are standard-conforming. That is not as nice as support in the standard, but I can attest that it works well in non-performance-critical code.
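A minimal sketch of such a helper, assuming the optional-argument approach described above. The names pad_strings and jagged_strings are hypothetical, and this is not the actual cc() or bundle() code; it accepts up to four strings and pads each to the longest length:

```fortran
module jagged_strings
  implicit none
  private
  public :: pad_strings

contains

  ! Hypothetical helper: copies every present argument into an array whose
  ! common length is that of the longest argument (shorter ones blank-padded).
  function pad_strings(s1, s2, s3, s4) result(out)
    character(len=*), intent(in)           :: s1
    character(len=*), intent(in), optional :: s2, s3, s4
    character(len=:), allocatable          :: out(:)
    integer :: n, maxlen

    ! First pass: count the present arguments and find the longest one.
    n = 1
    maxlen = len(s1)
    if (present(s2)) call tally(len(s2))
    if (present(s3)) call tally(len(s3))
    if (present(s4)) call tally(len(s4))

    allocate(character(len=maxlen) :: out(n))

    ! Second pass: assigning to a fixed-length element pads with blanks.
    n = 1
    out(n) = s1
    if (present(s2)) call store(s2)
    if (present(s3)) call store(s3)
    if (present(s4)) call store(s4)

  contains

    subroutine tally(l)
      integer, intent(in) :: l
      n = n + 1
      maxlen = max(maxlen, l)
    end subroutine tally

    subroutine store(s)
      character(len=*), intent(in) :: s
      n = n + 1
      out(n) = s
    end subroutine store

  end function pad_strings

end module jagged_strings
```

With this, pad_strings('a', 'bb', 'ccc') gives the same values as [character(3)::'a','bb','ccc'] without counting by hand.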

I also hate that both arguments of the MERGE function must have the same length, to the point where I overloaded it so they can have different lengths, so you can guess where I would probably cast my vote. Of course, even better would be support for intrinsic jagged arrays instead of having to make your own; see the stdlib string type for at least one example.
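A sketch of the idea for the scalar case only. Rather than overloading the intrinsic name MERGE as described above, it uses a separate, hypothetical name, merge_str; the deferred-length result takes the length of whichever argument is selected:

```fortran
module str_merge_mod
  implicit none
contains
  ! Hypothetical scalar alternative to MERGE whose two string
  ! arguments may have different lengths.
  function merge_str(tsource, fsource, mask) result(out)
    character(len=*), intent(in)  :: tsource, fsource
    logical, intent(in)           :: mask
    character(len=:), allocatable :: out
    if (mask) then
      out = tsource
    else
      out = fsource
    end if
  end function merge_str
end module str_merge_mod
```

Unlike the intrinsic, this sketch is not elemental: an elemental function cannot have an allocatable result, so an array version would need separate handling.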

Is there a convenience routine like CC() or BUNDLE() in stdlib?

I don’t mind this notation for short cases like this example. But if the array is longer, spread over several lines, then sometimes it is difficult for the programmer to count characters and get the right value for the literal constant. And then later, if the initial values are changed, the programmer must manually recount and change the literal constant in the declaration. So I think there is a good reason to have a notation that avoids all that and is guaranteed to get the right character length.
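Within the current standard, one way to avoid the manual count is to give each value its own named constant and let a constant expression compute the length. A sketch with hypothetical names:

```fortran
module length_from_constants
  implicit none
  character(len=*), parameter :: s1 = 'alpha'
  character(len=*), parameter :: s2 = 'be'
  character(len=*), parameter :: s3 = 'gamma ray'

  ! LEN of a named constant and MAX are allowed in constant expressions,
  ! so the element length is computed rather than counted by hand.
  integer, parameter :: n = max(len(s1), len(s2), len(s3))

  character(len=n), parameter :: list(3) = [character(len=n) :: s1, s2, s3]
end module length_from_constants
```

The price is one extra declaration per value, which for long lists may be no better than counting.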

Regarding the jagged array (jagged string length) comments, the problem there is that allocatable entities (of type character or otherwise) cannot be initialized with the current Fortran standard. If that limitation were eliminated, then lots of things would be simpler, safer, and more robust in the language.
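For comparison, the usual do-it-yourself jagged array wraps a deferred-length component in a derived type. Because the component is allocatable, such an array cannot be a named constant with an initializer and has to be filled at run time. A sketch with hypothetical names:

```fortran
module jagged_demo
  implicit none

  ! Each element carries its own string length.
  type :: vstring
    character(len=:), allocatable :: s
  end type vstring

  ! Not allowed: in a constant expression an allocatable component may
  ! only be given NULL(), so this cannot be a parameter.
  ! type(vstring), parameter :: words(2) = [vstring('a'), vstring('bbb')]

contains

  ! Builds the jagged array at run time instead.
  function make_words() result(words)
    type(vstring), allocatable :: words(:)
    words = [vstring('a'), vstring('bb'), vstring('ccc')]
  end function make_words

end module jagged_demo
```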

The committee did consider this case and chose instead to require specifying the length in the type prefix, in order to request that all the values be converted to the specified length. The same happens for other types/kinds in an array constructor. I haven’t been able to find notes on the discussion, but I do remember it.

So how about a “dynamic” constructor, like this:

program main
  implicit none
  character(len=12) :: arr(2)

  arr = [stralloc(7),'123456789']
  print *, "=",arr(1),"=",arr(2),"="
  arr = [stralloc(7),'12345']
  print *, "=",arr(1),"=",arr(2),"="
  arr = [stralloc(7),stralloc(10)]
  print *, "=",arr(1),"=",arr(2),"="
  arr = [stralloc(10),stralloc(7)]
  print *, "=",arr(1),"=",arr(2),"="
contains
  function stralloc(n)
    character(len=:), allocatable :: stralloc
    integer, intent(in) :: n
    stralloc = repeat('x',n)
  end function stralloc
end program main

Both gfortran and ifort compile this code without any warning. The results, however, are quite unexpected:

$ gfortran-12 -Wall chrstar.f90 && ./a.out
 =xxxxxxx     =1234567     =
 =xxxxxxx     =12345       =
 =xxxxxxx     =xxxxxxx     =
 =xxxxxxxxxx  =xxxxxxx     =
$ ifort -stand=f18 chrstar.f90 && ./a.out
 =xxxxxxx     =123456789   =
 =xxxxx       =12345       =
 =xxxxxxx     =xxxxxxxxxx  =
 =xxxxxxxxxx  =xxxxxxx     =

Edit: added the declaration of the array, which somehow disappeared while copy-pasting
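For comparison, adding a type-spec to the same constructors makes them standard-conforming: every value is converted to the stated length before the array is built, so the result no longer depends on how a compiler handles mismatched lengths. A sketch reusing stralloc from the post; the module and the wrapper functions first_case and last_case are hypothetical:

```fortran
module stralloc_fixed
  implicit none
contains
  function stralloc(n)
    character(len=:), allocatable :: stralloc
    integer, intent(in) :: n
    stralloc = repeat('x', n)
  end function stralloc

  ! Same constructor as the first case in the post, but with a type-spec:
  ! each value is padded or truncated to length 12.
  function first_case() result(arr)
    character(len=12) :: arr(2)
    arr = [character(len=12) :: stralloc(7), '123456789']
  end function first_case

  ! Same as the last case in the post (lengths 10 and 7).
  function last_case() result(arr)
    character(len=12) :: arr(2)
    arr = [character(len=12) :: stralloc(10), stralloc(7)]
  end function last_case
end module stralloc_fixed
```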

Where/how is arr(:) declared in this code?

Sorry, it somehow disappeared. Fixed now above

That was exactly my point, thank you. That initializer represents an array of fixed-length characters, hence its length needs to be specified somehow IMHO (perhaps besides allowing a length-free version that many would like, something like [character(*) :: 'aa','b','cccc']), because that RHS then needs to be assigned to a LHS that may have a different or allocatable length.

A far more interesting story, IMHO, would be the removal of the two major limitations to having a fully functional variable-length string type, which in my opinion are:

  • derived types cannot be used with an (a) output format; (dt) must be specified instead
  • as many have said, they cannot be declared as parameters (named constants) because they contain allocatable components.
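The first limitation can be illustrated with a user-defined derived-type output procedure: a wrapper type can then be written, but the item has to be matched with a (dt) edit descriptor, not (a). A minimal sketch with hypothetical names:

```fortran
module vstring_io
  implicit none

  type :: vstring
    character(len=:), allocatable :: s
  contains
    procedure, private :: write_formatted
    generic :: write(formatted) => write_formatted
  end type vstring

contains

  ! Child output procedure, invoked when a vstring meets a (dt) descriptor.
  subroutine write_formatted(dtv, unit, iotype, v_list, iostat, iomsg)
    class(vstring), intent(in)      :: dtv
    integer, intent(in)             :: unit
    character(len=*), intent(in)    :: iotype
    integer, intent(in)             :: v_list(:)
    integer, intent(out)            :: iostat
    character(len=*), intent(inout) :: iomsg

    iostat = 0
    if (allocated(dtv%s)) write(unit, '(a)', iostat=iostat, iomsg=iomsg) dtv%s
  end subroutine write_formatted

end module vstring_io
```

Usage would then be write(*, '(dt)') vstring('hello').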

It needs to be known somehow. The Fortran standard committee decided that it must be specified explicitly, but that is not the only conceivable way to determine the length. The compiler could go through all of the elements, at least in constant expressions, and determine the length of the longest element. It would have to handle named constants, function calls, operators, and so on, so it would not be simple parsing, but a fixed uniform length does not necessarily require an explicit length specification.

So it adds runtime-checks! I did not know that option, thanks.

Here is a related question about allocatable arrays, including character arrays. Is there an easy way to change the bounds of an allocatable array without copying the underlying data? Can ASSOCIATE or MOVE_ALLOC() somehow be used to do this?

The short answer is no. You can do it with only one copy though.

call move_alloc(from=arr, to=tmp)
allocate(arr(new_size))
arr(1:old_size) = tmp

The reason is that there may not be enough free memory immediately after the currently allocated array, in which case the data would have to be moved anyway. The requirement to make a copy simply exposes this reality directly.

I agree on the single copy; however, the code snippet doesn’t make sense, because the temporary is left allocated.

The “canonical” form of the operation here is

   allocate( tmp(new_size) )
   tmp(1:size(arr)) = arr
   call move_alloc( from=tmp, to=arr )
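The canonical form as a complete routine, assuming a one-dimensional real array that is already allocated and only ever grows (the module and the name grow are hypothetical):

```fortran
module resize_demo
  implicit none
contains
  ! Enlarges arr to new_size elements, keeping the old contents.
  ! Assumes arr is allocated and new_size >= size(arr).
  ! The data are copied once (arr -> tmp); MOVE_ALLOC then hands tmp's
  ! storage to arr (deallocating the old arr) and leaves tmp deallocated.
  subroutine grow(arr, new_size)
    real, allocatable, intent(inout) :: arr(:)
    integer, intent(in)              :: new_size
    real, allocatable                :: tmp(:)

    allocate(tmp(new_size))
    tmp(1:size(arr)) = arr
    tmp(size(arr)+1:) = 0.0          ! new elements start at zero
    call move_alloc(from=tmp, to=arr)
  end subroutine grow
end module resize_demo
```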

I wasn’t thinking of the resize problem, just the bounds change. Say you have an allocatable array with bounds 1:9, and you want to change the bounds to -4:4. Same size, same data, just new bounds. You can do this temporarily by passing the array to a subprogram with the right dummy argument declaration, and you can do the same thing with an ASSOCIATE block. That occurs without copying the data. But once you return, or exit the ASSOCIATE block, the actual array has the same bounds as before. Is there a way to actually change the bounds, permanently, of the original array without copying its contents?
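The temporary rebounding through a dummy argument mentioned above can be sketched like this (hypothetical names): the dummy declares a lower bound of -4, so the procedure sees the 1:9 data as -4:4, while the caller’s array keeps its original bounds. Passing an allocatable array to an assumed-shape dummy like this does not normally involve a copy.

```fortran
module rebound_demo
  implicit none
contains
  ! Inside this procedure the dummy is indexed from -4, whatever the
  ! bounds of the actual argument are; only the view changes.
  function lower_bound_inside(a) result(lb)
    real, intent(in) :: a(-4:)
    integer :: lb
    lb = lbound(a, 1)
  end function lower_bound_inside

  ! Element a(0) of the -4:4 view is element 5 of a 1:9 array.
  function middle(a) result(x)
    real, intent(in) :: a(-4:)
    real :: x
    x = a(0)
  end function middle
end module rebound_demo
```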

Looking at potential syntax for variable-length strings (as well as arrays of pointers), I’ve advocated the use of curly brackets {} (similar to the square brackets now used for coarrays) to define arrays of variable-length strings and arrays of pointers.

Examples:

Character(LEN=:), allocatable :: string_array{20}
Character(LEN=:), allocatable :: string_array{:} ! deferred size
etc.
Something similar could be done for arrays of pointers

Real, Pointer :: a{30}
Real, Pointer :: a(:){:}

etc.
Note that, for me, the primary use case for arrays of pointers is C interop, where you have multiple pointer indirections (**a) in a structure or as an argument. Unwinding these on the Fortran side into something Fortran can use can be very complicated.

Just a suggestion, but I would be happy with anything that removes the requirement to embed a deferred-length string or pointer in a derived type.
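For reference, the embedding that such a syntax would remove looks like this for arrays of pointers (names hypothetical): each element of an ordinary array is a derived-type wrapper holding one pointer.

```fortran
module ptr_array_demo
  implicit none

  ! Today's workaround for an "array of pointers": an array of wrapper
  ! objects, each holding one pointer.
  type :: real_ptr
    real, pointer :: p(:) => null()
  end type real_ptr

contains

  ! Total number of elements reachable through the wrappers, skipping
  ! any element whose pointer is not associated.
  function total_size(a) result(n)
    type(real_ptr), intent(in) :: a(:)
    integer :: n, i
    n = 0
    do i = 1, size(a)
      if (associated(a(i)%p)) n = n + size(a(i)%p)
    end do
  end function total_size

end module ptr_array_demo
```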