Size() vs. len() for characters in generic programming

The following code is not standard-conforming,

print *, size("fortran")

because,

Error: ‘array’ argument of ‘size’ intrinsic at (1) must be an array

One has to instead use,

print *, len("fortran")

This, however, creates complexities and unnecessary extra coding (and fencing via preprocessor conditions) to treat input character arguments differently from input arrays of numeric type in generic procedures.

Are there any backward-compatibility constraints that prevent size() from accepting scalar character arguments?

The desired behavior would be to allow size() to accept scalar character arguments:

  1. If the input argument is an array of characters, then size() behaves as currently defined by the standard.
  2. If the input argument is a scalar character, then size() behaves just like len().

You could actually “cheat” a bit and add this yourself. I am not sure whether the standard has anything to say about this, but gfortran 10.3 is happy to compile it. It may not be the best for readability, but I’ll let you decide what makes sense for your project.

module size_mod
    implicit none

    interface size
        module procedure size_chars
    end interface

contains

    integer pure function size_chars(chars) result(n)
        character(len=*), intent(in) :: chars

        n = len(chars)
    end function
end module

program main
    use size_mod

    integer :: arr(4)
    character(len=25) :: chars

    chars = 'abc'
    arr = [1, 2, 3, 4]

    write(*,*) size(arr)
    write(*,*) size(trim(chars))
    write(*,*) size("fortran")
end program

When compiled and run, this yields the following output:

$ gfortran.exe main.f90 && ./a.exe
           4
           3
           7

I think the proposed change would make the language more inconsistent. Currently, size can consistently only be applied to arrays, while your proposal would introduce an exception. Also, len() queries a type parameter, which is independent of the array size, as in

character(len=10) :: scalar
character(len=10) :: array(2)

So, to me, it seems to be more logical to use a different query function for it.


There are a number of intrinsic functions in Fortran, and standard functions and operators in C, all of which have the word “size” in them, but the meanings are quite different.

In Fortran, we have SIZE, which applies to arrays, not scalars, and which tells us how many elements the array has – that is just the count, regardless of whether each element occupies 1 byte (ASCII character), 13 bytes (character(13)), 4 bytes (integer) or some other number of bytes of memory (derived type variables).

If we have an array of character variables of length 15 each,

character(len=15) :: strs(17)

the size is 17, the length is 15, and any change to the language to allow mixing size and length may cause quite a bit of confusion. The benefits from the proposed change ought to be balanced against this confusion, just as we have to do when thinking of replacing intrinsic functions with our own functions with the same names.
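
To make the distinction concrete, here is a minimal sketch (the program and variable names are only illustrative) that queries both properties on a scalar and on an array of the same character type:

```fortran
program len_vs_size
    implicit none
    character(len=15) :: str        ! scalar: has a length, but no elements
    character(len=15) :: strs(17)   ! array: 17 elements, each of length 15

    str = "fortran"
    strs = "fortran"

    print *, len(str)        ! length type parameter of the scalar: 15
    print *, len(strs)       ! length of each element of the array: 15
    print *, size(strs)      ! number of array elements: 17
    print *, len_trim(str)   ! length ignoring trailing blanks: 7
end program len_vs_size
```

Note that len() accepts both the scalar and the array, whereas size(str) would be rejected at compile time.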


It is conforming. User-defined functions can have the same names as intrinsic functions. When wondering what the standard says, one should compile with gfortran -std=f2018 or similar options for other compilers.


I was going to say, the case of character(len=15) :: strs(17) vs character(len=15) :: strs should be handled, and it seems it would be confusing for size to return 15 in the second case, but 17 in the first case.


Thank you all for sharing your thoughts on this. I understand the opposing viewpoint. The primary reason for asking for such an extension of functionality is to further simplify generic programming. The difference between an assumed-length scalar character dummy argument and arrays of numeric type makes it difficult to write code that works for both character and numeric-array input arguments.

function test(Array1, Array2)
    ...
    lenArray = len(Array1) ! works only for (assumed-length) character arguments
    lenArray = size(Array1) ! works only for numeric arrays or arrays of character arguments
    ...
    isTheSame = Array1 == Array2 ! yields a scalar only for scalar character arguments
    isTheSame = all(Array1 == Array2) ! works only for array arguments
end function

If extending the functionality of size(), all(), any(), …, is not sensible, then the only remedy that comes to my mind at the moment is to extend the functionality of the select type construct, if we are to write pure Fortran without preprocessing.

Maybe it is worth noting, for the general public, that the code commented on by @Beliavsky does much more than just use the name of an intrinsic procedure for a user-defined function or subroutine. The use of interface size results in extending or overloading the intrinsic procedure size. This is clearly seen later in the snippet, where it is called both with a character(len=*) argument (using the user-defined size_chars function) and with arrays (using the intrinsic size).
Compare the above code with the following (the interface is removed and size_chars is renamed to size):

module size_mod
    implicit none
contains
    integer pure function size(chars) result(n)
        character(len=*), intent(in) :: chars
        n = len(chars)
    end function
end module

program main
    use size_mod
    integer :: arr(4)
    character(len=25) :: chars
    chars = 'abc'
    arr = [1, 2, 3, 4]
    write(*,*) size(arr)
    write(*,*) size(trim(chars))
    write(*,*) size("fortran")
end program main

The compiler stops with an error on the first write, reporting the type mismatch between the integer array arr and the expected character(len=*) argument of the size function, which now fully hides the intrinsic.


That is the issue that I have with user-defined overloads to resolve this specific problem. The user-defined overload shadows the intrinsic procedure within the scoping unit. So, let’s take the overloading route. One will have to write an interface for every possible interface of the intrinsic size(), which seems like an overly cumbersome solution imposed on users, compared to adding a fix for it to the language.

I am not sure what you mean by every possible interface. Are we not talking just about character(*) type? Then just using the single, simple module, as above, with the extending/overloading interface will do.

Gfortran yields the following error with your code,

size.f90:16:14:
16 |     write(*,*) size(arr)
     |              1
Error: Type mismatch in argument ‘chars’ at (1); passed INTEGER(4) to CHARACTER(*)

So, to make this work, one will have to also write another separate function for size() that takes integer arguments. Then why only integers? reals, complex, … With only intrinsic types and kinds, that is already >12 separate implementations. Isn’t that correct or am I missing something here?

The same issue also exists with intrinsics like all(), any(), … for writing generic code that works with both character and non-character types.

Now I am confused. Do you mean integer (etc.) arrays or scalars? If arrays - no need, use interface as above. If scalars - what would we want to get - number of bytes per integer scalar? That would go quite far from the intrinsic meaning of size as number of items in …. It might still hold for character entities which are composed of items (characters), but for scalar integers or complex?
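
As a sketch of the array case, the following reuses the size_mod module from earlier in the thread and calls size() with arrays of several types. The variable names are only illustrative, and the comments assume the usual generic resolution, where a reference that matches no user-defined specific procedure falls back to the intrinsic:

```fortran
module size_mod
    implicit none

    ! Extends the generic name "size"; the intrinsic remains accessible
    interface size
        module procedure size_chars
    end interface

contains

    pure integer function size_chars(chars) result(n)
        character(len=*), intent(in) :: chars
        n = len(chars)
    end function size_chars

end module size_mod

program main
    use size_mod
    implicit none
    real :: x(3)
    complex :: z(2, 5)
    character(len=4) :: words(6)

    print *, size(x)          ! intrinsic: 3
    print *, size(z)          ! intrinsic: 10
    print *, size(z, dim=2)   ! intrinsic: 5
    print *, size(words)      ! intrinsic (array of characters): 6
    print *, size("fortran")  ! user-defined size_chars: 7
end program main
```

Only the scalar character case needs a user-written specific procedure; no per-type or per-kind wrappers are required for arrays.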

Scalar numerics do not have elements. But since strings (characters) in Fortran are mutable, their behavior is rather similar to numeric arrays,

character(:), allocatable :: c
c = "fortran"
c(1:1) = "F"

but not completely,

c(1) = "F" ! is invalid

This is not an issue in general, but becomes so when writing generic procedures that are supposed to work on both numeric arrays and scalar assumed-length character input arguments. Three remedies come to mind:

  1. Write (many) wrappers for the relevant intrinsics like size(), all(), any() such that they all treat numeric arrays and assumed-length characters similarly.
  2. Language enhancement: Make changes to the language.
  3. Language extension: Use preprocessor fencing.

Option 1 is out: Too much work, little gain, potential performance loss.
Option 2 is posted here for discussion.
Option 3 is what I am currently using.
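
For reference, Option 1 might look roughly like the sketch below. The generic names nelem and is_same are hypothetical, and only default integer and default real arrays are covered; every additional type, kind, or rank would need its own specific procedure, which is where the extra work comes from.

```fortran
module generic_wrappers
    implicit none
    private
    public :: nelem, is_same

    ! Hypothetical generic names, not part of any standard or library
    interface nelem
        module procedure nelem_char, nelem_int, nelem_real
    end interface

    interface is_same
        module procedure is_same_char, is_same_int, is_same_real
    end interface

contains

    pure integer function nelem_char(a) result(n)
        character(len=*), intent(in) :: a
        n = len(a)              ! a scalar string counts its characters
    end function nelem_char

    pure integer function nelem_int(a) result(n)
        integer, intent(in) :: a(:)
        n = size(a)
    end function nelem_int

    pure integer function nelem_real(a) result(n)
        real, intent(in) :: a(:)
        n = size(a)
    end function nelem_real

    pure logical function is_same_char(a, b) result(same)
        character(len=*), intent(in) :: a, b
        same = len(a) == len(b)       ! do not ignore trailing blanks
        if (same) same = a == b
    end function is_same_char

    pure logical function is_same_int(a, b) result(same)
        integer, intent(in) :: a(:), b(:)
        same = size(a) == size(b)     ! a == b requires conformable arrays
        if (same) same = all(a == b)
    end function is_same_int

    pure logical function is_same_real(a, b) result(same)
        real, intent(in) :: a(:), b(:)
        same = size(a) == size(b)
        if (same) same = all(a == b)  ! exact comparison, no tolerance
    end function is_same_real

end module generic_wrappers

program demo
    use generic_wrappers
    implicit none

    print *, nelem("fortran"), nelem([1, 2, 3, 4]), nelem([1.0, 2.0])   ! 7 4 2
    print *, is_same("abc", "abc"), is_same("abc", "abc "), &
             is_same([1, 2], [1, 2])                                    ! T F T
end program demo
```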

There are other oddities (or better to say, conventions) with character values in Fortran like,

"Fortran" == "Fortran " ! yields .true.

that is not always desirable. An extension of all() and any() to support scalar characters would resolve this issue by making an element-wise comparison.
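
A minimal sketch of the blank-padding behavior, with one possible way (comparing the lengths first) to get a stricter test today:

```fortran
program blank_padding
    implicit none
    character(len=*), parameter :: a = "Fortran"
    character(len=*), parameter :: b = "Fortran "

    print *, a == b                          ! T: the shorter operand is padded with blanks
    print *, len(a) == len(b)                ! F: the lengths are 7 and 8
    print *, len(a) == len(b) .and. a == b   ! F: strict comparison
end program blank_padding
```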


Yeah, the intrinsic string type has many quirks like this unfortunately.

Much of what has been written in this thread relates to ASCII characters. Support for characters of other types, such as UTF-8, UTF-16, etc., is still rudimentary in many Fortran libraries, and ShahMoradi’s proposal should be considered keeping in mind what LEN/SIZE should mean when these multi-byte characters are used – glyphs, phonemes, graphemes, etc.?

In Fortran we have the STORAGE_SIZE intrinsic function, which gives the size-in-bits of the first argument, if a scalar; the size of one element, if an array. When using only ASCII characters, one could divide the returned value from this intrinsic function by the bit-size of a character to obtain the result that ShahMoradi desires to have.
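
A small sketch of that idea, assuming default-kind characters and using the character_storage_size constant from iso_fortran_env as the bit size of one character. The results are processor-dependent in principle, so no output is shown; on common compilers they would be expected to match len():

```fortran
program storage_size_demo
    use, intrinsic :: iso_fortran_env, only: character_storage_size
    implicit none
    character(len=7)  :: c
    character(len=15) :: strs(17)

    c = "fortran"
    strs = "fortran"

    ! Bits occupied by the scalar divided by the bits per default character
    print *, storage_size(c) / character_storage_size, len(c)

    ! For an array argument, storage_size refers to a single element
    print *, storage_size(strs) / character_storage_size, len(strs)
end program storage_size_demo
```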


@shahmoradi, it may be better to look at what Fortran 202Y will offer with enhanced Generics facilities in the language and then see what additional supporting features may be helpful for practitioners to author generic procedures.

At this stage, it may be premature to consider “utility features” such as size on types with length-type parameters that might become moot once Fortran includes a good Generics feature starting 202Y.
