Culture setting / inoculation against squiggles

Yes. Same predictable behavior on Windows and Linux with both gfortran and ifort. The answer isn’t nonsense - it is the length of the string in bytes - but this may not be what you want. It is easy to calculate the length of a utf-8 encoded string, and tidy it up if it has been truncated mid-character.
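To make "easy to calculate" concrete, here is a minimal sketch, assuming the string is stored as UTF-8 bytes in a default-kind character variable and was valid before any truncation. The module name `utf8_sketch` and the functions `utf8_len` and `utf8_complete_bytes` are made up for illustration; they are not part of any compiler or library.

```fortran
module utf8_sketch
  implicit none
  private
  public :: utf8_len, utf8_complete_bytes

contains

  ! Count code points: every byte that is not a continuation byte
  ! (bit pattern 10xxxxxx) starts a new code point.
  pure function utf8_len(s) result(n)
    character(len=*), intent(in) :: s
    integer :: n, i, b
    n = 0
    do i = 1, len(s)
      ! Mask with 255 defensively in case ichar is signed on some processor.
      b = iand(ichar(s(i:i)), 255)
      if (iand(b, 192) /= 128) n = n + 1
    end do
  end function utf8_len

  ! Number of leading bytes of s that do not end in the middle of a
  ! multi-byte sequence. Assumes s was valid UTF-8 before it was cut.
  pure function utf8_complete_bytes(s) result(n)
    character(len=*), intent(in) :: s
    integer :: n, i, b, need
    n = len(s)
    ! Walk back over trailing continuation bytes to the last lead byte.
    i = n
    do
      if (i < 1) exit
      b = iand(ichar(s(i:i)), 255)
      if (iand(b, 192) /= 128) exit
      i = i - 1
    end do
    if (i < 1) then
      n = 0   ! empty, or nothing but continuation bytes
      return
    end if
    ! How many bytes does the sequence starting at byte i need?
    if (b < 128) then
      need = 1
    else if (iand(b, 224) == 192) then
      need = 2
    else if (iand(b, 240) == 224) then
      need = 3
    else if (iand(b, 248) == 240) then
      need = 4
    else
      need = 1   ! not a valid lead byte; leave it alone
    end if
    ! Drop the last sequence if it was cut short.
    if (i + need - 1 > n) n = i - 1
  end function utf8_complete_bytes

end module utf8_sketch
```

For the 7-byte encoding of "déšť", `len` gives 7 and `utf8_len` gives 4. If the string is cut after byte 6, `utf8_complete_bytes` returns 5, so `s(1:5)` is the tidy prefix.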

I am not defending this. I am saying that you can do useful work that works reliably with two leading compilers.

… and some code points take 24 or 32 bits, and some characters require 2 code points. That is how utf-8 and unicode encoding work.

A processor that returns the length in bytes is not conforming: the standard is clear that the LEN intrinsic shall return the number of characters in the string, regardless of the KIND of the character entity.
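For comparison, a sketch of the route where LEN does count characters, assuming the compiler provides the ISO_10646 character kind. gfortran does; where the kind is unsupported, `selected_char_kind` returns -1 and this will not compile. The names `ucs4_sketch` and `from_code_points` are hypothetical.

```fortran
module ucs4_sketch
  implicit none
  ! Kind number for ISO 10646 (UCS-4) characters; -1 if unsupported.
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')

contains

  ! Build a UCS-4 string with one character per code point in cp.
  pure function from_code_points(cp) result(s)
    integer, intent(in) :: cp(:)
    character(kind=ucs4, len=size(cp)) :: s
    integer :: i
    do i = 1, size(cp)
      s(i:i) = char(cp(i), kind=ucs4)
    end do
  end function from_code_points

end module ucs4_sketch
```

With `s = from_code_points([100, 233, 353, 357])` (the code points of "déšť"), `len(s)` is 4, and `s(3:3)` is the single character U+0161. The size in storage units is then a separate question, answered by `storage_size`.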

That would be very problematic, because you would have to use storage_size() in many places where you normally want len(), and because of the issues discussed in Using Unicode Characters in Fortran - #14 by plevold

E.g., should “déšť” report the same len as “déšť”?
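I assume the two spellings in that question differ in normalization: one with precomposed letters, one with base letters followed by combining marks. A small sketch of why they give different answers, with the byte values written out so that nothing depends on the encoding of the source file. The module and function names are made up for illustration.

```fortran
module normalization_sketch
  implicit none

contains

  ! Build a default-kind string from raw byte values (0..255).
  pure function from_bytes(b) result(s)
    integer, intent(in) :: b(:)
    character(len=size(b)) :: s
    integer :: i
    do i = 1, size(b)
      s(i:i) = char(b(i))
    end do
  end function from_bytes

  ! d, U+00E9, U+0161, U+0165: four code points, seven bytes.
  pure function precomposed() result(s)
    character(len=7) :: s
    s = from_bytes([100, 195, 169, 197, 161, 197, 165])
  end function precomposed

  ! d, e + U+0301, s + U+030C, t + U+030C: six code points, ten bytes.
  pure function decomposed() result(s)
    character(len=10) :: s
    s = from_bytes([100, 101, 204, 129, 115, 204, 140, 116, 204, 140])
  end function decomposed

  ! Count code points by skipping continuation bytes (10xxxxxx).
  pure function count_code_points(s) result(n)
    character(len=*), intent(in) :: s
    integer :: n, i
    n = 0
    do i = 1, len(s)
      if (iand(iand(ichar(s(i:i)), 255), 192) /= 128) n = n + 1
    end do
  end function count_code_points

end module normalization_sketch
```

Both strings display as the same four letters, yet the byte lengths are 7 and 10 and the code point counts are 4 and 6. Getting "4" for both needs grapheme or normalization logic, which this sketch does not attempt.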

A portable code is supposed to behave the same whatever the compiler, present or future, and as I wrote before you cannot be sure here.

And it depends on what you mean by “useful work”. To go beyond very basic work with them you will have to write utility routines that take UTF-8 encoding into account (e.g. mylen(), mytrim(), …), so basically you do the work the compiler vendor would have to do to support UTF-8.

I never claimed my code is portable - just that it has worked with a couple of common compilers on a couple of common operating systems for many years.

There are many details that are “processor dependent” according to the standard. For example: signed zeros; precision and range of floating point types; denormalized floats; size of the address space (which can limit the size of arrays); character set; accuracy of special functions; the pseudorandom number generator; presence of a process exit status; existence of a companion C processor; the total number of unique statement labels in one program unit; maximum depth of nesting of nested INCLUDE lines; …

My life is much easier when I use arrays larger than 2 GB, signed zeros and C interoperability. You won’t convince me to make my code more portable by not using these features. Storing utf-8 strings in the default character array - as permitted by the standard (Section 6.1.6 of Fortran 2018) - is a similar issue.

I am not trying to convince you to change your code, just saying that what you are using is not a “feature”, but rather a trick with limited capabilities that formally violates the standard.

This section does not unconditionally permit that.

Additional characters may be representable in the processor, but shall appear only in comments, character constants, input/output records, and character string edit descriptors

“may” is the important word here: it just opens up the possibility for a compiler to support additional characters in some contexts, without requiring it to do so.

Please don’t use made up names to address people here. You can use the @ character followed by their username to address any user.

The possibility sounds like a permission to me.

I think that text from the standard dates back to the f77 era, where the intention was to allow things like upper and lower case, or ascii characters that were not part of the limited fortran character set (like the characters []{}&@#%). At that time, these were all cases where every character was stored in a fixed-size (6-, 7-, or 8-bit) field. Thus the storage size, storage association, and len() were still in agreement, even with the additional characters. It is not clear how this should now be interpreted with character encodings that use a variable number of bits. For example, an array declared as character(len=n) :: c(m) is supposed to have each element the same size so that there is a fixed spacing between, for example, c(i)(j:j) and c(i+1)(j:j). That fixed spacing does not apply when the storage sizes of the individual characters are all different. Or what should happen when c(i)(j:j) is changed from an 8-bit character to a 16-bit character?
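A small illustration of the indexing problem, assuming UTF-8 bytes stored in a default-kind string: s(j:j) is then the j-th byte, and finding the j-th character requires a scan from the start. The module `utf8_index_sketch` and the function `utf8_char_at` are hypothetical names for this sketch only.

```fortran
module utf8_index_sketch
  implicit none

contains

  ! Return the k-th code point of s as a 1..4 byte substring,
  ! or an empty string if k is out of range.
  pure function utf8_char_at(s, k) result(ch)
    character(len=*), intent(in) :: s
    integer, intent(in) :: k
    character(len=:), allocatable :: ch
    integer :: i, n, first, last
    n = 0
    first = 0
    last = len(s)
    do i = 1, len(s)
      ! A byte that is not 10xxxxxx starts a new code point.
      if (iand(iand(ichar(s(i:i)), 255), 192) /= 128) then
        n = n + 1
        if (n == k) first = i
        if (n == k + 1) then
          last = i - 1   ! the next code point starts here
          exit
        end if
      end if
    end do
    if (first == 0) then
      ch = ''
    else
      ch = s(first:last)
    end if
  end function utf8_char_at

end module utf8_index_sketch
```

With the 7-byte encoding of "déšť", `utf8_char_at(s, 2)` returns the two bytes of "é", whereas `s(2:2)` is only its first byte. Assigning a character with a different byte count into that position would shift everything after it, which is exactly where the fixed-spacing model breaks down.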

Regarding the use of “may” (permission) and “might” (possibility), @sblionel explained this recently. The ISO convention used by the fortran standard uses “may” exclusively to cover both cases. Thus no inferences can be drawn regarding the distinction between the two meanings.

Right. May is permission, Can is ability. ISO doesn’t want standards to use might.

Whatever… Granting the permission does not imply that a given compiler uses this permission.

That’s the point.

Re culture settings and so forth. My conclusion so far is that you cannot do text handling in Fortran without being willing to face squiggles all the time. I had written the program in C and also in Fortran. No squiggles in C. A bunch of them in Fortran. In fact so many in Fortran that the application is almost useless.

I must respectfully disagree. I am not saying it is easy as you move up the learning curve.

I think Fortran is great and the support from this forum is terrific, but I am disappointed by the text squiggles that I am getting, which are causing great problems.