Culture setting / inoculation against squiggles

DavidB · May 28, 2023, 12:20am

Yes. Same predictable behavior on Windows and Linux with both gfortran and ifort. The answer isn’t nonsense: it is the length of the string in bytes, which may not be what you want. It is easy to calculate the length of a UTF-8 encoded string in characters, and to tidy it up if it has been truncated mid-character.
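Both operations can be sketched on raw bytes, which is how a default-kind Fortran CHARACTER variable holds UTF-8 data. This is a minimal Python sketch, not code from the thread, and `utf8_len` and `utf8_tidy` are hypothetical helper names. It relies on the UTF-8 rule that continuation bytes have the bit pattern 10xxxxxx. Counting the bytes that are not continuation bytes therefore gives the number of code points. A truncated tail can be detected by checking whether the last lead byte announces more bytes than actually remain.

```python
def utf8_len(b: bytes) -> int:
    """Number of code points in the UTF-8 bytes b."""
    # Continuation bytes look like 0b10xxxxxx; every other byte starts a code point.
    return sum(1 for byte in b if byte & 0xC0 != 0x80)


def _seq_len(lead: int) -> int:
    """Expected length of a UTF-8 sequence, given its lead byte."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2  # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3  # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4  # 11110xxx
    return 0      # continuation or invalid byte


def utf8_tidy(b: bytes) -> bytes:
    """Drop an incomplete multi-byte sequence left at the end of b,
    e.g. after a byte-oriented truncation such as Fortran's c(1:n)."""
    # Walk back over at most 3 continuation bytes to find the last lead byte.
    i = len(b) - 1
    while i >= 0 and len(b) - i <= 4 and b[i] & 0xC0 == 0x80:
        i -= 1
    if i < 0:
        return b
    need = _seq_len(b[i])
    have = len(b) - i
    return b[:i] if have < need else b
```

For example, cutting the 7-byte UTF-8 form of “déšť” to its first 6 bytes splits “ť” in half. `utf8_tidy` then drops the dangling lead byte and returns the bytes of “déš”.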

I am not defending this. I am saying that you can do useful work that runs reliably with two leading compilers.

… and some code points take 24 or 32 bits, and some characters require 2 code points. That is how UTF-8 and Unicode encoding work.
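To make the variable widths concrete, here is a short Python sketch (not from the thread). It prints how many UTF-8 bytes each code point takes. It also shows one visible character built from two code points: a base letter followed by the combining acute accent U+0301.

```python
import unicodedata

# One code point each, encoded in 1, 2, 3 and 4 bytes (8 to 32 bits).
for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {len(ch.encode('utf-8'))} byte(s)")

# One visible character, two code points: 'e' + COMBINING ACUTE ACCENT.
e_acute = "e\u0301"
print(len(e_acute), "code points,", len(e_acute.encode("utf-8")), "bytes")
```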

FortranFan · May 28, 2023, 1:42am

A processor that returns the length in bytes would not be conforming: the standard is clear that the LEN intrinsic shall return the number of characters in the string, regardless of the KIND of the character entity.

VladimirF · May 28, 2023, 10:06am

That would be very problematic, because you would have to use storage_size() in many places where you normally want len(), and because of the issues discussed at Using Unicode Characters in Fortran - #14 by plevold.

VladimirF · May 28, 2023, 10:18am

E.g., should “déšť” report the same len as “déšť”?
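The two strings in the question presumably render identically but are encoded differently. One would use precomposed letters (Unicode normalization form NFC), and the other base letters followed by combining marks (NFD). The copied text above may have lost that difference. The Python sketch below therefore rebuilds both forms with the standard `unicodedata.normalize` function and compares their code-point and byte counts.

```python
import unicodedata

word = "déšť"
nfc = unicodedata.normalize("NFC", word)  # precomposed: é, š, ť are single code points
nfd = unicodedata.normalize("NFD", word)  # decomposed: base letter + combining mark

for form, s in [("NFC", nfc), ("NFD", nfd)]:
    print(form, len(s), "code points,", len(s.encode("utf-8")), "bytes")

print("equal as code point sequences:", nfc == nfd)
```

The NFC form has 4 code points in 7 bytes, and the NFD form has 7 code points in 10 bytes. A LEN that counts either bytes or code points gives different answers for what a reader sees as the same four letters.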

PierU · May 28, 2023, 11:10am

Portable code is supposed to behave the same with any compiler, present or future, and, as I wrote before, you cannot be sure of that here.

And it depends on what you mean by “useful work”. To go beyond very basic work with these strings, you will have to write utility routines that take the UTF-8 encoding into account (e.g. mylen(), mytrim(), …), so you basically end up doing the work the compiler vendor would have to do to support UTF-8.
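As an example of the kind of utility routine meant here, below is a Python sketch of a character-based substring on UTF-8 bytes. It mirrors Fortran's 1-based, inclusive c(i:j) but counts code points instead of bytes. The name `utf8_substr` is hypothetical and the input is assumed to be valid UTF-8. Because it counts code points, a letter written with a combining mark still counts as two.

```python
def utf8_substr(b: bytes, i: int, j: int) -> bytes:
    """Return code points i..j (1-based, inclusive, like Fortran c(i:j))
    of the valid UTF-8 byte string b."""
    # Byte offsets where each code point starts (continuation bytes are 0b10xxxxxx).
    starts = [k for k, byte in enumerate(b) if byte & 0xC0 != 0x80]
    n = len(starts)
    if i > j:
        return b""  # zero-length substring, as in Fortran
    if i < 1 or j > n:
        raise IndexError("substring out of range")
    end = starts[j] if j < n else len(b)  # first byte after code point j
    return b[starts[i - 1]:end]
```

On the bytes of “déšť”, `utf8_substr(b, 2, 3)` returns the four bytes of “éš”, whereas a byte-based b(2:3) would return only the two bytes of “é”.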

DavidB · May 28, 2023, 12:35pm

I never claimed my code is portable, just that it has worked with a couple of common compilers on a couple of common operating systems for many years.

There are many details that are “processor dependent” according to the standard. For example: signed zeros; precision and range of floating point types; denormalized floats; size of the address space (which can limit the size of arrays); character set; accuracy of special functions; the pseudorandom number generator; presence of a process exit status; existence of a companion C processor; the total number of unique statement labels in one program unit; maximum depth of nesting of nested INCLUDE lines; …

My life is much easier when I use arrays larger than 2 GB, signed zeros and C interoperability. You won’t convince me to make my code more portable by giving up these features. Storing UTF-8 strings in the default character kind, as permitted by the standard (Section 6.1.6 of Fortran 2018), is a similar issue.

PierU · May 28, 2023, 2:23pm

I’m not trying to convince you to change your code, just saying that what you are using is not a “feature” but rather a trick with limited capabilities that formally violates the standard.

This section does not unconditionally permit that.

Additional characters may be representable in the processor, but shall appear only in comments, character constants, input/output records, and character string edit descriptors

“May” is the important word here: it just opens up the possibility for a compiler to support additional characters in some contexts, without requiring it to do so.

milancurcic · May 28, 2023, 5:33pm

Please don’t use made up names to address people here. You can use the @ character followed by their username to address any user.

VladimirF · May 29, 2023, 6:20am

The possibility sounds like a permission to me.

RonShepard · May 29, 2023, 3:47pm

I think that text from the standard dates back to the f77 era, where the intention was to allow things like upper and lower case, or ASCII characters that were not part of the limited Fortran character set (like the characters []{}&@#%). At that time, these were all cases where every character was stored in a fixed-size (6-, 7-, or 8-bit) field. Thus the storage size, storage association, and len() were still in agreement, even with the additional characters. It is not clear how this should now be interpreted with character encodings that use a variable number of bits per character. For example, an array declared as character(len=n) :: c(m) is supposed to have elements of the same size, so that there is a fixed spacing between, for example, c(i)(j:j) and c(i+1)(j:j). That fixed spacing does not hold when the individual characters have different storage sizes. Or what should happen when c(i)(j:j) is changed from an 8-bit character to a 16-bit character?
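The fixed-spacing argument can be illustrated with a byte buffer standing in for the storage of `character(len=n) :: c(m)`. This Python sketch assumes that each element occupies exactly n bytes, as with a default-kind array holding one byte per character, and `offset` is a hypothetical helper. Replacing a one-byte character with the two-byte UTF-8 encoding of “é” grows the storage and shifts every later element. The only other option would be to overwrite a neighboring byte.

```python
n, m = 4, 3                        # character(len=4) :: c(3)
buf = bytearray(b"abcdefghijkl")   # c(1)='abcd', c(2)='efgh', c(3)='ijkl'


def offset(i: int, j: int) -> int:
    """Byte offset of c(i)(j:j) when every character is exactly one byte."""
    return (i - 1) * n + (j - 1)


# With 1-byte characters, the fixed-spacing formula finds c(2)(3:3) == 'g'.
print(chr(buf[offset(2, 3)]))      # g

# Now set c(1)(2:2) = 'é', which needs 2 bytes in UTF-8.
k = offset(1, 2)
buf[k:k + 1] = "é".encode("utf-8")

print(len(buf))                    # 13: the storage grew by one byte
print(chr(buf[offset(2, 3)]))      # f: the formula now points at the wrong character
```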

Regarding the use of “may” (permission) and “might” (possibility), @sblionel explained this recently. The ISO convention used by the Fortran standard uses “may” exclusively to cover both cases, so no inference can be drawn about which of the two meanings is intended.

sblionel · May 29, 2023, 9:10pm

Right. “May” is permission, “can” is ability. ISO doesn’t want standards to use “might”.

PierU · May 30, 2023, 6:39am

Whatever… Granting the permission does not imply that a given compiler uses this permission.

sblionel · May 30, 2023, 2:14pm

That’s the point.

Patrick · June 1, 2023, 7:53pm

Re culture settings and so forth: my conclusion so far is that you cannot do text handling in Fortran without being willing to face squiggles all the time. I had written the program in C and also in Fortran. No squiggles in C; a bunch of them in Fortran. In fact, so many in Fortran that the application is almost useless.
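This sketch assumes the “squiggles” are mojibake, meaning UTF-8 bytes displayed under a different code page. The thread does not show the actual output, so this is only an assumption. Under that assumption, the effect can be reproduced in Python by decoding UTF-8 bytes as Windows-1252, a common default code page on Western-European and US Windows systems. The usual remedy is to make the writer and the reader (terminal, console code page, or editor) agree on the encoding.

```python
text = "déšť €"
raw = text.encode("utf-8")  # the bytes a program actually writes

# A console or editor that assumes Windows-1252 shows each UTF-8 byte as its own character.
garbled = raw.decode("cp1252", errors="replace")
print(garbled)

# Decoding with the matching encoding recovers the original text.
print(raw.decode("utf-8"))
```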

DavidB · June 2, 2023, 1:32am

I must respectfully disagree. I am not saying it is easy as you move up the learning curve.

Patrick · June 2, 2023, 11:56am

I think Fortran is great and the support from this forum is terrific, but I am disappointed by the text squiggles I am getting, which are causing great problems.
