Using Unicode Characters in Fortran

plevold · February 11, 2022, 7:48am

I’m not quite sure I understood what your ulen function is trying to achieve. Do you want to count the number of bytes in the character sequence, the number of grapheme clusters or the width of the text displayed on screen?

The number of bytes can easily be computed with len(chars) (multiplied by a constant if you use a non-default character kind).
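
The Python sketch below checks the byte arithmetic for the example string |😎|⋮|. Python is used only because its standard library exposes the encodings directly. The helper names are made up for illustration. It assumes the string is stored as UTF-8 in a default-kind character variable, where len counts bytes. It then compares that with a fixed 4 bytes per character. That matches how gfortran stores its ISO_10646 kind, as far as I know, but the constant is compiler-specific and is not fixed by the Fortran standard.

```python
# Byte counts for the example string |😎|⋮| under two storage schemes.
# Escapes are used so the result does not depend on the source file encoding.
s = "|\U0001F60E|\u22EE|"

def utf8_byte_count(text: str) -> int:
    """Bytes used when text is stored as UTF-8 (what len() reports for a
    default-kind Fortran string holding UTF-8 data)."""
    return len(text.encode("utf-8"))

def utf32_byte_count(text: str) -> int:
    """Bytes used with a fixed 4 bytes per code point (UTF-32, no BOM)."""
    return len(text.encode("utf-32-le"))

print(utf8_byte_count(s))   # 1 + 4 + 1 + 3 + 1 bytes
print(utf32_byte_count(s))  # 5 code points * 4 bytes
```

The UTF-8 count varies per character (1 to 4 bytes), while the fixed-width count is simply the number of code points times a constant.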

Counting the number of grapheme clusters is not that straightforward, but it can be done if you find the right algorithm and port it to Fortran, or call an existing implementation through a C interface. For example, I think the Rust crate unicode-segmentation will do that for you.
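
A short demonstration shows the gap between code points and grapheme clusters. In the Python sketch below, Python's unicodedata module is used only for inspection. Each string is perceived as a single character, i.e. one grapheme cluster under the Unicode text segmentation rules (UAX #29), yet each contains several code points. Normalizing to NFC merges the accented letter but not the emoji sequence, so normalization is not a substitute for a real segmentation algorithm. Python's standard library does not provide one.

```python
import unicodedata

# 'é' spelled as LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (U+0301)
e_acute = "e\u0301"
# Family emoji: MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for text in (e_acute, family):
    nfc = unicodedata.normalize("NFC", text)  # canonical composition
    names = [unicodedata.name(ch) for ch in text]
    # Each string is ONE grapheme cluster, but len() counts code points.
    print(f"{len(text)} code points, {len(nfc)} after NFC: {names}")
```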

Determining the width of a string boils down to determining the width of each grapheme cluster. Even for monospaced fonts, this is a non-trivial task. Take, for example, the following string:

|😎|⋮|

The characters are (here is a nice tool to determine that):

U+007C : VERTICAL LINE {vertical bar, pipe}
U+1F60E : SMILING FACE WITH SUNGLASSES
U+007C : VERTICAL LINE {vertical bar, pipe}
U+22EE : VERTICAL ELLIPSIS
U+007C : VERTICAL LINE {vertical bar, pipe}
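
If the online tool is not at hand, Python's unicodedata module can produce the same breakdown from the Unicode Character Database shipped with the interpreter. The function name describe is made up for this sketch. The informal aliases in braces, such as {vertical bar, pipe}, are not part of the character names it returns.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return one 'U+XXXX : NAME' line per code point in text."""
    return [f"U+{ord(ch):04X} : {unicodedata.name(ch, '<unnamed>')}" for ch in text]

for line in describe("|\U0001F60E|\u22EE|"):
    print(line)
```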

If we try to align this with punctuation marks

|😎|
|...|
|⋮|
|.|

we see that even with a monospaced font:

The SMILING FACE WITH SUNGLASSES emoji is slightly narrower than three punctuation marks
The VERTICAL ELLIPSIS is slightly narrower than one punctuation mark

This is further complicated by font fallback: if the monospaced font in use does not have a glyph for a character, the application or (most likely) the OS will fall back to another font. Because of this, you may even be seeing different widths for the characters above than I am!
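
For comparison, terminals and text libraries often do not measure glyphs at all. Instead, they estimate width from Unicode properties, in the spirit of the POSIX wcwidth function. The Python sketch below is a deliberately crude approximation of that approach, and the helper approx_columns is hypothetical, not a library function. It counts combining marks as 0 columns, East Asian Wide and Fullwidth characters as 2, and everything else as 1. It reports 4 columns for |😎| and 5 for |...|, which need not match what your font actually draws. Real implementations also handle control characters, zero-width joiners, emoji sequences and more.

```python
import unicodedata

def approx_columns(text: str) -> int:
    """Crude display-width estimate in terminal columns (hypothetical helper).

    Combining marks -> 0, East Asian Wide/Fullwidth -> 2, everything else -> 1.
    Ignores control characters, ZWJ sequences, variation selectors, etc.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the preceding character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

for sample in ("|\U0001F60E|", "|...|", "|\u22EE|", "|.|"):
    # ascii() keeps the output printable on consoles that are not UTF-8
    print(ascii(sample), approx_columns(sample))
```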
