I’m not quite sure I understood what your `ulen` function is trying to achieve. Do you want to count the number of bytes in the character sequence, the number of grapheme clusters, or the width of the text displayed on screen?
The number of bytes can be computed easily with `len(chars)` (multiplied by the number of bytes per character if using a non-default character kind, e.g. 4 for a UCS-4 kind).
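To make the difference between bytes and characters concrete, here is a minimal sketch in Python (used only for illustration, it is not Fortran). The helper name `byte_length` is made up for this example. It assumes the text is stored as UTF-8, which is what you typically have in a default-kind Fortran string, and compares that with a fixed 4-bytes-per-character encoding.

```python
# Example string from this post: pipe, emoji, pipe, vertical ellipsis, pipe.
text = "|😎|⋮|"


def byte_length(s: str, encoding: str = "utf-8") -> int:
    """Number of bytes needed to store s in the given encoding."""
    return len(s.encode(encoding))


print(len(text))                        # number of code points: 5
print(byte_length(text))                # UTF-8 bytes: 1 + 4 + 1 + 3 + 1 = 10
print(byte_length(text, "utf-32-le"))   # 4 bytes per code point: 20
```

So the same five code points take 10 bytes in UTF-8 but 20 bytes in a 4-byte kind, and neither number says anything about how wide the text looks.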
Counting the number of grapheme clusters is not that straightforward, but it can be done if you find the right algorithm and either port it to Fortran or call it through a C interface. I think, for example, the Rust crate unicode-segmentation will do that for you.
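To give a feeling for why code points and grapheme clusters differ, here is a deliberately crude Python sketch. It is not the real segmentation algorithm (that is specified in Unicode Standard Annex #29): it only skips code points whose general category is a combining mark, so it will miscount emoji ZWJ sequences, flags, Hangul jamo and so on. The function name `approx_grapheme_count` is hypothetical.

```python
import unicodedata


def approx_grapheme_count(s: str) -> int:
    """Crude approximation: count code points that are not combining marks.

    This is NOT full grapheme cluster segmentation; it only handles the
    simple "base character + combining mark" case.
    """
    return sum(1 for ch in s if not unicodedata.category(ch).startswith("M"))


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
print(len(decomposed))                    # 2 code points
print(approx_grapheme_count(decomposed))  # 1 user-perceived character
```

For anything beyond this simple case you would want a proper implementation such as the crate mentioned above.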
Determining the width of a string boils down to determining the width of each grapheme cluster. Even for monospaced fonts this is a non-trivial task. Take for example the following string:
|😎|⋮|
The characters are (here is a nice tool to determine that):
U+007C : VERTICAL LINE {vertical bar, pipe}
U+1F60E : SMILING FACE WITH SUNGLASSES
U+007C : VERTICAL LINE {vertical bar, pipe}
U+22EE : VERTICAL ELLIPSIS
U+007C : VERTICAL LINE {vertical bar, pipe}
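If you don't have such a tool at hand, a listing like the one above can be produced with a few lines of Python. This is just a sketch using the standard `unicodedata` module; the helper name `describe` is made up, and the aliases in braces are not included because `unicodedata.name` only returns the formal character name.

```python
import unicodedata


def describe(s: str) -> list:
    """Return one 'U+XXXX : NAME' line per code point in s."""
    return [
        f"U+{ord(ch):04X} : {unicodedata.name(ch, '<unnamed>')}"
        for ch in s
    ]


for line in describe("|😎|⋮|"):
    print(line)
```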
If we try to align this with punctuation marks
|😎|
|...|
|⋮|
|.|
we see that even for a monospaced font:
- The `SMILING FACE WITH SUNGLASSES` emoji is slightly shorter than three punctuation marks
- The `VERTICAL ELLIPSIS` is slightly shorter than one punctuation mark
This is complicated even further by the fact that if the monospace font in use does not have a glyph for a character, the application or (most likely) the OS will fall back to another font. Because of this, you might even be seeing different widths for the characters above than I am!
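What a program can do is estimate the width in terminal cells from the Unicode East Asian Width property. The Python sketch below does that in the simplest possible way, assuming wide and fullwidth characters take two cells, combining marks take none, and everything else takes one. The function name `estimated_cells` is hypothetical, and the result is only an estimate: as described above, the width actually drawn depends on the font and on any fallback the OS applies.

```python
import unicodedata


def estimated_cells(s: str) -> int:
    """Estimate the number of terminal cells s occupies.

    Assumption: 'W' (wide) and 'F' (fullwidth) characters take two cells,
    combining marks take zero, everything else takes one.
    """
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on top of the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


for sample in ("|😎|", "|...|", "|⋮|", "|.|"):
    print(sample, estimated_cells(sample))
```

Note that this estimate puts the emoji at two cells, while in my rendering above it comes out somewhat shorter than three punctuation marks, which shows how far the estimate and the rendering can drift apart.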