Working with UTF-8 encoded data

urbanjost · September 23, 2025, 12:12pm

An alpha release of M_unicode (GitHub - urbanjost/M_unicode: Unicode support when ISO_10646 is not supported) provides a user-defined type that can be used with ragged arrays of ASCII or UTF-8 encoded data. It does not require the compiler to support the Fortran Unicode extension, but provides overloading for all the basic operators and character intrinsics, with both a procedural and an OOP interface. The intrinsic overloads include TOKENIZE and SPLIT. The UPPER() and LOWER() functions support the concept of case for the Unicode Latin characters, not just the ASCII subset, and a basic SORT() function orders the data by Unicode codepoint values.

Documentation and examples are still a WIP but complete enough to guide usage for anyone interested in trying it. It has only been tested with ifx and gfortran so far, but until proven otherwise I think it should work with any environment where UTF-8 files are supported. So far that includes what-you-see-is-what-you-get string constants, on Linux and Cygwin at a minimum. It needs a DT and more CI/CD unit testing, but builds easily with fpm.

It is just an alpha release but should be useful for anyone working with UTF-8 data, particularly if the compiler does not support the UCS-4 extensions of Fortran.
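For anyone wondering why a dedicated type helps: when UTF-8 text is stored in a default CHARACTER variable it is just a sequence of bytes, so LEN() reports bytes rather than characters. The sketch below is plain standard Fortran and deliberately does not use M_unicode. The program and variable names are made up for illustration, and it assumes 8-bit default characters with CHAR() accepting values up to 255, which is true of gfortran and ifx but is processor-dependent.

```fortran
program utf8_length_sketch
   ! Illustration only: plain Fortran, no M_unicode calls.
   implicit none
   character(len=:), allocatable :: s
   integer :: i, ncodepoints

   ! "été" spelled out as UTF-8 bytes (é is the two bytes C3 A9),
   ! so the example does not depend on the source-file encoding.
   s = char(195)//char(169)//'t'//char(195)//char(169)

   ncodepoints = 0
   do i = 1, len(s)
      ! UTF-8 continuation bytes have the bit pattern 10xxxxxx;
      ! every other byte begins a new codepoint.
      if (iand(ichar(s(i:i)), 192) /= 128) ncodepoints = ncodepoints + 1
   end do

   print '(a,i0)', 'bytes:      ', len(s)        ! what LEN() sees
   print '(a,i0)', 'codepoints: ', ncodepoints   ! what a reader sees
end program utf8_length_sketch
```

Under those assumptions this reports 5 bytes but 3 codepoints. Substrings, padding, and case conversion on the raw bytes run into the same mismatch.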
