An alpha release of GitHub - urbanjost/M_unicode: Unicode support when ISO_10646 is not supported provides a user-defined type that can be used with ragged arrays of ASCII or UTF-8 encoded data. It does not require the compiler to support the Fortran Unicode extension, but it provides overloads of all the basic operators and character intrinsics, including TOKENIZE and SPLIT, with both a procedural and an OOP interface. The UPPER() and LOWER() functions support the concept of case for the Unicode Latin characters, not just the ASCII subset, and a basic SORT() function orders the data by Unicode codepoint values.

Documentation and examples are still a WIP but complete enough to guide usage for anyone interested in trying it. It has only been tested with ifx and gfortran so far, but until proven otherwise I think it should work in any environment where UTF-8 files are supported. So far that includes allowing what-you-see-is-what-you-get string constants on Linux and Cygwin at a minimum. It needs a DT and more CI/CD unit testing, but it builds easily with fpm. It is just an alpha release, but it should be useful for anyone working with UTF-8 data, particularly if the compiler does not support the UCS-4 extensions of Fortran.
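For anyone unsure why byte-oriented character handling falls short with UTF-8 data, here is a rough conceptual sketch in Python. It is not the M_unicode API; the sample words and the `ascii_upper` helper are made up for illustration. It shows the three ideas mentioned above: byte length versus codepoint length, case conversion beyond the ASCII subset, and ordering by codepoint value.

```python
# Conceptual illustration only; this is NOT the M_unicode API.
# Sample data is hypothetical. Escapes are used so the codepoints are unambiguous.
elan = "\u00e9lan"                      # "élan", with precomposed U+00E9
words = ["zebra", "\u00c4pfel", "apple", "\u00c9cole", elan]

# A UTF-8 string has more bytes than codepoints once it leaves ASCII.
byte_len = len(elan.encode("utf-8"))    # U+00E9 occupies two bytes in UTF-8
codepoint_len = len(elan)               # counted in codepoints

def ascii_upper(s):
    """Hypothetical helper: uppercase only the ASCII letters a-z."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)

ascii_only = ascii_upper(elan)          # leaves U+00E9 untouched
unicode_aware = elan.upper()            # maps U+00E9 to U+00C9

# Ordering by codepoint value: ASCII letters sort before accented Latin letters.
by_codepoint = sorted(words, key=lambda s: [ord(c) for c in s])

print(byte_len, codepoint_len)
print(ascii_only, unicode_aware)
print(by_codepoint)
```

Note that codepoint ordering is not the same as locale-aware collation: every unaccented ASCII word sorts ahead of any word starting with an accented letter.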