Culture setting / inoculation against squiggles

Patrick · May 25, 2023, 9:08am

(I use the GFortran compiler) I get squiggles in my text when I use language-specific characters. Is there a general way to stop that from happening? Patrick,

Arjen · May 25, 2023, 9:28am

What software do you use exactly? Is that in your IDE? The gfortran compiler is unlikely to be the culprit on its own.

Patrick · May 25, 2023, 9:49am

I am talking about my own software. In C there is “using System.Globalization”. I was wondering if something similar exists in Fortran.

In C you see this sort of thing:
using System;
using System.Collections;
using System.Globalization;

public class SamplesCultureInfo
{
    public static void Main()
    {
        // Creates and initializes the CultureInfo which uses the international sort.
        CultureInfo myCIintl = new CultureInfo("es-ES", false);

        // Creates and initializes the CultureInfo which uses the traditional sort.
        CultureInfo myCItrad = new CultureInfo(0x040A, false);
    }
}

Arjen · May 25, 2023, 10:11am

Ah, that makes it clear. That would actually be C#, by the way. Of old, there has been the “locale”. I do not know if there is a Fortran library that does similar things. (The code you showed seems to me to be specific to the Microsoft compilers, but it is not specifically my area of expertise.)

interkosmos · May 25, 2023, 10:24am

You may want to open standard output in UTF-8 mode to print Unicode characters:

! unicode.f90
program main
    use, intrinsic :: iso_fortran_env, only: output_unit
    implicit none
    integer, parameter :: u = selected_char_kind('ISO_10646')

    open (output_unit, encoding='utf-8')
    print '(a)', '大海航行靠舵手'
end program main

Output:

$ gfortran -o unicode unicode.f90
$ ./unicode
大海航行靠舵手

Patrick · May 25, 2023, 11:12pm

Hi Mr Universe, this code produced text without squiggles. Is this about okay? Should anything be different apart from the blsjt? Patrick,

program main
    use, intrinsic :: iso_fortran_env, only: output_unit
    implicit none
    integer, parameter :: u = selected_char_kind('ISO_10646')
    integer :: banana
    character(len=20) :: line
    open (banana, file="funny.txt", action="write", encoding="utf-8")
    write (banana, "(a)") "Hôtel Chez Frédérque"
    read *
end program main

DavidB · May 26, 2023, 2:56am

Hi Patrick,

I am not sure your test is doing what you think. The following works for me with gfortran 11.3.0 under Cygwin and ifort 2021.3.0 under Windows 10 Pro. It shows that sometimes things “just work”.

program main
implicit none
integer unit
character(len=*), parameter :: string = 'Hôtel Chez Frédérque'
open (newunit=unit,file='funny.txt',action='write')
write(unit,"(a)") string
end program main

gfortran generates the following. The ifort output is identical, except that it uses \r\n line endings by default.

$ cat funny.txt
Hôtel Chez Frédérque

$ od -c funny.txt
0000000   H 303 264   t   e   l       C   h   e   z       F   r 303 251
0000020   d 303 251   r   q   u   e  \n
0000031

However, it depends on the encoding of the string in the Fortran source file. When I pasted your code above into Visual Studio the encoding was munged. When I used a different editor, either emacs or Notepad++, the UTF-8 encoding was preserved. I could then open the source file in VS and it all worked.

Using “od -c” or a utility that shows the encoding of the Fortran source code and the output file may be enlightening.

Note that Intel Fortran only supports a single character KIND. You can use it for UTF-8 data, but multi-byte characters are stored in (not surprisingly) multiple bytes. I work on software used in around 100 countries and users have (almost) no problems with labels in their native languages. We programmers just have to accept that the displayed length is not always the length of the string. Our GUI is written in C#, but some text is written from Fortran.
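
A minimal sketch of the gap between stored length and displayed length. It assumes the default character kind is one byte per character and that ichar returns 0 to 255 for it (as gfortran and ifort do, to my knowledge); the program name utf8_length is made up. The string is built from raw byte values, so the result does not depend on how the source file itself is encoded.

```fortran
program utf8_length
    implicit none
    character(len=:), allocatable :: s
    integer :: i, n_chars

    ! "Hôtel" as raw UTF-8 bytes: 'ô' (U+00F4) is the two bytes 195 180
    s = 'H' // char(195) // char(180) // 'tel'

    ! Count code points: every byte that is NOT a continuation byte
    ! (bit pattern 10xxxxxx) starts a new character.
    n_chars = 0
    do i = 1, len(s)
        if (iand(ichar(s(i:i)), 192) /= 128) n_chars = n_chars + 1
    end do

    print '(a,i0)', 'bytes (len):  ', len(s)
    print '(a,i0)', 'code points:  ', n_chars
end program utf8_length
```

Under those assumptions len(s) is 6 while the code point count is 5, which is the difference a GUI or a column-aligned report has to allow for.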

Edit: I reproduced the issue of pasting the code into VS. The Fortran source file was encoded as “ISO-8859 text”. The working version is “Unicode text, UTF-8 text”.

VladimirF · May 26, 2023, 6:16am

Yes, in MS Windows it is quite possible that a text editor will save the source file in various non-Unicode encodings. In my country that could be CP1250, in Spain it will be a different one.

DavidB · May 26, 2023, 8:01am

You can see the problem by changing the encoding on the source code. The only changes are to the accented characters in the string - the rest of the program is ASCII.

$ iconv -f utf-8 -t iso-8859-1 charkind01.f90 > charkind02.f90

$ file charkind*.f90
charkind01.f90: Unicode text, UTF-8 text
charkind02.f90: ISO-8859 text

$ gfortran -Wall -g -std=f2008 -Wextra -Wall -fcheck=all charkind02.f90 -o charkind02.exe

$ ./charkind02.exe

$ cat funny.txt
H▒tel Chez Fr▒d▒rque

$ od -c funny.txt
0000000   H 364   t   e   l       C   h   e   z       F   r 351   d 351
0000020   r   q   u   e  \n
0000025
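
The same diagnosis can be sketched from inside a program. This is a simplified structural test only, under the assumption of a one-byte default character kind; looks_like_utf8 is a hypothetical helper, and it deliberately ignores overlong encodings and surrogates. The two test strings are spelled as byte values taken from the od dumps above (octal 303 264 = 195 180 for the UTF-8 file, octal 364 = 244 for the ISO-8859-1 file).

```fortran
program check_utf8
    implicit none
    ! "Hôtel" as stored in the UTF-8 file and in the ISO-8859-1 file
    character(len=*), parameter :: good = 'H' // char(195) // char(180) // 'tel'
    character(len=*), parameter :: bad  = 'H' // char(244) // 'tel'

    print '(a,l1)', 'UTF-8 bytes pass:      ', looks_like_utf8(good)
    print '(a,l1)', 'ISO-8859-1 bytes pass: ', looks_like_utf8(bad)
contains
    ! True if every lead byte is followed by the right number of
    ! continuation bytes (10xxxxxx). Not a full validator.
    logical function looks_like_utf8(s) result(ok)
        character(len=*), intent(in) :: s
        integer :: i, k, b, n_follow

        ok = .false.
        i = 1
        do while (i <= len(s))
            b = ichar(s(i:i))
            if (b < 128) then
                n_follow = 0                      ! plain ASCII
            else if (iand(b, 224) == 192) then
                n_follow = 1                      ! 110xxxxx
            else if (iand(b, 240) == 224) then
                n_follow = 2                      ! 1110xxxx
            else if (iand(b, 248) == 240) then
                n_follow = 3                      ! 11110xxx
            else
                return                            ! stray continuation or invalid byte
            end if
            if (i + n_follow > len(s)) return     ! sequence cut off at the end
            do k = 1, n_follow
                if (iand(ichar(s(i+k:i+k)), 192) /= 128) return
            end do
            i = i + n_follow + 1
        end do
        ok = .true.
    end function looks_like_utf8
end program check_utf8
```

The ISO-8859-1 byte 244 looks like the start of a four-byte sequence, but the bytes after it are plain ASCII, so the second string fails the check.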

FortranFan · May 26, 2023, 9:13pm

Per the standard, the code shall be along the following lines:

   integer, parameter :: CK = selected_char_kind('ISO_10646')
   integer :: lun
   character(kind=CK, len=*), parameter :: string = CK_'𨉟呐㗂越'
   open(newunit=lun, encoding="utf-8", file='funny.txt', action='write')
   write(lun, fmt=*) string
end

Processor implementors can explain whether and how their implementations can process the conforming code and the expected program behavior. Intel, for example, has decided not to support this into the foreseeable future, which is a real shame given how widely UTF-8 is used globally. What does Intel have against, say, Vietnamese here? Does Intel’s software team not want to help Intel do business in Vietnam, or what?
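
A hedged sketch of the same approach that keeps the source file pure ASCII by building the non-ASCII character from its code point, which sidesteps the question of how a given compiler interprets UTF-8 bytes inside a CK_ literal. It assumes the processor supports ISO_10646 (selected_char_kind returns -1 where it does not, and the declaration then fails to compile); the program name ucs4_demo is made up. The char(int(z'...'), kind) idiom is the one used in the gfortran manual's SELECTED_CHAR_KIND example.

```fortran
program ucs4_demo
    use, intrinsic :: iso_fortran_env, only: output_unit
    implicit none
    integer, parameter :: ck = selected_char_kind('ISO_10646')
    character(kind=ck, len=5) :: s

    ! "Hôtel": U+00F4 is given as a code point, so the source stays ASCII
    s = ck_'H' // char(int(z'F4'), ck) // ck_'tel'

    ! Ask the runtime to encode kind=ck output as UTF-8
    open (output_unit, encoding='utf-8')
    write (output_unit, '(a)') s
    write (output_unit, '(a,i0)') 'len = ', len(s)
end program ucs4_demo
```

Here len(s) is 5, one per character, however many bytes the UTF-8 output ends up using.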

Patrick · May 26, 2023, 9:24pm

I do not understand this string - what is CK doing in it?

FortranFan · May 26, 2023, 9:51pm

@Patrick,

If and when you stick around with Fortran long enough and care to follow the standard (it takes a bit of effort and attention; the standard and its details are not as readily obvious as you appear to demand in replies to your posts from other readers who are donating their time), you can read through this, especially NOTE 2:

DavidB · May 26, 2023, 10:32pm

Intel Fortran handles non-European strings, at least those that are written left-to-right, adequately for many applications. We find the biggest weakness is console I/O and editors. Everyone needs to be aware of locale and file encoding issues.

integer unit
character(len=*), parameter :: &
  string = 'Mình nói tiếng Việt (𨉟呐㗂越, "I speak Vietnamese")'
open (newunit=unit,file='funny3.txt',encoding="utf-8",action='write')
write(unit,"(a)") string
end

generates

$ cat funny3.txt
Mình nói tiếng Việt (𨉟呐㗂越, "I speak Vietnamese")

Patrick · May 26, 2023, 10:33pm

My question has been answered by David.

RonShepard · May 27, 2023, 12:19am

If you look two lines before the parameter definition, you will see that CK is a character kind value. Then the parameter definition uses a character string literal of that kind. Other data types in Fortran, like integer and real, have the kind value after the literal value (e.g. 1_int32, or 1.0_real32), but characters are the other way around: the kind value comes before the literal.
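
A small sketch of the two literal forms side by side. It assumes a processor that supports the ISO_10646 kind (gfortran does; where selected_char_kind returns -1 the character declaration will not compile), and the program and variable names are only illustrative.

```fortran
program kind_syntax
    use, intrinsic :: iso_fortran_env, only: int32, real32
    implicit none
    integer, parameter :: ck = selected_char_kind('ISO_10646')

    ! Numeric literals: value first, then _kind
    integer(int32), parameter :: i = 1_int32
    real(real32),   parameter :: x = 1.0_real32

    ! Character literals: kind_ first, then the quoted value
    character(kind=ck, len=*), parameter :: s = ck_'abc'

    print '(a,i0)', 'kind(i) = ', kind(i)
    print '(a,i0)', 'kind(x) = ', kind(x)
    print '(a,i0)', 'kind(s) = ', kind(s)
end program kind_syntax
```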

FortranFan · May 27, 2023, 12:21am

This line does not conform; it’s a processor extension from Intel, which is strange since they only intend to support a single kind for the character type, and they treat that kind as ASCII when it is not.

DavidB · May 27, 2023, 1:29pm

I am not an expert in the Fortran standard, but I have a good working knowledge. I don’t see why the code is non-conformant. Happy to learn more but I have no alternative but to continue working with code like the above.

Neither gfortran (with -std=f2018) nor ifort (with /stand:f18) warns about non-standard code.

The compiler treats the character variable as a sequence of bytes. The editor and console display the code as utf-8 but the compiler doesn’t know (or care) about that. The programmer needs to be careful not to destroy the utf-8 encoding when manipulating substrings and shouldn’t expect sensible results from collating sequences and the like.
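
One way to stay careful with substrings, as a sketch only: before cutting a byte string, back the cut position up until it falls on a code point boundary. The helper utf8_safe_len is hypothetical, and the code assumes a one-byte default character kind with ichar returning 0 to 255.

```fortran
program utf8_truncate
    implicit none
    character(len=:), allocatable :: s
    integer :: n

    ! "Fréd" as raw UTF-8 bytes: 'é' (U+00E9) is the two bytes 195 169
    s = 'Fr' // char(195) // char(169) // 'd'

    n = utf8_safe_len(s, 3)      ! a naive s(1:3) would split 'é' in half
    print '(a,i0)', 'safe prefix length in bytes: ', n
contains
    ! Largest n <= max_bytes such that s(1:n) ends on a code point boundary.
    pure integer function utf8_safe_len(s, max_bytes) result(n)
        character(len=*), intent(in) :: s
        integer, intent(in) :: max_bytes

        n = max(0, min(max_bytes, len(s)))
        if (n >= len(s)) return
        do while (n > 0)
            ! Stop once the byte after the cut is not a continuation byte (10xxxxxx)
            if (iand(ichar(s(n+1:n+1)), 192) /= 128) exit
            n = n - 1
        end do
    end function utf8_safe_len
end program utf8_truncate
```

For the sample string the requested cut at 3 bytes is moved back to 2, so the prefix is 'Fr' rather than 'Fr' plus half a character.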

It is a different matter if your editor inserts a byte order mark.

PierU · May 27, 2023, 3:04pm

True, but the individual bytes that form a UTF-8 sequence do not necessarily map to existing characters in the default character kind (which is only required to support the Fortran character set, which is not even the full ASCII set).

So yes, it works in practice (as long as you are doing only basic stuff like this), and it even has a chance of working with all existing compilers. Still, it’s formally not standard compliant, and as such there is no 100% guarantee that it always works.

DavidB · May 27, 2023, 10:43pm

Yes. I agree. I am not sure if it is standard compliance or quality of implementation, but it doesn’t matter.

RonShepard · May 27, 2023, 11:33pm

It does matter when it comes to writing code that behaves the way the language says that it should behave. You previously also made this statement:

which also shows that the details do matter. On my computer, which is macOS, not Windows, your string prints “correctly” (I think), but when I do something as simple as len(string) to ask how long the string is, then I get nonsense. Some of the displayed characters appear to be encoded in 8 bits and others are encoded in 16 bits. I guess that is the way that character encoding is supposed to work, but how could you write any kind of portable code by just ignoring those details?

I should add that I cannot tell if my compiler is really doing the right thing. I’m guessing in part by how the string appears in my browser when reading this thread and comparing that to what is printed by the compiler.
