program utf
use iso_fortran_env, only: output_unit
implicit none
integer, parameter :: ucs4 = selected_char_kind('iso_10646')
character(len=100, kind=ucs4) :: line
open(10, file='test.txt', form='formatted', action='read', encoding='utf-8')
! the compiler did not accept encoding='ucs4'
read(10, '(A)') line
write(*, '(A)') line
close(10)
end program utf
What I am trying to do is find a way to deal with the squiggles (accents and typographic apostrophes) in code that manipulates text like “à l’école, j’ai été …”
implicit none
integer, parameter :: ucs4 = selected_char_kind("iso_10646")
print*,ucs4
end
gfortran version 14.0.1 20240121 gives output 4, but ifort version 2021.6.0 gives -1, meaning that ifort does not support this kind. What compiler and version are you using?
What is the actual encoding of the file you are trying to read? Since you are trying to use ucs4, I’d guess you expect a full-Unicode file, with every character stored in 4 bytes. It is quite possible, however, that it is a UTF-8 file, which can be read and written using default-kind characters. Even intrinsic functions such as index or len work on such strings. One has to remember, however, that the length as stored (in bytes) is greater than the number of visible characters whenever the text contains non-ASCII characters.
program char8
implicit none
character(len=80) :: line
character(len=:), allocatable :: s1, s2, s3
integer :: i, l1, l2, l3
open(11,file='test.txt', action='read')
read(11,'(a)') line
s1 = trim(line)
read(11,'(a)') line
s2 = trim(line)
read(11,'(a)') line
s3 = trim(line)
l1 = len(s1)
l2 = len(s2)
l3 = len(s3)
print '(a/ i3,2x,*(a2,z3))', s1, l1, (s1(i:i),ichar(s1(i:i)),i=1,l1)
print '(a/ i3,2x,*(a2,z3))', s2, l2, (s2(i:i),ichar(s2(i:i)),i=1,l2)
print '(a/ i3,2x,*(a2,z3))', s3, l3, (s3(i:i),ichar(s3(i:i)),i=1,l3)
print *, index(s3,s1), index(s3,s1,back=.true.)
end program char8
with the test.txt file containing UTF-8 encoded text:
é
à l’école
j’ai été
gives the following output (both with gfortran and ifx):
$ gfortran char8.f90 && ./a.out
é
2 � C3 � A9
à l’école
13 � C3 � A0 20 l 6C � E2 � 80 � 99 � C3 � A9 c 63 o 6F l 6C e 65
j’ai été
12 j 6A � E2 � 80 � 99 a 61 i 69 20 � C3 � A9 t 74 � C3 � A9
8 11
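The gap between stored length and visible characters can be measured directly. The following is a minimal sketch, assuming the string holds well-formed UTF-8 in default-kind characters; the function name utf8_len is hypothetical and not part of any library. It counts every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx), relying on ichar returning 0..255 as in the output above.

```fortran
program utf8_length
  implicit none
  character(len=:), allocatable :: s

  ! The word from line 3 of test.txt (e-acute, t, e-acute), spelled out
  ! as UTF-8 bytes so that this source file stays pure ASCII.
  s = char(int(z'C3')) // char(int(z'A9')) // 't' // &
      char(int(z'C3')) // char(int(z'A9'))

  print '(a,i0)', 'bytes:       ', len(s)
  print '(a,i0)', 'code points: ', utf8_len(s)

contains

  ! Hypothetical helper: number of Unicode code points in a UTF-8 string.
  pure function utf8_len(str) result(n)
    character(len=*), intent(in) :: str
    integer :: n, i
    n = 0
    do i = 1, len(str)
      ! Continuation bytes look like 10xxxxxx (hex 80..BF); skip them.
      if (iand(ichar(str(i:i)), int(z'C0')) /= int(z'80')) n = n + 1
    end do
  end function utf8_len

end program utf8_length
```

For this five-byte string the sketch reports 5 bytes and 3 code points. Note that code points are not always the same as visible characters: an accent stored as a separate combining mark would be counted on its own.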
Special characters like those in “à l’école, j’ai été …” cannot each be held in a single default-kind character, so you’re on the right track in opening the file with UTF-8 encoding and declaring a UCS-4 (4-byte Unicode) character kind. However, there are a few things to consider and potentially adjust in your code:
Character Kind: UCS-4 (selected_char_kind('iso_10646')) is a good choice for handling Unicode text. It stores each character in 4 bytes, which is enough for every Unicode code point.
File Encoding: In Fortran, the encoding= specifier in the open statement declares the character encoding of a formatted file. The standard defines only the values 'UTF-8' and 'DEFAULT', which is why encoding='ucs4' is rejected. Unfortunately, not all compilers support encoding='utf-8'. If your compiler doesn’t support it, it may not correctly interpret UTF-8 encoded files.
Read and Write Statements: When you read from and write to the console or a file, ensure that the system you’re running on can handle UTF-8 or UCS-4 encoded text. This is more about the environment than the Fortran language itself.
Compiler Support: Not all Fortran compilers support UCS-4 or UTF-8 encoding natively. You might need to check your compiler’s documentation for specific support and syntax.
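The points above can be sketched as a small round trip. This is an illustration under stated assumptions, not a definitive implementation: it assumes a compiler for which selected_char_kind('ISO_10646') is not -1 (gfortran, per the reply above), it uses a scratch file in place of test.txt so that it runs on its own, and it re-opens output_unit with encoding='UTF-8', which is the approach shown in the gfortran documentation.

```fortran
program ucs4_roundtrip
  use iso_fortran_env, only: output_unit
  implicit none
  integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
  character(len=100, kind=ucs4) :: line
  integer :: u

  ! Have the runtime encode UCS-4 output as UTF-8 on standard output.
  open(output_unit, encoding='UTF-8')

  ! Scratch file standing in for test.txt (assumption, keeps the sketch runnable).
  open(newunit=u, status='scratch', form='formatted', encoding='UTF-8')
  ! U+00E9 is e with acute accent; the file receives the UTF-8 bytes C3 A9 74 C3 A9.
  write(u, '(A)') char(233, ucs4) // ucs4_'t' // char(233, ucs4)
  rewind(u)

  ! Reading decodes UTF-8 back into one UCS-4 element per character.
  read(u, '(A)') line
  close(u)

  write(output_unit, '(A)') trim(line)
  ! len_trim now counts characters rather than bytes.
  write(output_unit, '(A,I0)') 'characters: ', len_trim(line)
end program ucs4_roundtrip
```

With a compiler that reports -1 for this kind, the declaration of line will not compile, so this route is only open where UCS-4 is actually supported.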
Given these points, here are some suggestions:
Check Compiler Documentation: Ensure that your compiler supports UCS-4 and UTF-8. If it does, check the exact syntax and capabilities.
Alternative Approach: If UCS-4 is not supported, you might need to use a workaround. One common method is to read and process the text as a byte stream (using character(len=1), for example) and then convert it to the correct characters in your program. This approach is more complex but can be more portable across different compilers.
Test with Simple Text First: Start with a simple text file containing special characters to ensure that your read and write procedures work as expected.
Environment Considerations: Ensure that the environment where your program runs (like the console or the text editor) supports UTF-8 or UCS-4.
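The byte-stream workaround mentioned above can be sketched with default-kind characters only. This is a minimal illustration, assuming well-formed UTF-8 input; decode_utf8 is a hypothetical helper, and it does not validate lead or continuation bytes. It turns a byte string into an integer array of Unicode code points, which a program can then compare or classify without any UCS-4 support.

```fortran
program utf8_decode
  implicit none
  character(len=:), allocatable :: s
  integer, allocatable :: cp(:)

  ! a-grave, space, l, right single quotation mark, as raw UTF-8 bytes
  s = char(int(z'C3')) // char(int(z'A0')) // ' l' // &
      char(int(z'E2')) // char(int(z'80')) // char(int(z'99'))

  cp = decode_utf8(s)
  print '(*(1x,"U+",z4.4,:))', cp

contains

  ! Hypothetical helper: decode UTF-8 bytes into code points (no validation).
  function decode_utf8(str) result(cp)
    character(len=*), intent(in) :: str
    integer, allocatable :: cp(:)
    integer :: i, k, n, b, nextra, code

    allocate(cp(len(str)))          ! upper bound: one code point per byte
    n = 0
    i = 1
    do while (i <= len(str))
      b = ichar(str(i:i))
      if (b < 128) then             ! 0xxxxxxx: plain ASCII
        code = b
        nextra = 0
      else if (b >= 240) then       ! 11110xxx: lead byte of a 4-byte sequence
        code = iand(b, 7)
        nextra = 3
      else if (b >= 224) then       ! 1110xxxx: lead byte of a 3-byte sequence
        code = iand(b, 15)
        nextra = 2
      else                          ! 110xxxxx assumed; stray bytes are not detected
        code = iand(b, 31)
        nextra = 1
      end if
      do k = 1, nextra
        if (i + k > len(str)) exit  ! truncated sequence: stop early
        ! Each continuation byte contributes its low 6 bits.
        code = ior(ishft(code, 6), iand(ichar(str(i+k:i+k)), 63))
      end do
      n = n + 1
      cp(n) = code
      i = i + nextra + 1
    end do
    cp = cp(:n)                     ! shrink to the number actually decoded
  end function decode_utf8

end program utf8_decode
```

For the sample bytes this should print U+00E0 U+0020 U+006C U+2019. Storing code points as default integers sidesteps the question of whether the compiler offers a UCS-4 character kind at all.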
GNOME Terminal for French language
To configure the GNOME Terminal for French language, you need to change the language settings of your system to French. GNOME Terminal uses the system’s language settings to determine which language to display its interface in. Here’s a general guide on how to do it:
Open System Settings: Go to your system settings. This is usually accessible through the system menu or the application launcher.
Find Language or Region Settings: Look for a section in the system settings that deals with language or regional settings. This might be named “Language & Region”, “Language”, “Regional Settings”, or something similar.
Add or Select French Language: In the language settings, you should have the option to add a new language or select from a list of installed languages. Add or select French (Français). If French is not available, you might need to download it.
Apply the Changes: After selecting French, you will likely need to apply the changes. You may be prompted to log out and log back in, or restart your computer for the changes to take effect.
Check GNOME Terminal: Once your system is set to French, open GNOME Terminal. It should now display its menus and messages in French.
Keep in mind that these steps can vary slightly depending on the version of your operating system and its desktop environment. If you’re using a specific distribution of Linux like Ubuntu, Fedora, or Debian, the exact steps might be slightly different.
If you’re still having issues, you might want to share the specific error messages or behavior you’re encountering, as that can provide more insights into the problem.
Thank you for your very comprehensive response. I shall stop attempting to write French text handling applications in Fortran or C. The rigmarole is too overwhelming. I used to write in C#, with no problems setting culture - but I am not returning to that language, and, although I do speak French, I do not want a French display, although I did contemplate that option.
ian@ian-Latitude-E7440:~$ gfortran frtext.f08 -o frtext
ian@ian-Latitude-E7440:~$ ./frtext
Je m’appelle Jessica. Je suis une fille, je suis française et j’ai treize ans. Je vais à l’école à Nice, mais j’habite à Cagnes-Sur-Mer. J’ai deux frères. Le premier s’appelle Thomas
ian@ian-Latitude-E7440:~$ cat frtext.f08
PROGRAM FrenchTextEditor
IMPLICIT NONE
CHARACTER(LEN=200) :: line
INTEGER :: iost
OPEN(UNIT=10, FILE='french1.txt', STATUS='OLD', ACTION='READ')
DO
READ(10, '(A)', IOSTAT=iost) line
IF (iost /= 0) EXIT
PRINT *, line
END DO
CLOSE(UNIT=10)
END PROGRAM FrenchTextEditor
ian@ian-Latitude-E7440:~$ cat french1.txt
Je m’appelle Jessica. Je suis une fille, je suis française et j’ai treize ans. Je vais à l’école à Nice, mais j’habite à Cagnes-Sur-Mer. J’ai deux frères. Le premier s’appelle Thomas, il a quatorze ans. Le second s’appelle Yann et il a neuf ans. Mon papa est italien et il est fleuriste. Ma mère est allemande et est avocate. Mes frères et moi parlons français, italien et allemand à la maison. Nous avons une grande maison avec un chien, un poisson et deux chats.
ian@ian-Latitude-E7440:~$
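In the session above the printed text stops at “Thomas”, which appears to be where the 200-character buffer fills up, since the single line in french1.txt is longer than LEN=200 bytes. One way around a fixed buffer is non-advancing input into a deferred-length string. The following is a sketch under stated assumptions: read_line is a hypothetical helper, a scratch file stands in for french1.txt so that the program runs on its own, and every line is assumed to end with a newline.

```fortran
program read_long_lines
  implicit none
  character(len=:), allocatable :: line
  integer :: u, ios

  ! Scratch file standing in for french1.txt (assumption).
  ! recl is set generously because some compilers default to short records.
  open(newunit=u, status='scratch', form='formatted', recl=1024)
  write(u, '(a)') repeat('x', 470)
  write(u, '(a)') 'short line'
  rewind(u)

  do
    call read_line(u, line, ios)
    if (ios /= 0) exit
    print '(i0,a)', len(line), ' bytes read'
  end do
  close(u)

contains

  ! Hypothetical helper: read one complete record of any length.
  subroutine read_line(unit, line, ios)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: ios
    character(len=64) :: buffer
    integer :: nread

    line = ''
    do
      ! Non-advancing read: take up to 64 bytes without leaving the record.
      read(unit, '(a)', advance='no', iostat=ios, size=nread) buffer
      if (is_iostat_eor(ios)) then
        line = line // buffer(:nread)   ! last piece of this record
        ios = 0
        return
      else if (ios /= 0) then
        return                          ! end of file or a real error
      end if
      line = line // buffer             ! buffer was filled completely
    end do
  end subroutine read_line

end program read_long_lines
```

Because the text is treated as plain bytes, the UTF-8 sequences pass through untouched, just as in the original program.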