Unicode documentation available on the Fortran Wiki

A new document on Unicode has been added to the Fortran Wiki. It links to Fortran modules that address some of the remaining issues with Unicode usage in Fortran, plus dozens of example programs. A final pass is being made to reduce errata, but the section on using the ISO_10646 extension is already extensive, with detailed examples and references to the pertinent sections of the Fortran Standard that explain why some of the information there differs from several other references.

It is a wiki, so feel free to make corrections or clarifications. Note that the referenced M_unicode module is complete, although still being expanded; the M_ucs4 module is functional but needs further documentation and examples.
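
For readers who have not used the ISO_10646 extension, here is a minimal sketch in standard Fortran, independent of M_unicode and M_ucs4. It assumes a compiler that supports the optional ISO_10646 character kind and accepts encoding='UTF-8' on the preconnected output unit (gfortran does); the program name and the sample string are only illustrative.

```fortran
program demo_iso_10646
use, intrinsic :: iso_fortran_env, only : stdout => output_unit
implicit none
! selected_char_kind returns -1 if the processor lacks the ISO_10646 kind,
! in which case the declaration below will not compile
integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
character(kind=ucs4, len=:), allocatable :: str
   ! request UTF-8 encoding on standard output (assumption: the compiler
   ! allows this on an already-connected unit, as gfortran does)
   open(unit=stdout, encoding='UTF-8')
   ! build the word from code points so the source file stays pure ASCII
   str = char(int(z'E9'), kind=ucs4) // ucs4_'t' // char(int(z'E9'), kind=ucs4)
   write(stdout, '(a)') str
   ! len counts characters (3 here), not UTF-8 bytes
   write(stdout, '(*(g0,1x))') 'LEN:', len(str)
end program demo_iso_10646
```

The point of the sketch is that len() reports characters rather than bytes once the string is held in the ISO_10646 kind, which is the property the wiki section builds on.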


References

See Also

  • uni.f90 is a stand-alone single source file
    that builds a utility program called uni for manipulating Unicode data in UTF-8 files and demonstrates various aspects of M_unicode.

Related …

  • M_strings for ASCII string procedures
  • M_io for filesystem and I/O related functions
  • M_attr for ANSI terminal color and attributes

M_unicode v2.0.0 is released, including bi-directional support of C-style escape sequences and globbing.

globbing and C-style escape codes

     program demo_glob
     use M_unicode, only : glob, trim, unicode_type, len
     use M_unicode, only : remove_backslash
     use M_unicode, only : assignment(=)
     implicit none
     integer :: i
     type(unicode_type),allocatable :: ufiles(:)
     type(unicode_type),allocatable :: matched(:)
     character(len=*),parameter :: &
      filenames(*)= [character(len=256) :: &
     & 'My_favorite_file.F90',    & ! English
     & '我最喜欢的文档.c',         & ! Mandarin_Chinese
     & 'मरी_पसदीदा_फाइल.f90',       & ! Hindi
     & 'Mi_archivo_favorito.c',   & ! Spanish
     & 'ملفي_المفضل.h',         & ! Modern_Standard_Arabic
     & 'Mon_fichier_préféré.f90', & ! French
     & 'আমার_পরিয_ফাইল',          & ! Bengali
     & 'Meu_arquivo_favorito',    & ! Portuguese
     & 'Мой_любимый_файл',          & ! Russian
     & 'میری_پسندیدہ_فائل.pdf',   & ! Urdu
     & 'src/M_modules.F90',       &
     & 'src/subset.inc',          &
     & 'test/check.f90 ',         &
     & 'app/main.f90 ']
     character(len=*),parameter :: &
      encoded(*)= [character(len=256) :: &
     & 'My_favorite_file.F90',                    & ! English
     & '\u6211\u6700\u559C\u6B22\u7684\u6587\u6863.c', & ! Mandarin_Chinese
     & '\u092E\u0947\u0930\u0940_&
     &\u092A\u0938\u0902\u0926\u0940\u0926\u093E_&
     &\u092B\u093C\u093E\u0907\u0932.f90',        & ! Hindi
     & 'Mi_archivo_favorito.c',                   & ! Spanish
     & '\u0645\u0644\u0641\u064A_&
     &\u0627\u0644\u0645\u0641\u0636\u0644.h ',   & ! Modern_Standard_Arabic
     & 'Mon_fichier_pr\xE9f\xE9r\xE9.f90',        & ! French
     & '\u0986\u09AE\u09BE\u09B0_\u09AA\u09CD\u09B0\u09BF\u09AF\u09BC_&
     &\u09AB\u09BE\u0987\u09B2',                  & ! Bengali
     & 'Meu_arquivo_favorito',                    & ! Portuguese
     & '\u041C\u043E\u0439_\u043B\u044E\u0431\u0438\u043C\u044B\u0439_&
     &\u0444\u0430\u0439\u043B',                  & ! Russian
     & '\u0645\u06CC\u0631\u06CC_&
     &\u067E\u0633\u0646\u062F\u06CC\u062F\u06C1_&
     &\u0641\u0627\u0626\u0644.pdf',              & ! Urdu
     & 'src/M_modules.F90', &
     & 'src/subset.inc', &
     & 'test/check.f90 ', &
     & 'app/main.f90 ']
     character(len=*),parameter :: &
       g='(*(g0))', g1='(*(g0,1x))', comma='(*(g0:,", ",/))'

        ! some basic usage
        write(*,g)merge('PASSED','FAILED',glob("mississipPI", "*issip*PI"))
        write(*,g)merge('PASSED','FAILED',glob("bLah", "bL?h"))
        write(*,g)merge('PASSED','FAILED',glob("bLaH", "?LaH"))

        ! create a list of trimmed filenames
        ufiles=unicode_type(filenames)
        ufiles=trim(ufiles)
        write(*,g)'FILENAMES:'
        call show_filenames(ufiles)

        ! create a list of trimmed filenames from encoded names
        ufiles=remove_backslash(encoded)
        ufiles=trim(ufiles)
        write(*,g)'ENCODED FILENAMES:'
        call show_filenames(ufiles)

        ! get filenames ending in ".f90"
        matched=pack(ufiles,glob(ufiles,'*.f90'))
        write(*,g)'MATCHED *.f90:'
        call show_filenames(matched)

        ! get filenames ending in ".c"
        matched=pack(ufiles,glob(ufiles,'*.c'))
        write(*,g)'MATCHED *.c:'
        call show_filenames(matched)

     contains
     subroutine show_filenames(names)
     type(unicode_type),allocatable :: names(:)
        write(*,g1)':SIZE:',size(names),':LEN:',len(names)
        write(*,comma)(names(i)%character(),i=1,size(names))
     end subroutine show_filenames

     end program demo_glob

Expected Output

PASSED
PASSED
PASSED
FILENAMES:
:SIZE: 14 :LEN: 20 9 19 21 13 23 14 20 16 21 17 14 14 12
My_favorite_file.F90, 
我最喜欢的文档.c, 
मरी_पसदीदा_फाइल.f90, 
Mi_archivo_favorito.c, 
ملفي_المفضل.h, 
Mon_fichier_préféré.f90, 
আমার_পরিয_ফাইল, 
Meu_arquivo_favorito, 
Мой_любимый_файл, 
میری_پسندیدہ_فائل.pdf, 
src/M_modules.F90, 
src/subset.inc, 
test/check.f90, 
app/main.f90
ENCODED FILENAMES:
:SIZE: 14 :LEN: 20 9 22 21 13 23 16 20 16 21 17 14 14 12
My_favorite_file.F90, 
我最喜欢的文档.c, 
मेरी_पसंदीदा_फ़ाइल.f90, 
Mi_archivo_favorito.c, 
ملفي_المفضل.h, 
Mon_fichier_préféré.f90, 
আমার_প্রিয়_ফাইল, 
Meu_arquivo_favorito, 
Мой_любимый_файл, 
میری_پسندیدہ_فائل.pdf, 
src/M_modules.F90, 
src/subset.inc, 
test/check.f90, 
app/main.f90
MATCHED *.f90:
:SIZE: 4 :LEN: 22 23 14 12
मेरी_पसंदीदा_फ़ाइल.f90, 
Mon_fichier_préféré.f90, 
test/check.f90, 
app/main.f90
MATCHED *.c:
:SIZE: 2 :LEN: 9 21
我最喜欢的文档.c, 
Mi_archivo_favorito.c
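
To make the idea behind the encoded names concrete, here is a rough sketch in standard Fortran of turning \uXXXX escapes into ISO_10646 characters. It is not how remove_backslash is implemented: decode_u is a hypothetical helper that handles only the four-hex-digit \u form, assumes ASCII input, and assumes a compiler where a backslash in a character literal is an ordinary character (the gfortran default).

```fortran
program demo_decode_u
use, intrinsic :: iso_fortran_env, only : stdout => output_unit
implicit none
integer, parameter :: ucs4 = selected_char_kind('ISO_10646')
   ! assumption: the compiler accepts encoding= on the preconnected unit
   open(unit=stdout, encoding='UTF-8')
   write(stdout, '(a)') decode_u('\u041C\u043E\u0439_\u0444\u0430\u0439\u043B')
contains

! hypothetical helper: expand \uXXXX escapes in an ASCII string
function decode_u(text) result(out)
character(len=*), intent(in)             :: text
character(kind=ucs4, len=:), allocatable :: out
integer :: i, code, ios
   out = ucs4_''
   i = 1
   do while (i <= len(text))
      code = -1
      ! nested tests, because Fortran does not guarantee short-circuiting
      if (i + 5 <= len(text)) then
         if (text(i:i+1) == '\u') then
            ! read the four hex digits as an integer code point
            read(text(i+2:i+5), '(z4)', iostat=ios) code
            if (ios /= 0) code = -1
         endif
      endif
      if (code >= 0) then
         out = out // char(code, kind=ucs4)
         i = i + 6
      else
         ! copy any other (ASCII) character unchanged
         out = out // char(iachar(text(i:i)), kind=ucs4)
         i = i + 1
      endif
   enddo
end function decode_u

end program demo_decode_u
```

With a UTF-8 capable terminal this should print Мой_файл. The library routine covers more escape forms (for example the \xE9 used in the French name above), which this sketch deliberately leaves out.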

Beta versions of two kinds of routines were added to M_unicode: one converts
HTML character entities to UTF-8, and the other provides basic box-character
support by converting lines drawn with the pound character (“#”).

All standard named HTML entities that end in a semicolon (i.e. “&NAME;” strings, 2125 names in all) are defined:

program new
use,intrinsic :: iso_fortran_env, only : stdin=>input_unit
use M_unicode, only : expand_html, pound_to_box
use M_unicode, only : assignment(=), ut=>unicode_type
implicit none
    open(unit=stdin,pad='yes')
    call test_expand_html()
    call test_pound_to_box()
contains

subroutine test_expand_html()
type(ut) :: u_line
character(len=:),allocatable :: a_line

   ! named entities, each ending in a semicolon
   a_line='&lt; &gt; &amp; &copy; &reg; &trade; &euro; &pound; Before After'
   u_line=expand_html(a_line)
   write(*,'(a)')u_line%ch()

   a_line='< > & © ® ™ € £ Before After'
   u_line=expand_html(a_line)
   write(*,'(a)')u_line%ch()

end subroutine test_expand_html

subroutine test_pound_to_box()
integer                      :: i
type(ut),allocatable         :: textout(:)
character(len=*),parameter   :: text(*)=[character(len=108) :: &
'', &
'   #############', &
'   #  A table  #', &
'   #############', &
'   #red  # 255 #', &
'   #green#   0 #', &
'   #blue #   0 #', &
'   #############']
    textout=pound_to_box(text,style='double')
    do i=1,size(textout)
       write(*,'(a)')textout(i)%character()
    enddo
end subroutine test_pound_to_box

end program new

Expected Output

< > & © ® ™ € £ Before After
< > & © ® ™ € £ Before After
                                                                                                            
   ╔═══════════╗                                                                                            
   ║  A table  ║                                                                                            
   ╠═════╦═════╣                                                                                            
   ║red  ║ 255 ║                                                                                            
   ║green║   0 ║                                                                                            
   ║blue ║   0 ║                                                                                            
   ╚═════╩═════╝                                                                                            