Extract single unicode character from string

Question

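My problem begins when I stumble upon unicode characters, for example árbol. Right now I deal with this by checking whether the char at position i is smaller than 127. If so, it belongs to the ASCII table and I know for sure that string(i:i) is a complete single character. Otherwise (>= 127), as in my example 'árbol', string(1:2) is the complete character.

I think the way I am treating the strings solves the problem for my actual purposes (processing files in Spanish, Polish and Russian), but I will have problems when dealing with Chinese characters, where a single character may take up to 4 bytes.

Is there a way in Fortran to pick out a unicode character within a string?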

Answer 1

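gfortran currently does not support non-ASCII characters in UTF-8 encoded source files, see here. You can find the corresponding bug report here.

As a workaround, you can specify the unicode character using hexadecimal notation: char(int(z'00E1'), ucs4) or '\u00E1'. The latter requires the compile option -fbackslash to enable the evaluation of backslash escapes.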

program character_kind
  use iso_fortran_env
  implicit none
  integer, parameter :: ucs4  = selected_char_kind ('ISO_10646')

  character(kind=ucs4,  len=20) :: string

!  string = ucs4_'árbol' ! This does not work
!  string = char(int(z'00E1'), ucs4) // ucs4_'rbol' ! This works
  string = ucs4_'\u00E1rbol' ! This also works (compile with -fbackslash)

  open (output_unit, encoding='UTF-8')

  print *, string(1:1)
  print *, string

end program character_kind
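
Since the question is about processing files, the same idea can be sketched for file input: a unit opened with encoding='UTF-8' can be read into a ucs4 string, after which line(i:i) is one complete character regardless of how many bytes it took in the file. This is only a minimal sketch, assuming gfortran with ISO_10646 support; the file name utf8_demo.txt is made up for the illustration, and the program writes the file itself so that it is self-contained.

```fortran
program read_utf8_file
  use iso_fortran_env, only: output_unit
  implicit none
  integer, parameter :: ucs4 = selected_char_kind ('ISO_10646')

  character(kind=ucs4, len=20) :: line
  integer :: u, i

  ! Create a small UTF-8 encoded test file (hypothetical file name).
  open (newunit=u, file='utf8_demo.txt', encoding='UTF-8', &
        status='replace', action='write')
  write (u, '(a)') char(int(z'00E1'), ucs4) // ucs4_'rbol'
  close (u)

  ! Read it back: the UTF-8 bytes are decoded into ucs4 characters.
  open (newunit=u, file='utf8_demo.txt', encoding='UTF-8', &
        status='old', action='read')
  read (u, '(a)') line
  close (u)

  ! Make standard output emit UTF-8 as well.
  open (output_unit, encoding='UTF-8')

  ! Each index now addresses one whole character, not one byte.
  do i = 1, len_trim(line)
    print *, i, line(i:i)
  end do
end program read_utf8_file
```

With this approach no byte counting is needed, so 3- and 4-byte characters are handled the same way as 'á'.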

ifort does not seem to support ISO_10646 at all: selected_char_kind ('ISO_10646') returns -1. Using ifort 15.0.0, I get the same message as described here.
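
Where the ISO_10646 kind is not available, one possible fallback is to stay with default-kind strings and generalise the byte test from the question: in UTF-8 the lead byte alone tells how many bytes the character occupies (1 to 4). The following is a rough sketch of that idea, not a validating decoder. The names utf8_bytes and utf8_len are invented for this example, and it assumes the string holds UTF-8 bytes and that ichar returns the byte value of a default-kind character.

```fortran
module utf8_bytes
  implicit none
contains
  ! Length in bytes of the UTF-8 sequence starting with lead byte c.
  ! Returns 0 for a continuation byte or an invalid lead byte.
  pure integer function utf8_len(c) result(n)
    character(len=1), intent(in) :: c
    integer :: b
    b = iand(ichar(c), 255)            ! force the value into 0..255
    if (b < 128) then
      n = 1                            ! 0xxxxxxx: ASCII
    else if (b >= 192 .and. b < 224) then
      n = 2                            ! 110xxxxx
    else if (b >= 224 .and. b < 240) then
      n = 3                            ! 1110xxxx
    else if (b >= 240 .and. b < 248) then
      n = 4                            ! 11110xxx
    else
      n = 0                            ! continuation or invalid byte
    end if
  end function utf8_len
end module utf8_bytes

program utf8_demo
  use utf8_bytes
  implicit none
  character(len=:), allocatable :: s
  integer :: i, n

  ! UTF-8 bytes of 'árbol', built from byte values so the source stays ASCII.
  s = char(195) // char(161) // 'rbol'

  i = 1
  do while (i <= len(s))
    n = max(1, utf8_len(s(i:i)))       ! step over a stray byte instead of stalling
    n = min(n, len(s) - i + 1)         ! do not run past the end of the string
    print '(a,i0,a,i0,2a)', 'bytes ', i, '..', i + n - 1, ': ', s(i:i+n-1)
    i = i + n
  end do
end program utf8_demo
```

Each slice s(i:i+n-1) is then one complete character, which also covers the 3- and 4-byte cases mentioned in the question.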

string

unicode

fortran

character-encoding