Python3 character maps to <undefined> (MINGW64, Windows 10)?

I am trying to use the code I found at https://superuser.com/questions/876572/how-do-i-find-out-which-font-contains-a-certain-special-character/1452828 with MINGW64 Python3 on a Windows 10 machine:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import unicodedata
import os
from fontTools.ttLib import TTFont

fonts = []

for root,dirs,files in os.walk("c:/Windows/Fonts/"):
  for file in files:
    if file.endswith(".ttf"):
      tfile = os.path.join(root,file)
      fonts.append(tfile)

def char_in_font(unicode_char, font):
  for cmap in font['cmap'].tables:
    if cmap.isUnicode():
      if ord(unicode_char) in cmap.cmap:
        return True
  return False

def test(char):
  for fontpath in fonts:
    font = TTFont(fontpath)   # specify the path to the font in question
    if char_in_font(char, font):
      #print(char + " "+ unicodedata.name(char) + " in " + fontpath) # UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f63a' in position 0: character maps to <undefined>
      #print( "{} ({}) in {}".format(char, unicodedata.name(char), fontpath ) ) # UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f63a' in position 0: character maps to <undefined>
      print( "({}) in {}".format( unicodedata.name(char), fontpath ) )

test(u"\U0001f63a")
test(u"")

If you run the code as is, you will see that it works, printing output like the following:

$ python3 /tmp/test-font.py
(SMILING CAT FACE WITH OPEN MOUTH) in c:/Windows/Fonts/DejaVuSans-Bold.ttf
(SMILING CAT FACE WITH OPEN MOUTH) in c:/Windows/Fonts/DejaVuSans-BoldOblique.ttf
(SMILING CAT FACE WITH OPEN MOUTH) in c:/Windows/Fonts/DejaVuSans-Oblique.ttf
...

...however, if you uncomment either of the commented-out print statements, the code fails with:

$ python3 /tmp/test-font.py
Traceback (most recent call last):
  File "C:/msys64/tmp/test-font.py", line 31, in <module>
    test(u"\U0001f63a")
  File "C:/msys64/tmp/test-font.py", line 29, in test
    print( "{} ({}) in {}".format(char, unicodedata.name(char), fontpath ) )
  File "C:/msys64/mingw64/lib/python3.8/encodings/cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f63a' in position 0: character maps to <undefined>

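This seems strange to me: char is the input variable, and it is clearly found correctly in the system fonts, yet it cannot be printed in the terminal?!

Does anyone know how to get char to print in the terminal in this case?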

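Under the MinGW console, the string is being converted to the console encoding (cp1252, per the error message), and that encoding doesn't support all Unicode characters.

The standard Windows console doesn't produce the error. Below is a cut-and-paste from a cmd.exe Windows console.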
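To make that concrete, here is a minimal sketch (not from the original answer) that reproduces the failure without a terminal. The helper names `can_encode` and `render` are hypothetical; `render` wraps an in-memory buffer in an `io.TextIOWrapper`, which is how Python wraps stdout, so the same codec error shows up. The closing comment mentions `sys.stdout.reconfigure` (Python 3.7+) as one possible workaround; whether the MINGW64 terminal then displays the glyph is an assumption that depends on the terminal and its font.

```python
import io

CAT = "\U0001f63a"  # SMILING CAT FACE WITH OPEN MOUTH


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def render(text, encoding, errors="strict"):
    """Write `text` through a text stream configured like stdout and
    return the bytes that would be sent to the terminal."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    stream.detach()  # keep `buf` open after the wrapper goes away
    return buf.getvalue()


print(can_encode(CAT, "cp1252"))  # cp1252 has no slot for U+1F63A
print(can_encode(CAT, "utf-8"))   # UTF-8 can encode every code point

# A lossy error handler avoids the exception but does not show the glyph.
print(render(CAT, "cp1252", errors="backslashreplace"))

# Possible workaround in the real script (assumes Python 3.7+ and a
# terminal that understands UTF-8):
#     import sys
#     sys.stdout.reconfigure(encoding="utf-8")
```

Setting the environment variable `PYTHONIOENCODING=utf-8` before launching the script is another way to change the stdout encoding without touching the code.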

Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\U0001f63a')

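Here is a screenshot of what is actually displayed. The font doesn't support the character, so the console shows the replacement-character glyph, but the character itself is correct, as the cut-and-paste of the same text above demonstrates. There are two glyphs because the character requires two UTF-16 code units to encode.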
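The two-code-unit claim can be checked with a short sketch. The function name `utf16_code_units` is a hypothetical helper, not something from the original answer; it encodes the text as little-endian UTF-16 and splits the result into 16-bit units.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of `text` as a list of integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


units = utf16_code_units("\U0001f63a")
print([hex(u) for u in units])  # ['0xd83d', '0xde3a'], a surrogate pair
print(len(utf16_code_units("A")))  # 1, a BMP character needs a single unit
```

Any code point above U+FFFF is stored as a surrogate pair in UTF-16, which is why a console that works in UTF-16 code units can show two replacement glyphs for one character.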