Python3 character maps to <undefined> (MINGW64, Windows 10)?
I am trying to use the code I found at https://superuser.com/questions/876572/how-do-i-find-out-which-font-contains-a-certain-special-character/1452828 with MINGW64 Python3 on a Windows 10 machine:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import unicodedata
import os
from fontTools.ttLib import TTFont

fonts = []
for root, dirs, files in os.walk("c:/Windows/Fonts/"):
    for file in files:
        if file.endswith(".ttf"):
            tfile = os.path.join(root, file)
            fonts.append(tfile)

def char_in_font(unicode_char, font):
    for cmap in font['cmap'].tables:
        if cmap.isUnicode():
            if ord(unicode_char) in cmap.cmap:
                return True
    return False

def test(char):
    for fontpath in fonts:
        font = TTFont(fontpath)  # specify the path to the font in question
        if char_in_font(char, font):
            #print(char + " "+ unicodedata.name(char) + " in " + fontpath) # UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f63a' in position 0: character maps to <undefined>
            #print( "{} ({}) in {}".format(char, unicodedata.name(char), fontpath ) ) # UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f63a' in position 0: character maps to <undefined>
            print( "({}) in {}".format( unicodedata.name(char), fontpath ) )

test(u"\U0001f63a")
test(u"")
If you run the code as written, you will see that it works, because it outputs something like:
$ python3 /tmp/test-font.py
(SMILING CAT FACE WITH OPEN MOUTH) in c:/Windows/Fonts/DejaVuSans-Bold.ttf
(SMILING CAT FACE WITH OPEN MOUTH) in c:/Windows/Fonts/DejaVuSans-BoldOblique.ttf
(SMILING CAT FACE WITH OPEN MOUTH) in c:/Windows/Fonts/DejaVuSans-Oblique.ttf
...
... however, if you uncomment either of the commented-out print statements, the code fails with:
$ python3 /tmp/test-font.py
Traceback (most recent call last):
  File "C:/msys64/tmp/test-font.py", line 31, in <module>
    test(u"\U0001f63a")
  File "C:/msys64/tmp/test-font.py", line 29, in test
    print( "{} ({}) in {}".format(char, unicodedata.name(char), fontpath ) )
  File "C:/msys64/mingw64/lib/python3.8/encodings/cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f63a' in position 0: character maps to <undefined>
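This is strange to me, because char is the input variable, and it is clearly found in the system fonts, yet it cannot be printed in the terminal?!
Does anyone know how to get char to print in the terminal in this case?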
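Under the MinGW console, the string is being converted to the console encoding (cp1252, according to the error message), and that encoding does not support all Unicode characters.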
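To make the encoding point concrete, here is a minimal sketch that reproduces the problem without a console and shows one possible workaround. The helper names can_encode and make_text_stream are hypothetical, introduced only for illustration; the in-memory stream stands in for sys.stdout, and the reconfigure call assumes Python 3.7 or later.

```python
import io
import sys

CAT = "\U0001f63a"  # SMILING CAT FACE WITH OPEN MOUTH


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def make_text_stream(encoding, errors="strict"):
    """Hypothetical stand-in for sys.stdout with a chosen encoding."""
    raw = io.BytesIO()
    return raw, io.TextIOWrapper(raw, encoding=encoding, errors=errors)


# cp1252 has no mapping for the emoji; UTF-8 can encode any code point.
print(can_encode(CAT, "cp1252"))
print(can_encode(CAT, "utf-8"))

# With errors="backslashreplace" the write no longer raises; the
# unencodable character is emitted as an escape sequence instead.
buffer, stream = make_text_stream("cp1252", errors="backslashreplace")
stream.write(CAT)
stream.flush()
print(buffer.getvalue())

# The same idea applied to the real stream (Python 3.7+). Passing
# encoding="utf-8" instead is another option if the terminal accepts UTF-8.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(errors="backslashreplace")
    print(CAT)
```

Setting the environment variable PYTHONIOENCODING=utf-8 before starting Python is another way to change the stream encoding. Whether the terminal then renders the glyph still depends on the terminal and its font, so treat this as something to try rather than a guaranteed fix.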
The standard Windows console does not get the error. Below is a cut-and-paste from a cmd.exe Windows console.
Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\U0001f63a')
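Here is a screenshot of what is actually displayed. The font does not support the character, so the console shows the replacement-character glyph, but the character itself is correct, as the cut-and-paste of the same text above demonstrates. There are two glyphs because the character requires two UTF-16 code units to encode.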
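The two-code-unit claim can be checked directly in Python. This is a small sketch; utf16_code_units is a hypothetical helper name, not part of any library.

```python
CAT = "\U0001f63a"  # SMILING CAT FACE WITH OPEN MOUTH


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


units = utf16_code_units(CAT)
print(len(CAT))                 # 1: a single code point in a Python str
print([hex(u) for u in units])  # ['0xd83d', '0xde3a']: a surrogate pair
```

A console that draws one glyph per UTF-16 code unit and has no font coverage for the character would therefore show two replacement glyphs for this one character.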