How to output a utf-8 string list as it is in python?

Question

Well, character encoding and decoding sometimes frustrates me a lot.

So we know u'\u4f60\u597d' is the utf-8 encoding of 你好,

>>> print hellolist
[u'\u4f60\u597d']
>>> print hellolist[0]
你好

Now what I really want to get from the output, or write to a file, is [u'你好'], but it is [u'\u4f60\u597d'] every time. So how do you do it?

Answer 1

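You have a misunderstanding.

u'' in python is not utf-8, it is simply Unicode (except on Windows in Python <= 3.2, where it is utf-16 instead).

utf-8 is an encoding of Unicode, which is by necessity a sequence of bytes.

Additionally, u'你' and u'\u4f60' are exactly the same thing. It's simply that in Python2 the repr of high characters uses escapes instead of raw values.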
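To make the distinction concrete, here is a minimal sketch. It assumes Python 3 (where every str literal is Unicode, so no u prefix is needed), and the variable names are only for illustration.

```python
# Python 3 sketch: text (str) versus its UTF-8 encoding (bytes).
text = '\u4f60\u597d'          # the two code points U+4F60 U+597D

# The escaped spelling and the literal spelling denote the same string.
same = (text == '你好')

# Encoding yields bytes; UTF-8 uses three bytes for each of these characters.
encoded = text.encode('utf-8')

print(same)            # True
print(len(text))       # 2
print(len(encoded))    # 6
print(ascii(text))     # '\u4f60\u597d'  (an escaped repr, like Python 2 shows)
```

The escapes are only a display choice made by repr; the string itself is the same either way.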

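Since Python2 is heading for EOL very soon now, you should start to think seriously about switching to Python3. It is a lot easier to keep track of all this in Python3 since there is only one text string type (str, with bytes kept separate), and it's much clearer when you should .encode and .decode.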
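As a rough illustration of why this is easier to follow, the sketch below assumes Python 3 and shows that str and bytes are separate types that only meet through .encode and .decode. The names are illustrative.

```python
# Python 3 sketch: str and bytes are distinct and never mix implicitly.
text = '\u4f60\u597d'
data = text.encode('utf-8')        # str -> bytes
restored = data.decode('utf-8')    # bytes -> str

print(type(text).__name__, type(data).__name__)   # str bytes
print(restored == text)                           # True

# Combining the two types raises an error instead of converting silently.
try:
    text + data
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True
print(mixed_ok)                                   # False
```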

Answer 2

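When you print (or write to a file) a list, it internally calls the str() method of the list, but the list internally calls repr() on its elements. repr() returns the ugly unicode representation that you are seeing.

Example of repr -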

>>> h = u'\u4f60\u597d'
>>> print h
你好
>>> print repr(h)
u'\u4f60\u597d'
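The same behaviour can be seen with any object, not just unicode strings. A small sketch, where the Greeting class is hypothetical and exists only to show which method the list calls:

```python
class Greeting(object):
    """Hypothetical class with distinct str() and repr() forms."""

    def __str__(self):
        return 'str-form'

    def __repr__(self):
        return 'repr-form'


g = Greeting()
single = str(g)       # str() on the object itself uses __str__
in_list = str([g])    # str() on a list uses __repr__ for each element

print(single)     # str-form
print(in_list)    # [repr-form]
```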

You would need to manually take the elements of the list and print them for them to print correctly.

Example -

>>> h1 = [h,u'\u4f77\u587f']
>>> print u'[' + u','.join([u"'" + unicode(i) + u"'" for i in h1]) + u']'

For lists containing sublists (or tuples) that may have unicode characters, you would need a recursive function, for example -

>>> h1 = [h,(u'\u4f77\u587f',)]
>>> def listprinter(l):
...     # Recursively build a readable unicode form of nested lists/tuples.
...     if isinstance(l, list):
...         return u'[' + u','.join([listprinter(i) for i in l]) + u']'
...     elif isinstance(l, tuple):
...         return u'(' + u','.join([listprinter(i) for i in l]) + u')'
...     elif isinstance(l, (str, unicode)):
...         return u"'" + unicode(l) + u"'"
...     else:
...         # Fall back to the plain text form (numbers, None, ...).
...         return unicode(l)
...
>>> print listprinter(h1)

To save them to a file, use the same list comprehension or recursive function. Example -

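import io

# io.open encodes the unicode text as UTF-8 on write; a plain open(..., 'w')
# in Python 2 would try ASCII and fail on non-ASCII characters.
with io.open('<filename>', 'w', encoding='utf-8') as f:
    f.write(listprinter(h1))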
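For comparison, here is a minimal sketch of the same task under Python 3. It assumes Python 3 throughout, and the temporary path and file name are hypothetical. There the repr of a str keeps printable non-ASCII characters, so the list's own str() already looks the way the question wants (without the u prefix).

```python
import os
import tempfile

hellolist = ['\u4f60\u597d']

# Hypothetical temporary path, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

# Text mode with an explicit encoding: str in, UTF-8 bytes on disk.
with open(path, 'w', encoding='utf-8') as f:
    f.write(str(hellolist))

# Read the raw bytes back to check what was actually stored.
with open(path, 'rb') as f:
    raw = f.read()

decoded = raw.decode('utf-8')
print(decoded == "['\u4f60\u597d']")   # True
```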

Answer 3

with open("some_file.txt","wb") as f:
    f.write(hellolist[0].encode("utf8"))

I think this will solve your problem.

Most text editors use utf8 encoding :)

While the other answers are correct, none of them actually resolved your issue.

>>> u'\u4f60\u597d'.encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'

If you want the brackets

>>> u'[u\u4f60\u597d,]'.encode("utf8")

Answer 4

One thing is the unicode character itself

hellolist = u'\u4f60\u597d'

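Another is how you represent it.

You can represent it in many ways depending on where you are going to display it.

Web: UTF-8
Database: maybe UTF-16 or UTF-8
Web in Japan: EUC-JP or Shift JIS

For example 本
http://unicode.org/cgi-bin/GetUnihanData.pl?codepoint=672c
http://www.fileformat.info/info/unicode/char/672c/index.htm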
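A short sketch of that point, assuming Python 3: the same character 本 (U+672C) becomes different bytes under each of the encodings mentioned. The codec names are the standard ones shipped with Python; the variable names are illustrative.

```python
# Python 3 sketch: one character, several byte representations.
char = '\u672c'   # 本

encodings = ['utf-8', 'utf-16-le', 'euc_jp', 'shift_jis']
encoded = {name: char.encode(name) for name in encodings}

for name in encodings:
    # bytes.hex() shows the raw bytes as hexadecimal digits.
    print(name, encoded[name].hex())

# Decoding with the matching codec always gives the same character back.
round_trip = all(encoded[name].decode(name) == char for name in encodings)
print(round_trip)   # True
```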

python

utf-8

character-encoding