Unicode vs Unicode representation

Question

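I'm a bit confused about the difference between a unicode character and the representation of that character. What is the difference between the two: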

>>> u'é'
>>> u'\xe9'

Also, how can I go back and forth between 'readable' unicode (é) and machine unicode (\xe9)? How would I do this for the following case?

>>> u='bj\u00f6rk: voltaic'
>>> print u
bj\u00f6rk: voltaic

Answer 1

You want to use repr:

print repr(u)

In the shell, print u'é' gives you the str output, whereas typing u'é' at the prompt shows you the repr output.

str

Return a string containing a nicely printable representation of an object. For strings, this returns the string itself. The difference with repr(object) is that str(object) does not always attempt to return a string that is acceptable to eval(); its goal is to return a printable string. If no argument is given, returns the empty string, ''.
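As a rough illustration of the str/repr distinction, here is a minimal Python 3 sketch (the answer above is written for Python 2, so this is an adaptation, not the answerer's code). It assumes Python 3, where repr() keeps printable non-ASCII characters and the built-in ascii() produces the escaped form that Python 2's repr() used to give. The variable names are only for illustration.

```python
# Python 3 sketch: three ways to turn the same string into display text.
text = "\xe9"  # the one-character string é, written here with an escape

as_str = str(text)      # the string itself: é
as_repr = repr(text)    # a quoted literal; Python 3 keeps é as-is: 'é'
as_ascii = ascii(text)  # a quoted literal with non-ASCII escaped: '\xe9'

# Only the fully escaped form is printed, so the output is pure ASCII.
print(as_ascii)  # prints '\xe9'
```

All three results describe the same one-character string; only the way it is written out differs.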

Answer 2

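The repr() solution only works in Python 2 (which is evidently what you are using). Here is a more complete answer, and a recommendation: if you will be working with Unicode, switch to Python 3 if at all possible. There is no good reason to try to understand how Python 2 handles text encodings when the improvements in Python 3 are one of the main reasons it exists.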

What's the difference between the two:
u'é' 
u'\xe9' 

These are two ways of specifying the same string in Python; the contents of the two strings do not differ:

>>> u'é' == u'\xe9'
True

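Just as the byte with value 65 is A in ASCII (chr(65) == "A"), each Unicode character has a numeric value, its code point, that is conventionally written in hexadecimal (it often fits in 16 bits, though code points run up to U+10FFFF). For a single character, you can display this value in decimal with ord() (though that is not very useful).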
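The relationship between a character and its numeric value can be sketched as follows. This assumes Python 3, where chr() accepts any code point (in Python 2 the equivalent for unicode strings is unichr()); the variable names are made up for the example.

```python
# Python 3 sketch: a character and its code point are two views of one value.
ch = "\xe9"                              # same one-character string as "é"

code_point = ord(ch)                     # decimal value of the code point
hex_form = hex(code_point)               # the same value in hexadecimal
label = "U+{:04X}".format(code_point)    # conventional Unicode notation

# chr() is the inverse of ord(), so the round trip returns the character.
assert chr(code_point) == ch
assert chr(65) == "A"                    # the ASCII case works the same way

print(code_point, hex_form, label)       # prints: 233 0xe9 U+00E9
```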

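When you print a string, Python tries to display it. When you convert it with repr(), Python shows you either the character itself or its escaped form, depending on whether the character can be used in a string literal. In Python 3, repr('é') is simply 'é', because é can appear in a string literal. To view your text as a series of Unicode code point values, you can do this (works in Python 2 or 3; the output shown is from Python 3):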

>>> text = u'é'
>>> print(text.encode("raw_unicode_escape"))
b'\xe9'

Here is a longer example (in Python 3):

>>> text = "neé"
>>> repr(text)
"'neé'"
>>> text.encode("raw_unicode_escape")
b'ne\xe9'
>>> text.encode("utf-8")
b'ne\xc3\xa9'

Note that UTF-8 encodes é as a two-byte sequence. UTF-8 is just one encoding; the code point of this character is U+00E9.
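For the round trip asked about in the question, one possible approach in Python 3 is the "unicode_escape" codec. This is a sketch under stated assumptions: the input is a str holding a literal backslash-u sequence (as the Python 2 byte string in the question does), and everything else in it is plain ASCII. The variable names are illustrative.

```python
# Python 3 sketch: converting between escaped text and readable text.
escaped = "bj\\u00f6rk: voltaic"   # literal backslash + u00f6, as in the question

# Escaped -> readable: turn the text into ASCII bytes, then interpret escapes.
readable = escaped.encode("ascii").decode("unicode_escape")

# Readable -> escaped: unicode_escape writes code points below 256 as \xNN.
back = readable.encode("unicode_escape").decode("ascii")

# UTF-8 is a separate matter: it is one byte encoding of the same text.
utf8_bytes = readable.encode("utf-8")

print(ascii(readable))   # prints 'bj\xf6rk: voltaic'
print(back)              # prints bj\xf6rk: voltaic
print(utf8_bytes)        # prints b'bj\xc3\xb6rk: voltaic'
```

Note that the escaped form comes back as \xf6 instead of \u00f6; both spell the same code point, and decoding either with "unicode_escape" gives the same readable string.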
