How to use Python to convert a unicode string to the real string

Question

I have used Python with urllib2 to fetch some information, but the information comes back as unicode strings.

I have tried approaches like the following:

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print unicode(a).encode("gb2312")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.encode("utf-8").decode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print u""+a

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).decode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).encode("utf-8")

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.decode("utf-8").encode("gb2312")

But every one of them gives the same result:

\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728

whereas I want to get the following Chinese text:

方法，删除存储在

Answer 1

You need to convert the string to a unicode string.

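First, the backslashes in a are not treated as escapes: in a Python 2 str literal, \u is not an escape sequence, so each backslash stays in the string as a literal character: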

a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

print a # Prints \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728

a       # Echoes '\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'

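So playing with encode/decode on this escaped string makes no difference: it contains only ASCII characters, which utf-8 and gb2312 both leave unchanged.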
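For comparison, here is a minimal sketch of the same pitfall, assuming Python 3 (the snippets in this thread are Python 2). The raw literal r"..." stands in for the Python 2 str: every backslash is kept as a literal character, so a round trip through an ordinary text codec returns exactly the same characters.

```python
# Python 3 sketch: r"..." keeps each backslash as a literal character,
# mimicking what the Python 2 str literal "\u65b9..." actually contains.
a = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

print(len(a))   # 48: eight escapes of six ASCII characters each
print(a[:6])    # \u65b9  (a backslash, a 'u', and four hex digits)

# Encoding and decoding with UTF-8 changes nothing: every character is ASCII.
roundtrip = a.encode("utf-8").decode("utf-8")
print(roundtrip == a)  # True
```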

You can either use a unicode literal or convert the string to a unicode string.

To use a unicode literal, just put a u in front of the string:

a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
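If you happen to be on Python 3, a short sketch of the equivalent: every str literal there is already a Unicode literal, so the \uXXXX escapes are decoded when the source is compiled, and the u prefix (still accepted) makes no difference.

```python
# Python 3: str literals are Unicode, so \uXXXX escapes are interpreted
# by the compiler; the u prefix is allowed but redundant.
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
b = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

print(len(a))                  # 8 real characters, not 48
print(a == b)                  # True
print(a == "方法，删除存储在")  # True
```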

To convert an existing string to a unicode string, you can call unicode with unicode_escape as the encoding argument:

print unicode(a, encoding='unicode_escape') # Prints 方法，删除存储在
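A rough Python 3 counterpart of unicode(a, encoding='unicode_escape'), offered as a sketch: the unicode_escape codec decodes bytes, so the str is first encoded to ASCII. That step is safe only because the escaped text is pure ASCII; the codec reads any other non-escape bytes as Latin-1, so mixing in real non-ASCII text would garble it.

```python
# Python 3 sketch of Answer 1's unicode(a, encoding='unicode_escape').
a = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# unicode_escape decodes *bytes*; ASCII is a lossless bridge here because
# the escaped string contains nothing but ASCII characters.
decoded = a.encode("ascii").decode("unicode_escape")

print(len(decoded))                  # 8
print(decoded == "方法，删除存储在")  # True
```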

I bet you are getting these strings from a JSON response, so the second approach is probably what you need.
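If the data really is JSON, another option is to parse the whole response body with the json module, which decodes the \uXXXX escapes for you. A minimal Python 3 sketch; the response body and its "text" key below are hypothetical, made up for illustration.

```python
import json

# Hypothetical raw response body (bytes); the "text" key is an assumption.
# In this bytes literal, \\u yields a literal backslash followed by 'u'.
body = b'{"text": "\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728"}'

data = json.loads(body)  # json.loads accepts bytes on Python 3.6+
text = data["text"]

print(len(text))                  # 8
print(text == "方法，删除存储在")  # True
```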

By the way, unicode_escape is a Python-specific encoding, which the documentation describes as one that will:

Produce a string that is suitable as Unicode literal in Python source code
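To make that description concrete, a small Python 3 sketch running the codec in the encoding direction: it turns real characters into \uXXXX escapes, the same text you would write in a source-code literal, and decoding reverses it.

```python
text = "方法，删除存储在"

# Encoding produces bytes holding \uXXXX escapes (lowercase hex digits).
escaped = text.encode("unicode_escape")
print(escaped)  # b'\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'

# Decoding with the same codec restores the original characters.
print(escaped.decode("unicode_escape") == text)  # True
```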

Answer 2

Where are you getting this data from? Perhaps you could share how you download and decode it.

Anyway, it looks like the remnants of some JSON-encoded string? Based on that assumption, here is a very hacky (and not entirely serious) approach:

>>> a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
>>> a
'\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'
>>> s = '"{}"'.format(a)
>>> s
'"\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728"'
>>> import json
>>> json.loads(s)
u'\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728'
>>> print json.loads(s)
方法，删除存储在

This recreates a valid JSON-encoded string by wrapping the string in a in double quotes, then decodes that JSON string into a Python unicode string.
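The same trick translated to Python 3, as a sketch (r"..." again stands in for the Python 2 str holding literal backslashes):

```python
import json

a = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# Wrap the escaped text in double quotes to rebuild a JSON string literal,
# then let the json module interpret the \uXXXX escapes.
s = '"{}"'.format(a)
text = json.loads(s)

print(text == "方法，删除存储在")  # True
```

One trade-off compared with unicode_escape: json.loads combines surrogate-pair escapes such as \ud83d\ude00 into a single character, while unicode_escape leaves them as two lone surrogates. On the other hand, the hack raises json.JSONDecodeError if a contains an unescaped double quote.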
