使 json.dumps 在 Python 中正确输出 unicode 字符

Question

我是 Python 的新手，我正在尝试对 UTF8 字符串进行编码。使用 PHP 的 json_encode()，… (U+2026) 变为 \u2026。但是，使用 Python 的 json.dumps()，它变成了 \u00e2\u20ac\u00a6。如何将其转换为 Python 中的 \u2026？

完整程序如下：

import nltk
import json

file=open('pos_tag.txt','r')
tags=nltk.pos_tag(nltk.word_tokenize(file.read()))

print(json.dumps(tags,separators=(',',':')))

Answer 1

问题出在file.open()。我能够使用编解码器模块修复它：

import nltk
import json
import codecs

file=codecs.open('pos_tag.txt','r','utf-8')
tags=nltk.pos_tag(nltk.word_tokenize(file.read()))

print(json.dumps(tags,separators=(',',':')))

使 json.dumps 在 Python 中正确输出 unicode 字符

Make json.dumps output unicode characters properly in Python

python

unicode

python-unicode