How to write a JSON file in Python without using dumps

Question

I have the following BSON data from MongoDB. I need to convert it to valid JSON so that I can create a PySpark DataFrame.

"\"{u'_raja': ObjectId('XXXXXX'),\n u'ram': datetime.datetime(XXx,xx14, xx, xx, xxx),\n u'createUserId': u'praja-policy',\n u'raja': u'I5',\n u'udatedTime': datetime.datetime(XXx, xx, xx, xx, xx, xx, xxxx),\n u'lastupdatedid': u'raja_id',\n u'plt': u'123r32'}\""

I wrote the code below.

import json
from bson import json_util

with open("/XXXXX6/bi/XXXXX/XXXXX3/v0/test/bson.json", "rb") as f:
    bson = f.read()

data = bson.replace('u\'', '')   # removal of the Unicode prefix
data1 = data.replace('\n', '')   # removal of \n
json.dump(json_util.dumps(data), open("bson1.json", "w"))

Using json.dump gives me valid JSON, but the output is full of "\" escape characters.

How can I extract the values inside the unicode strings so that I can create a PySpark DataFrame?
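The backslashes described above are consistent with the text being serialized twice: json_util.dumps already returns a JSON string, and passing that string to json.dump encodes it again as a JSON string literal, escaping every inner quote. The following is a minimal sketch of that effect using only the standard json module, assuming Python 3. The `doc` dictionary is a hypothetical stand-in for one record, not the actual data from the question.

```python
import json

# Hypothetical document standing in for one MongoDB record.
doc = {"createUserId": "praja-policy", "raja": "I5"}

# First pass: dict -> JSON text.
encoded_once = json.dumps(doc)

# Second pass: the JSON text is itself a str, so it is encoded
# as a JSON string literal and every inner quote gets a backslash.
encoded_twice = json.dumps(encoded_once)

print(encoded_once)   # {"createUserId": "praja-policy", "raja": "I5"}
print(encoded_twice)  # "{\"createUserId\": \"praja-policy\", \"raja\": \"I5\"}"

# Decoding the double-encoded value once yields a str, not a dict;
# a second decode is needed to get the original document back.
decoded_once = json.loads(encoded_twice)
decoded_twice = json.loads(decoded_once)
```

If this is what is happening, serializing the value only once (either with a dumps call followed by a plain file write, or with a single json.dump on the unserialized object) should avoid the escaped quotes.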

Answer 1

Use ensure_ascii=False in json.dumps:

bson = f.read()  # f is the file object opened in the question
encoded = json.dumps(bson, ensure_ascii=False).encode('utf8')  # UTF-8 encoded result

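With ensure_ascii=False, non-ASCII characters are written as-is instead of as \uXXXX escape sequences. The encode function can then be used to encode the result in whatever format you want. Most of the time, 'utf8' is a safe choice.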
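To make the effect of ensure_ascii concrete, here is a small sketch assuming Python 3 and only the standard library. The `record` dictionary and the temporary output path are illustrative assumptions, not part of the original question.

```python
import json
import os
import tempfile

# Hypothetical record containing non-ASCII text.
record = {"name": "Zoë", "city": "München"}

# Default behavior: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are kept as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)   # {"name": "Zo\u00eb", "city": "M\u00fcnchen"}
print(readable)  # {"name": "Zoë", "city": "München"}

# Writing the readable form to disk: open the file as UTF-8 text
# and let json.dump serialize the object exactly once.
path = os.path.join(tempfile.mkdtemp(), "record.json")
with open(path, "w", encoding="utf-8") as out:
    json.dump(record, out, ensure_ascii=False)

# Both forms decode to the same data, so the choice only
# affects how the file looks, not what it contains.
with open(path, encoding="utf-8") as src:
    round_tripped = json.load(src)
```

Note that ensure_ascii only controls how non-ASCII characters are written; it does not change how quotes inside an already-serialized string are escaped.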

python

json

dataframe

bson

pyspark