Python, Convert bson output of mongodump to array of json objects (dictionaries)
I dumped a MongoDB collection with the mongodump
command. The output is a dump directory containing these files:
dump/
|___coll.bson
|___coll.metadata.json
How can I load the exported file into a list of dictionaries usable in Python?
I tried the following, and none of it worked:
with open('dump/coll.bson', 'rb') as f:
    coll_raw = f.read()

# Using the standard json module
import json
coll = json.loads(coll_raw)

# Using pymongo
from bson.json_util import loads
coll = loads(coll_raw)
ValueError: No JSON object could be decoded
You should try:
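import bson  # the bson package that ships with pymongo

with open('dump/coll.bson', 'rb') as f:
    coll_raw = f.read()

# decode_all parses every concatenated BSON document and returns a list of dicts
coll = bson.decode_all(coll_raw)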
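I know this was answered a long time ago, but you can try decoding each document separately; that way you will know which document is causing the problem.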
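A stdlib-only sketch of splitting the file into individual documents, assuming the .bson file is a plain concatenation of BSON documents (which is what mongodump writes for a collection when --gzip and --archive are not used). Every BSON document starts with a little-endian int32 holding its total length, including those four bytes, and ends with a 0x00 byte, so the file can be walked without decoding anything. The function name iter_raw_bson is hypothetical, not part of pymongo.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_raw_bson(fh: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Hypothetical helper: yield (offset, raw_bytes) for each BSON document.

    Relies only on the BSON framing: an int32 little-endian total length
    (counting the 4 length bytes themselves) followed by the body, whose
    last byte is 0x00.
    """
    offset = 0
    while True:
        header = fh.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError(f"truncated length prefix at offset {offset}")
        (length,) = struct.unpack("<i", header)
        if length < 5:  # smallest legal document is 5 bytes: length + terminator
            raise ValueError(f"invalid document length {length} at offset {offset}")
        body = fh.read(length - 4)
        if len(body) != length - 4 or body[-1:] != b"\x00":
            raise ValueError(f"truncated or malformed document at offset {offset}")
        yield offset, header + body
        offset += length
```

With pymongo installed, each raw chunk can then be passed to bson.decode (available in recent pymongo versions) inside a try/except, so a failure reports the byte offset of the offending document instead of aborting the whole file.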
I use this library: https://github.com/bauman/python-bson-streaming
from bsonstream import KeyValueBSONInput

# The with block closes the file even if decoding raises
with open("restaurants.bson", 'rb') as f:
    stream = KeyValueBSONInput(fh=f)
    for dict_data in stream:
        print(dict_data)
I see 25359 records, and they all seem to decode to something like this:
{u'_id': ObjectId('5671bb2e111bb7b9a7ce4d9a'),
u'address': {u'building': u'351',
u'coord': [-73.98513559999999, 40.7676919],
u'street': u'West 57 Street',
u'zipcode': u'10019'},
u'borough': u'Manhattan',
u'cuisine': u'Irish',
u'grades': [{u'date': datetime.datetime(2014, 9, 6, 0, 0),
u'grade': u'A',
u'score': 2},
{u'date': datetime.datetime(2013, 7, 22, 0, 0),
u'grade': u'A',
u'score': 11},
{u'date': datetime.datetime(2012, 7, 31, 0, 0),
u'grade': u'A',
u'score': 12},
{u'date': datetime.datetime(2011, 12, 29, 0, 0),
u'grade': u'A',
u'score': 12}],
u'name': u'Dj Reynolds Pub And Restaurant',
u'restaurant_id': u'30191841'}
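The decoded records contain ObjectId and datetime values, so passing them straight to json.dumps raises a TypeError. If pymongo is available, bson.json_util.dumps produces MongoDB Extended JSON and keeps the type information. The stdlib-only sketch below is a lossier alternative, assuming plain strings are acceptable: datetimes become ISO-8601 strings and any other unknown type (such as ObjectId) falls back to str(). The names to_jsonable and docs_to_json are hypothetical.

```python
import datetime
import json
from typing import Any, Iterable


def to_jsonable(obj: Any) -> Any:
    """Hypothetical json.dumps default= hook for decoded BSON values.

    datetime/date -> ISO-8601 string; anything else json cannot handle
    (e.g. ObjectId) -> str(obj). Type information is lost.
    """
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    return str(obj)


def docs_to_json(docs: Iterable[dict]) -> str:
    """Serialize decoded documents as one JSON array string."""
    return json.dumps(list(docs), default=to_jsonable)
```

For example, docs_to_json(bson.decode_all(coll_raw)) would turn the whole dump into a single JSON array, with the date values above rendered as strings like "2014-09-06T00:00:00".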