Quickly add field to large MongoDB collection by _id

Question

I have a MongoDB collection like:

[{'_id': abc, 'Sex': 'f'}, {'_id': bcd, 'Sex': 'm'}, {'_id': cde, 'Sex': 'm'}, {'_id': def, 'Sex': 'm'}]

I also have a list of Python dictionaries, for example:

[{'_id': abc, 'Age': 70}, {'_id': bcd, 'Age': 51}, {'_id': cde}, {'_id': def, 'Age': 'unknown'}]

I need to update each document in the large collection, matching on _id, so that the result looks like this:

[{'_id': abc, 'Sex': 'f', 'Age': 70}, {'_id': bcd, 'Sex': 'm', 'Age': 51}, {'_id': cde, 'Sex': 'm'}, {'_id': def, 'Sex': 'm', 'Age': 'unknown'}]

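Is there a way to do this efficiently for a large collection? (That is, without just iterating over the list of dictionaries and calling update_one on each document.)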

Answer 1

Is there a way to do this efficiently for a large collection?

You can perform Bulk Write Operations instead of sending one update operation per document.

If you are using PyMongo, it automatically splits a bulk operation into smaller sub-batches based on the maximum message size accepted by MongoDB.

For example, you can loop over the dictionaries to build UpdateOne write objects, and run an unordered bulk write of 10,000 updates at a time.

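 from pprint import pprint
 from pymongo import MongoClient, UpdateOne
 from pymongo.errors import BulkWriteError

 db = MongoClient().test  # assumption: a mongod on localhost, database "test"

 requests = [
     UpdateOne({'_id': 'abc'}, {'$set': {'Age': 70}}),
     UpdateOne({'_id': 'bcd'}, {'$set': {'Age': 51}}),
 ]
 try:
     # ordered=False: the server keeps going past individual failures
     db.test.bulk_write(requests, ordered=False)
 except BulkWriteError as bwe:
     pprint(bwe.details)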
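
The loop over the dictionaries might be sketched roughly as below. This is only an illustration under some assumptions: the helper names build_updates, batched and bulk_add_fields are made up for this example, every key other than _id is treated as a field to $set, and dictionaries that carry only an _id (like cde in the question) are skipped because there is nothing to update.

```python
from itertools import islice


def build_updates(rows):
    """Yield (filter, update) pairs; rows with only an _id are skipped."""
    for row in rows:
        fields = {k: v for k, v in row.items() if k != '_id'}
        if fields:  # nothing to set for e.g. {'_id': 'cde'}
            yield {'_id': row['_id']}, {'$set': fields}


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_add_fields(collection, rows, batch_size=10000):
    """Run unordered bulk writes; return (modified_count, write_errors)."""
    # Imported here so the two helpers above work without PyMongo installed.
    from pymongo import UpdateOne
    from pymongo.errors import BulkWriteError

    modified, errors = 0, []
    for chunk in batched(build_updates(rows), batch_size):
        requests = [UpdateOne(flt, upd) for flt, upd in chunk]
        try:
            result = collection.bulk_write(requests, ordered=False)
            modified += result.modified_count
        except BulkWriteError as bwe:
            # Updates that succeeded in a failing batch are still counted.
            modified += bwe.details.get('nModified', 0)
            errors.extend(bwe.details.get('writeErrors', []))
    return modified, errors
```

With the data from the question, a call could look like bulk_add_fields(db.test, age_rows), where age_rows is the list of dictionaries. Since the filter is on _id, each update uses the default _id index.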

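Note that unordered bulk write operations are batched and sent to the server in arbitrary order, where they may be executed in parallel. Any errors that occur are reported after all operations have been attempted.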
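
Because the errors arrive together at the end, it can help to pull out which operations failed. The sketch below assumes the usual shape of BulkWriteError.details (a dict with a 'writeErrors' list whose entries carry 'index', 'code' and 'errmsg'). The function name failed_operations and the sample dict are invented for illustration; the sample is hand-written, not real server output.

```python
def failed_operations(details):
    """Return (index, code, message) for each failed write in `details`."""
    return [
        (err.get('index'), err.get('code'), err.get('errmsg'))
        for err in details.get('writeErrors', [])
    ]


# Hand-written stand-in for bwe.details, for illustration only.
sample_details = {
    'writeErrors': [{'index': 1, 'code': 11000, 'errmsg': 'duplicate key error'}],
    'nMatched': 1,
    'nModified': 1,
}

for index, code, message in failed_operations(sample_details):
    # `index` is the position of the failed request in the submitted list
    print(f'request {index} failed with code {code}: {message}')
```

In the except block above, failed_operations(bwe.details) would give the positions in requests that need to be retried or inspected.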

python

mongodb

pymongo

aggregation-framework