Insert many based on upserting condition with Pymongo
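I'm trying to find an efficient way to upload a Pandas DataFrame to a MongoDB collection, with the following constraints:
If a document already exists, as identified by two unique document fields ('business_id' and 'document_key'), overwrite it.
If no such document exists, as identified by the same two fields ('business_id' and 'document_key'), create one.
I tried: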
    from pymongo import UpdateOne

    upserts = [
        UpdateOne(
            {"$and": [
                {'business_id': x['business_id']},
                {"document_key": x["document_key"]}
            ]},
            {'$setOnInsert': x},
            upsert=True
        )
        for x in dd.to_dict("records")
    ]
    result = collection.bulk_write(upserts)
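But it does not seem to update existing documents, so it does not follow the overwrite/create policy described above.
How can I upsert according to the two points listed above?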
I suspect you want $set rather than $setOnInsert. From the MongoDB documentation:
If an update operation with upsert: true results in an insert of a
document, then $setOnInsert assigns the specified values to the fields
in the document. If the update operation does not result in an insert,
$setOnInsert does nothing.
https://docs.mongodb.com/manual/reference/operator/update/setOnInsert/
A working example using $set:
    import pandas as pd
    from pymongo import MongoClient, UpdateOne

    db = MongoClient()['mydatabase']
    collection = db['mycollection']

    # Seed ten existing documents (ids 0-9).
    collection.insert_many([{'business_id': x, 'document_key': x, 'Existing': True} for x in range(10)])

    # DataFrame holding updates for ids 3, 4 and 5.
    df = pd.DataFrame([{'business_id': x, 'document_key': x, 'Updated': True} for x in range(3, 6)])

    # One upsert per row, matched on both key fields.
    upserts = [
        UpdateOne(
            {'business_id': x['business_id'],
             'document_key': x['document_key']},
            {'$set': x},
            upsert=True
        )
        for x in df.to_dict("records")
    ]
    result = collection.bulk_write(upserts)

    print(f'Matched: {result.matched_count}, Upserted: {result.upserted_count}, Modified: {result.modified_count}')
    for document in collection.find({}, {'_id': 0}):
        print(document)
Prints:
Matched: 3, Upserted: 0, Modified: 3
{'business_id': 0, 'document_key': 0, 'Existing': True}
{'business_id': 1, 'document_key': 1, 'Existing': True}
{'business_id': 2, 'document_key': 2, 'Existing': True}
{'business_id': 3, 'document_key': 3, 'Existing': True, 'Updated': True}
{'business_id': 4, 'document_key': 4, 'Existing': True, 'Updated': True}
{'business_id': 5, 'document_key': 5, 'Existing': True, 'Updated': True}
{'business_id': 6, 'document_key': 6, 'Existing': True}
{'business_id': 7, 'document_key': 7, 'Existing': True}
{'business_id': 8, 'document_key': 8, 'Existing': True}
{'business_id': 9, 'document_key': 9, 'Existing': True}