Passing variables to a MongoDB query

Question

My collection contains the following documents:

{
  cust_id: "0044234",
  Address: "1234 Dunn Hill",
  city: "Pittsburg",
  comments : "4"
},

{
  cust_id: "0097314",
  Address: "5678 Dunn Hill",
  city: "San Diego",
  comments : "99"
},

{
  cust_id: "012345",
  Address: "2929 Dunn Hill",
  city: "Pittsburg",
  comments : "41"
}

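I want to write code that extracts and stores all the cust_id values that belong to the same city. I can get the answer by running the following query in the MongoDB shell: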

db.custData.find({"city": "Pittsburg"}, {cust_id: 1})

However, I have not been able to do the same thing from Python. Here is what I tried:

ctgrp=[{"$group":{"_id":"$city","number of cust":{"$sum":1}}}]
myDict={}
for line in collection.aggregate(ctgrp) : #for grouping all the cities in   the dataset
    myDict[line['_id']]=line['number of cust']
for key in myDict:
    k=db.collection.find({"city" : 'key'},{'cust_id:1'})
    print k
client.close()

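I also don't know how to store the result. The only thing I can think of is a dictionary that holds a list of values for each key, but I can't come up with an implementation. I'm looking for output like this:

For Pittsburg, the values would be 0044234 and 012345.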

Answer 1

You can use the .distinct method, which is the best way to do this.

import pymongo
client = pymongo.MongoClient()
db = client.test
collection = db.collection

Then:

collection.distinct('cust_id', {'city': 'Pittsburg'})

which yields:

['0044234', '012345']
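To use a variable instead of a hard-coded city, put the Python variable itself into the filter dictionary. Below is a minimal sketch; the helper name cust_ids_for_city is made up for illustration, and collection is assumed to be a pymongo Collection like the one created above.

```python
def cust_ids_for_city(collection, city):
    """Return the distinct cust_id values for one city.

    `collection` is assumed to be a pymongo Collection and `city` a plain
    Python string. The helper name is hypothetical.
    """
    # Pass the variable unquoted. Writing {'city': 'city'} would search for
    # the literal string "city", which is what happens with
    # find({"city": 'key'}) in the question.
    return collection.distinct("cust_id", {"city": city})
```

It could be called as cust_ids_for_city(collection, 'Pittsburg'), or inside a loop over city names.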

Or do it client-side, which is less efficient:

>>> cust_ids = set()
>>> for element in collection.find({'city': 'Pittsburg'}):
...     cust_ids.add(element['cust_id'])
... 
>>> cust_ids
{'0044234', '012345'}

Now, if you want all the cust_id values for a given city, duplicates included, here it is:

>>> list(collection.aggregate([{'$match': {'city': 'Pittsburg'} }, {'$group': {'_id': None, 'cust_ids': {'$push': '$cust_id'}}}]))[0]['cust_ids']
['0044234', '012345']

Now, if you want to group the documents by city and then find the distinct cust_id values for each city, here it is:

>>> from pprint import pprint
>>> pipeline = [{'$group': {'_id': '$city', 'cust_ids': {'$addToSet': '$cust_id'}, 'count': {'$sum': 1}}}]
>>> pprint(list(collection.aggregate(pipeline)))
[{'_id': 'San Diego', 'count': 1, 'cust_ids': ['0097314']},
 {'_id': 'Pittsburg', 'count': 2, 'cust_ids': ['012345', '0044234']}]
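To store the result as the dictionary described in the question (city as key, list of cust_id values as value), the grouped documents can be folded into a dict. This is a sketch under the same assumptions as above: ids_by_city is a hypothetical helper name and collection is assumed to be a pymongo Collection.

```python
def ids_by_city(collection):
    """Map each city to the list of distinct cust_id values found in it.

    `collection` is assumed to be a pymongo Collection. The helper name is
    hypothetical.
    """
    # One $group stage: the city becomes _id, and $addToSet collects each
    # cust_id once per city.
    pipeline = [
        {"$group": {"_id": "$city", "cust_ids": {"$addToSet": "$cust_id"}}}
    ]
    # Each result document looks like {'_id': <city>, 'cust_ids': [...]}.
    return {doc["_id"]: doc["cust_ids"] for doc in collection.aggregate(pipeline)}
```

Looking up ids_by_city(collection)['Pittsburg'] then gives the IDs for that city. The order of the values inside each list is not guaranteed, since $addToSet does not preserve order.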


python

mongodb

pymongo

mongodb-query

aggregation-framework