Building a histogram from MongoDB with Pymongo

Question

I am trying to build a histogram from MongoDB documents that have the following format:

{
    "_id": 1,
    "Properties":[
    {
        "type": "a"
    },
    {
        "type": "d"
    }
    ]
}

{
    "_id": 2,
    "Properties":[
    {
        "type": "c"
    },
    {
        "type": "a"
    }
    ]
}

{
    "_id": 3,
    "Properties":[
    {
        "type": "c"
    },
    {
        "type": "d"
    }
    ]
}

The output for this example should be:

a = 2

c = 2

d = 2

My current workaround is to query the entire collection with:

collection.find({})

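and then iterate over the results, accumulating the counts in a Python dictionary. I am sure there is a better way to do this within the MongoDB query itself. Can I get this data in a single query, as I suspect?

Note that I do not know in advance which "types" I might find before running the query.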
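
For reference, the client-side accumulation described above might look roughly like the following. This is only a sketch: count_types is a hypothetical helper name, and the sample list stands in for the documents that collection.find({}) would return.

```python
from collections import Counter


def count_types(documents):
    """Count how often each Properties.type value occurs across documents."""
    counts = Counter()
    for doc in documents:
        # Documents without a "Properties" array contribute nothing.
        for prop in doc.get("Properties", []):
            if "type" in prop:
                counts[prop["type"]] += 1
    return dict(counts)


# The three sample documents from the question, as plain dicts
# (a stand-in for the cursor returned by collection.find({})).
sample = [
    {"_id": 1, "Properties": [{"type": "a"}, {"type": "d"}]},
    {"_id": 2, "Properties": [{"type": "c"}, {"type": "a"}]},
    {"_id": 3, "Properties": [{"type": "c"}, {"type": "d"}]},
]
histogram = count_types(sample)
```

This works, but every document has to travel over the network to the client before anything is counted, which is the cost a server-side query would avoid.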

Answer 1

Not sure whether this fits your scenario, but you can count each property type separately, for example:

# cursor.count() was removed in PyMongo 4; count_documents() replaces it
count_a = collection.count_documents({'Properties.type': 'a'})
count_b = collection.count_documents({'Properties.type': 'b'})
count_c = collection.count_documents({'Properties.type': 'c'})

If you do not know the type up front, you can hold it in a variable and simply do the following:

mystery_type = 'assign the mystery type to this variable once you know it'
mystery_type_count = collection.count_documents({'Properties.type': mystery_type})
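
If the types are not known ahead of time, one possible extension of this idea is to ask the server for the distinct values first and then count each one. The sketch below assumes a PyMongo collection object is passed in; count_documents_per_type is a hypothetical helper name, while distinct() and count_documents() are existing PyMongo collection methods.

```python
def count_documents_per_type(collection):
    """Return {type: number of documents that contain that type}."""
    # distinct() on an array-of-subdocuments path yields each value once.
    types = collection.distinct("Properties.type")
    # One count query per discovered type.
    return {
        t: collection.count_documents({"Properties.type": t})
        for t in types
    }
```

Two caveats apply. This issues one query per type plus the initial distinct call, so it is not a single query. It also counts documents rather than array elements: a document whose Properties array contains the same type twice is counted once.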

Answer 2

In this case you can use MongoDB aggregation.

More information about aggregation: https://docs.mongodb.org/manual/core/aggregation-introduction/

db.collection.aggregate([
    { $unwind : "$Properties" }, 
    { $group: { _id: "$Properties.type", count: { $sum: 1 } } }
]);

Output:

{
    "result" : [ 
        {
            "_id" : "c",
            "count" : 2.0000000000000000
        }, 
        {
            "_id" : "d",
            "count" : 2.0000000000000000
        }, 
        {
            "_id" : "a",
            "count" : 2.0000000000000000
        }
    ],
    "ok" : 1.0000000000000000
}

In Python:

from pymongo import MongoClient

if __name__ == '__main__':
    db = MongoClient().test
    pipeline = [
        { "$unwind" : "$Properties" }, 
        { "$group": { "_id": "$Properties.type", "count": { "$sum": 1 } } }
    ]
    print(list(db.collection.aggregate(pipeline)))

Output:

[{u'count': 2, u'_id': u'c'}, {u'count': 2, u'_id': u'd'}, {u'count': 2, u'_id': u'a'}]
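
To get the "a = 2" style result the question asks for, the aggregation rows can be folded into a dictionary. The following is a minimal sketch, assuming a PyMongo collection is passed in; type_histogram is a hypothetical helper name, and the extra $sort stage is an addition to the pipeline above so that the types come back in a stable order.

```python
def type_histogram(collection):
    """Return {type: occurrences} using one aggregation query."""
    pipeline = [
        # Emit one document per element of the Properties array.
        {"$unwind": "$Properties"},
        # Count the elements that share the same type.
        {"$group": {"_id": "$Properties.type", "count": {"$sum": 1}}},
        # Assumption: sort by type name for reproducible ordering.
        {"$sort": {"_id": 1}},
    ]
    return {row["_id"]: row["count"] for row in collection.aggregate(pipeline)}


def print_histogram(histogram):
    """Print each entry as 'type = count'."""
    for type_name, count in histogram.items():
        print("{} = {}".format(type_name, count))
```

Calling print_histogram(type_histogram(db.collection)) would then print one line per type. Unlike counting documents per type, this counts array elements, so a type that appears twice in one document contributes two.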

