Efficient way to aggregate in Mongodb

I have a collection:

{
    "name" : "foo",
    "clicked" : {"0": 6723, "1": 1415, "2": 1122}
}
{
    "name" : "bar",
    "clicked" : {"8": 1423, "9": 1415, "10": 1122}
}
{
    "name" : "xyz",
    "clicked" : {"22": 6723, "23": 1415, "2": 1234}
}

"clicked" is basically {"position of the clicked item in the list" : "id of the item"}.

The final output I want is the total number of times each item was clicked, i.e. for the example above:

    {
     6723:2, 
     1415:3, 
     1423:1,
     1122:2,
     1234:1
    }

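One way is to maintain a dictionary in memory (in a Python script), go through the "clicked" field of every document, and update the dictionary. I'm new to Mongo, please help!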
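A minimal sketch of that in-memory approach, assuming the documents arrive as Python dicts. With PyMongo you could pass a cursor such as col.find({}, {"clicked": 1}), where col is a hypothetical collection handle. Documents are processed one at a time, so memory grows only with the number of distinct item ids:

```python
def count_clicks(docs):
    """Count how often each item id appears in the "clicked" maps of docs."""
    counts = {}
    for doc in docs:
        # The values of "clicked" are item ids; the keys (positions) are ignored.
        for item_id in doc.get("clicked", {}).values():
            counts[item_id] = counts.get(item_id, 0) + 1
    return counts


docs = [
    {"name": "foo", "clicked": {"0": 6723, "1": 1415, "2": 1122}},
    {"name": "bar", "clicked": {"8": 1423, "9": 1415, "10": 1122}},
    {"name": "xyz", "clicked": {"22": 6723, "23": 1415, "2": 1234}},
]
print(count_clicks(docs))
# → {6723: 2, 1415: 3, 1122: 2, 1423: 1, 1234: 1}
```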

Use collections.Counter:

In [58]: import pymongo

In [59]: from collections import Counter

In [61]: conn = pymongo.MongoClient()

In [62]: db = conn.test

In [63]: col = db.collection

In [64]: result = col.aggregate([{"$group": {"_id": None, "clicked": {"$push": "$clicked"}}}]).next()['clicked']

In [65]: c = Counter()

In [66]: for el in [Counter(i.values()) for i in result]:
   ....:     c += el
   ....:     

In [67]: print(dict(c))
{1122: 2, 6723: 2, 1415: 3, 1234: 1, 1423: 1}

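If you can drop the current schema and redesign it so that clicked is an array whose elements are key-value pair objects, then you can use the aggregation framework to get the desired result.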

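In Mongo, you can convert the schema by iterating over the documents with the forEach() method of the find() cursor and replacing the clicked field with an array of key-value pair objects: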

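db.collection.find().forEach(function (doc) {
    // Build a fresh {position, elementId} object for every key;
    // reusing a single object would make every array element identical.
    var clicked = Object.keys(doc.clicked).map(function (key) {
        return {
            position: parseInt(key, 10),
            elementId: doc.clicked[key]
        };
    });
    // Replace the embedded object with the new array.
    db.collection.updateOne({ _id: doc._id }, { $set: { clicked: clicked } });
});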

After changing the schema this way, your documents will have the following structure:

{
    "name": "foo",
    "clicked": [
        { "position": 0, "elementId": 6723 },
        { "position": 1, "elementId": 1415 },
        { "position": 2, "elementId": 1122 }
    ]
},
{
    "name": "bar",
    "clicked": [
        { "position": 8, "elementId": 1423 },
        { "position": 9, "elementId": 1415 },
        { "position": 10, "elementId": 1122 }
    ]    
},
{
    "name": "xyz",
    "clicked": [
        { "position": 22, "elementId": 6723 },
        { "position": 23, "elementId": 1415 },
        { "position": 2,  "elementId": 1234 }
    ]
}
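If you run the migration from Python rather than the shell, the conversion from the old clicked object to this array can be written as a pure function. This is a sketch; to_clicked_array is a hypothetical helper name, and sorting by numeric position is an assumption made only for a deterministic order (array order does not affect the counting):

```python
def to_clicked_array(clicked):
    """Turn {"pos": item_id, ...} into [{"position": int, "elementId": item_id}, ...]."""
    # A new dict is created per element, so entries never share state.
    return [
        {"position": int(pos), "elementId": item_id}
        for pos, item_id in sorted(clicked.items(), key=lambda kv: int(kv[0]))
    ]


# With PyMongo, each document could then be rewritten with, for example:
#   col.update_one({"_id": doc["_id"]},
#                  {"$set": {"clicked": to_clicked_array(doc["clicked"])}})
print(to_clicked_array({"22": 6723, "23": 1415, "2": 1234}))
```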

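Getting the desired aggregation is then straightforward with the aggregation framework. It takes a pipeline consisting of an $unwind stage and a $group stage, with $unwind as the first step. $unwind deconstructs the clicked array field of each input document and outputs one document per element; each output document replaces the array with a single element value.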
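To make the $unwind semantics concrete, here is an in-memory Python sketch of what that stage does to one converted document. unwind is a hypothetical helper, not a MongoDB or PyMongo API; it assumes the field holds an array and ignores $unwind options such as preserveNullAndEmptyArrays:

```python
def unwind(docs, field):
    """Yield one copy of each document per element of its array field."""
    for doc in docs:
        # Documents missing the field contribute nothing, as with default $unwind.
        for element in doc.get(field, []):
            out = dict(doc)        # shallow copy of the input document
            out[field] = element   # the array is replaced by a single element
            yield out


doc = {
    "name": "foo",
    "clicked": [
        {"position": 0, "elementId": 6723},
        {"position": 1, "elementId": 1415},
        {"position": 2, "elementId": 1122},
    ],
}
for out in unwind([doc], "clicked"):
    print(out)
```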

The $group stage then groups the input documents by the specified identifier, clicked.elementId, and applies the accumulator expression {"$sum": 1} to each group, which gives the number of documents in that group:

var pipeline = [
    { "$unwind": "$clicked" },
    {
        "$group": {
            "_id": "$clicked.elementId",
            "count": { "$sum": 1 }
        }
    }
];
db.collection.aggregate(pipeline)

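Output (from an older shell that wraps the results in a "result" array; the order of the groups is not guaranteed):
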
{
    "result" : [ 
        {
            "_id" : 1234,
            "count" : 1
        }, 
        {
            "_id" : 1423,
            "count" : 1
        }, 
        {
            "_id" : 1122,
            "count" : 2
        }, 
        {
            "_id" : 1415,
            "count" : 3
        }, 
        {
            "_id" : 6723,
            "count" : 2
        }
    ],
    "ok" : 1
}
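The $group stage with "$sum": 1 behaves like counting rows per key. A rough in-memory Python equivalent over already-unwound documents follows; group_count is a hypothetical helper that assumes every document has the grouping path (real $group would put documents missing it under _id null):

```python
from collections import Counter


def group_count(unwound_docs, key_path):
    """Mimic {"$group": {"_id": "$<key_path>", "count": {"$sum": 1}}}."""
    def resolve(doc):
        # Follow a dotted path such as "clicked.elementId".
        value = doc
        for part in key_path.split("."):
            value = value[part]
        return value

    counts = Counter(resolve(doc) for doc in unwound_docs)
    return [{"_id": key, "count": n} for key, n in counts.items()]


# One document per array element, as $unwind produces for the converted data.
unwound = [
    {"name": "foo", "clicked": {"position": 0, "elementId": 6723}},
    {"name": "foo", "clicked": {"position": 1, "elementId": 1415}},
    {"name": "foo", "clicked": {"position": 2, "elementId": 1122}},
    {"name": "bar", "clicked": {"position": 8, "elementId": 1423}},
    {"name": "bar", "clicked": {"position": 9, "elementId": 1415}},
    {"name": "bar", "clicked": {"position": 10, "elementId": 1122}},
    {"name": "xyz", "clicked": {"position": 22, "elementId": 6723}},
    {"name": "xyz", "clicked": {"position": 23, "elementId": 1415}},
    {"name": "xyz", "clicked": {"position": 2, "elementId": 1234}},
]
for row in group_count(unwound, "clicked.elementId"):
    print(row)
```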

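Turning the results into the object you want then only takes one pass over the aggregation cursor:

var result = {};
db.collection.aggregate(pipeline).forEach(function (doc) {
    // One property per item id, holding its click count.
    result[doc._id] = doc.count;
});
printjson(result);

Output:

{
    "1122" : 2,
    "1234" : 1,
    "1415" : 3,
    "1423" : 1,
    "6723" : 2
}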
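From PyMongo, reshaping the group documents into a single object is a dict comprehension. In this sketch the group documents from the output shown earlier are hard-coded as a list; with a live server you would iterate col.aggregate(pipeline) instead, where col and pipeline are assumed to be a collection handle and the same pipeline written as a Python list:

```python
# Group documents as returned by the $unwind/$group pipeline.
results = [
    {"_id": 1234, "count": 1},
    {"_id": 1423, "count": 1},
    {"_id": 1122, "count": 2},
    {"_id": 1415, "count": 3},
    {"_id": 6723, "count": 2},
]

# Collapse the list of {"_id": id, "count": n} documents into one {id: n} dict.
clicks_per_item = {doc["_id"]: doc["count"] for doc in results}
print(clicks_per_item)
# → {1234: 1, 1423: 1, 1122: 2, 1415: 3, 6723: 2}
```

Unlike the shell version, the Python dict keeps the ids as integers rather than converting them to string keys.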

I was finally able to build a map-reduce aggregation that does the job without changing the schema:

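// Note: map-reduce is deprecated as of MongoDB 5.0.
var map_function = function () {
    // Emit (item id, 1) for every value of the embedded "clicked" object.
    for (var x in this.clicked) {
        emit(this.clicked[x], 1);
    }
};

var reduce_function = function (key, values) {
    // Sum all counts emitted for one item id.
    return Array.sum(values);
};

// Write the results to the "id" collection and print them.
db.imp.mapReduce(map_function, reduce_function, "id").find()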
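The map/reduce pair above can be mimicked in plain Python to see what gets computed: the map step emits (item_id, 1) for every value in clicked, and the reduce step sums the values collected for each key. This is only an in-memory sketch with hypothetical names (map_clicks, reduce_counts, map_reduce). MongoDB may call reduce again on its own partial results, which is why the reduce must return a value of the same shape it receives, as summing does:

```python
from collections import defaultdict


def map_clicks(doc):
    """Like map_function: emit (item_id, 1) for each value in doc["clicked"]."""
    for item_id in doc["clicked"].values():
        yield item_id, 1


def reduce_counts(key, values):
    """Like reduce_function: sum the counts emitted for one key."""
    return sum(values)


def map_reduce(docs):
    """Group emitted values by key, then reduce each group."""
    emitted = defaultdict(list)
    for doc in docs:
        for key, value in map_clicks(doc):
            emitted[key].append(value)
    # MongoDB skips reduce for keys with a single emitted value;
    # summing a one-element list gives the same result, so it is always called here.
    return {key: reduce_counts(key, values) for key, values in emitted.items()}


docs = [
    {"name": "foo", "clicked": {"0": 6723, "1": 1415, "2": 1122}},
    {"name": "bar", "clicked": {"8": 1423, "9": 1415, "10": 1122}},
    {"name": "xyz", "clicked": {"22": 6723, "23": 1415, "2": 1234}},
]
print(map_reduce(docs))
# → {6723: 2, 1415: 3, 1122: 2, 1423: 1, 1234: 1}
```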