Aggregation on the basis of the set of nested docs

Suppose I have the following 5 documents:

{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }

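I want to query this collection so that it returns the students (with their _ids) grouped by the set (combination) of courses they take, together with a count of how many students are in each set.

In the example above, there are 3 sets (combinations) of courses, with the following student counts:

1 - [ "A", "B" ] <- 2 students took this combination

2 - [ "A", "B", "C" ] <- 2 students

3 - [ "A", "B", "D" ] <- 1 student

I feel this is more of a MapReduce task than an Aggregation one... not sure...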

Update 1

Many thanks to @ExplosionPills

So the following aggregation command:

db.students.aggregate([{
    $group: {
        _id: "$courses",
        count: {$sum: 1},
        students: {$push: "$_id"}
    }
}])

gives me the following output:

{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }

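It groups by the set of courses, counts the students who belong to each set, and collects their _ids.

Update 2

I found that the aggregation above treats the combination [ "C", "A", "B" ] as different from [ "A", "B", "C" ]. But I need these 2 to be counted as the same combination.

So let's look at the following documents: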

{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
{ "_id" : "6", "student" : "Alex", "courses" : [ "C", "A", "B" ] }

Let's look at the output:

{ "_id" : [ "C", "A", "B" ], "count" : 1, "students" : [ "6" ] }
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }

See lines 1 and 3 of the output - this is not what I want.
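To make the cause concrete, here is a small plain-JavaScript sketch of what grouping on the stored array amounts to. It is not MongoDB code: the `docs` array and the `groupByExactArray` helper are hypothetical stand-ins, and `JSON.stringify` is used only to imitate an order-sensitive comparison of arrays.

```javascript
// Hypothetical in-memory copy of the six documents (student names omitted).
const docs = [
    {_id: "1", courses: ["A", "B"]},
    {_id: "2", courses: ["A", "B", "C"]},
    {_id: "3", courses: ["A", "B", "D"]},
    {_id: "4", courses: ["A", "B", "C"]},
    {_id: "5", courses: ["A", "B"]},
    {_id: "6", courses: ["C", "A", "B"]}
];

// Group on the array exactly as stored. JSON.stringify is only a stand-in
// for an order-sensitive array comparison.
function groupByExactArray(documents) {
    const groups = new Map();
    for (const doc of documents) {
        const key = JSON.stringify(doc.courses);
        if (!groups.has(key)) {
            groups.set(key, {_id: doc.courses, count: 0, students: []});
        }
        const group = groups.get(key);
        group.count += 1;
        group.students.push(doc._id);
    }
    return [...groups.values()];
}

const exactGroups = groupByExactArray(docs);
console.log(exactGroups.length); // 4 groups, because student "6" gets a group of its own
```

Because the key keeps the element order, [ "C", "A", "B" ] and [ "A", "B", "C" ] can never land in the same group unless the arrays are normalized first.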

So, to treat [ "C", "A", "B" ] and [ "A", "B", "C" ] as the same combination, I changed the aggregation as follows:

db.students.aggregate([
    {$unwind: "$courses"},
    {$sort: {"courses": 1}},
    {$group: {_id: "$_id", courses: {$push: "$courses"}}},
    {$group: {_id: "$courses", count: {$sum: 1}, students: {$push: "$_id"}}}
])

Output:

{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "5", "1" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 3, "students" : [ "6", "4", "2" ] }
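The four stages of that pipeline can be followed step by step in plain JavaScript. This is only an illustrative sketch: the `docs` array is a hypothetical in-memory copy of the collection, and the Map-based grouping imitates the two $group stages rather than reproducing the server's behavior (in particular, the order of groups and of the students inside them may differ from the shell output).

```javascript
// Hypothetical in-memory copy of the six documents (student names omitted).
const docs = [
    {_id: "1", courses: ["A", "B"]},
    {_id: "2", courses: ["A", "B", "C"]},
    {_id: "3", courses: ["A", "B", "D"]},
    {_id: "4", courses: ["A", "B", "C"]},
    {_id: "5", courses: ["A", "B"]},
    {_id: "6", courses: ["C", "A", "B"]}
];

// $unwind: one record per (student, course) pair.
const unwound = docs.flatMap(d => d.courses.map(c => ({_id: d._id, courses: c})));

// $sort: order the pairs by course name.
unwound.sort((a, b) => (a.courses < b.courses ? -1 : a.courses > b.courses ? 1 : 0));

// First $group: rebuild each student's course list, now in sorted order.
const perStudent = new Map();
for (const pair of unwound) {
    if (!perStudent.has(pair._id)) perStudent.set(pair._id, []);
    perStudent.get(pair._id).push(pair.courses);
}

// Second $group: bucket students by their normalized course list.
const groups = new Map();
for (const [id, courses] of perStudent) {
    const key = JSON.stringify(courses);
    if (!groups.has(key)) groups.set(key, {_id: courses, count: 0, students: []});
    const group = groups.get(key);
    group.count += 1;
    group.students.push(id);
}

const result = [...groups.values()];
console.log(result);
```

Sorting between the unwind and the first regroup is what turns student 6's [ "C", "A", "B" ] into [ "A", "B", "C" ] before the final grouping sees it.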

This is an aggregation operation that uses grouping.

db.students.aggregate([{
    $group: {
        // The group key: documents with the same courses array
        // end up in the same group. "$courses" refers to that field.
        _id: "$courses",

        // Add 1 for each document in the group (effectively a counter)
        count: {$sum: 1}
    }
}]);

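Edit:

If the courses can be in any order, you can $unwind, $sort, and $group again, as suggested in the edited question. You can also do this with mapReduce, but I'm not sure which is faster.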

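db.students.mapReduce(
    function () {
        // Use a sorted copy of the courses as the key. The emitted value has
        // the same shape as the reduce output, which mapReduce expects.
        emit(this.courses.slice().sort(), {students: [this._id], count: 1});
    },
    function (key, values) {
        // Merge partial results; reduce may be called more than once per key.
        var result = {students: [], count: 0};
        values.forEach(function (v) {
            result.students = result.students.concat(v.students);
            result.count += v.count;
        });
        return result;
    },
    {out: {inline: 1}}
)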
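If the server is recent enough, the array can also be sorted per document instead of being unwound and regrouped. The following is a minimal sketch, assuming MongoDB 5.2 or later (where the $sortArray operator is available) and the students collection from the question. The `pipeline` variable name is only for illustration, and how its speed compares with the two approaches above is not measured here.

```javascript
// Assumes MongoDB 5.2+ for $sortArray (and 4.2+ for $set).
const pipeline = [
    // Replace courses with a sorted copy so that element order no longer matters.
    {$set: {courses: {$sortArray: {input: "$courses", sortBy: 1}}}},
    // Group on the normalized array, counting students and collecting their _ids.
    {$group: {_id: "$courses", count: {$sum: 1}, students: {$push: "$_id"}}}
];

// In the shell this would be run as: db.students.aggregate(pipeline)
```

This keeps one document per student throughout, so there is no intermediate blow-up to one record per course.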