Aggregation based on the set of nested docs
Suppose I have the following 5 documents:
{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
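I want to query this collection so that it returns the students (with their _id) grouped by the set (combination) of courses they take, along with a count of how many students are in each set.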
In the example above, I have 3 sets (combinations) of courses, with student counts as follows:
1 - [ "A", "B" ] <- 2 students take this combination
2 - [ "A", "B", "C" ] <- 2 students
3 - [ "A", "B", "D" ] <- 1 student
I feel this is more of a MapReduce task than an Aggregation one... not sure...
Update 1
Many thanks to @ExplosionPills
So the following aggregation command:
db.students.aggregate([{
    $group: {
        _id: "$courses",
        count: {$sum: 1},
        students: {$push: "$_id"}
    }
}])
gives me the following output:
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }
It groups by course set, counting the students that belong to each set and collecting their _ids.
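To make it clearer what that $group stage computes, here is a minimal plain JavaScript sketch that runs outside MongoDB on an in-memory copy of the five documents. The helper name groupByCourses is hypothetical, and using JSON.stringify to build the group key is my own assumption for illustration, not how the server works internally.

```javascript
// In-memory stand-in for the students collection (same data as above).
const students = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
];

// Hypothetical helper that mimics
// {$group: {_id: "$courses", count: {$sum: 1}, students: {$push: "$_id"}}}.
function groupByCourses(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // The key keeps element order, so it is order-sensitive,
    // just like grouping on the raw array.
    const key = JSON.stringify(doc.courses);
    if (!groups.has(key)) {
      groups.set(key, { _id: doc.courses, count: 0, students: [] });
    }
    const group = groups.get(key);
    group.count += 1;              // plays the role of {$sum: 1}
    group.students.push(doc._id);  // plays the role of {$push: "$_id"}
  }
  return [...groups.values()];
}

const result = groupByCourses(students);
console.log(JSON.stringify(result, null, 2));
```

Because the key is built from the array as stored, two arrays with the same elements in a different order end up in different groups.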
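Update 2
I found that the aggregation above treats the combination [ "C", "A", "B" ] as different from [ "A", "B", "C" ]. But I need these 2 to be counted as the same combination.
So let's look at the following documents: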
{ "_id" : "1", "student" : "Oscar", "courses" : [ "A", "B" ] }
{ "_id" : "2", "student" : "Alan", "courses" : [ "A", "B", "C" ] }
{ "_id" : "3", "student" : "Kate", "courses" : [ "A", "B", "D" ] }
{ "_id" : "4", "student" : "John", "courses" : [ "A", "B", "C" ] }
{ "_id" : "5", "student" : "Bema", "courses" : [ "A", "B" ] }
{ "_id" : "6", "student" : "Alex", "courses" : [ "C", "A", "B" ] }
Let's look at the output:
{ "_id" : [ "C", "A", "B" ], "count" : 1, "students" : [ "6" ] }
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 2, "students" : [ "2", "4" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "1", "5" ] }
See lines 1 and 3 of the output - this is not what I want.
So, in order to treat [ "C", "A", "B" ] and [ "A", "B", "C" ] as the same combination, I changed the aggregation as follows:
db.students.aggregate([
{$unwind: "$courses" },
{$sort : {"courses": 1}},
{$group: {_id: "$_id", courses: {$push: "$courses"}}},
{$group: {_id: "$courses", count: {$sum:1}, students: {$push: "$_id"}}}
])
Output:
{ "_id" : [ "A", "B", "D" ], "count" : 1, "students" : [ "3" ] }
{ "_id" : [ "A", "B" ], "count" : 2, "students" : [ "5", "1" ] }
{ "_id" : [ "A", "B", "C" ], "count" : 3, "students" : [ "6", "4", "2" ] }
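The idea behind that pipeline is to normalize each courses array into sorted order before using it as the group key. As a rough illustration only, the same idea can be sketched in plain JavaScript on in-memory data. The helper name groupByCourseSet is hypothetical, and the sketch does not claim to reproduce the order in which the server returns groups or pushes _ids.

```javascript
// In-memory stand-in for the six documents above.
const students = [
  { _id: "1", student: "Oscar", courses: ["A", "B"] },
  { _id: "2", student: "Alan", courses: ["A", "B", "C"] },
  { _id: "3", student: "Kate", courses: ["A", "B", "D"] },
  { _id: "4", student: "John", courses: ["A", "B", "C"] },
  { _id: "5", student: "Bema", courses: ["A", "B"] },
  { _id: "6", student: "Alex", courses: ["C", "A", "B"] },
];

// Hypothetical helper: groups students by course set, ignoring order.
function groupByCourseSet(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Copy before sorting so the input documents are not mutated.
    // This stands in for the $unwind / $sort / first $group stages.
    const sorted = [...doc.courses].sort();
    const key = JSON.stringify(sorted);
    if (!groups.has(key)) {
      groups.set(key, { _id: sorted, count: 0, students: [] });
    }
    // This stands in for the final $group stage.
    const group = groups.get(key);
    group.count += 1;
    group.students.push(doc._id);
  }
  return [...groups.values()];
}

const result = groupByCourseSet(students);
console.log(JSON.stringify(result, null, 2));
```

With the sorted copy as the key, Alex's [ "C", "A", "B" ] lands in the same group as [ "A", "B", "C" ].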
This is an aggregation operation that uses grouping.
db.students.aggregate([{
    $group: {
        // The group key: documents with the same value end up in one group.
        // The "$courses" syntax refers to the value of the courses field.
        _id: "$courses",
        // Add 1 for each document in the group (effectively a counter)
        count: {$sum: 1}
    }
}]);
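Edit:
If the courses can be in any order, you can $unwind, $sort, and $group again, as suggested in the edited question. It is also possible to do this with mapReduce, but I'm not sure which is faster.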
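db.students.mapReduce(
    function () {
        // Use the sorted courses as the key. The emitted value has the
        // same shape as the value returned by the reduce function.
        emit(this.courses.sort(), {students: [this._id], count: 1});
    },
    function (key, values) {
        // Merge partial results; reduce can be called more than once per
        // key and is not called at all for keys with a single value.
        var result = {students: [], count: 0};
        values.forEach(function (value) {
            result.students = result.students.concat(value.students);
            result.count += value.count;
        });
        return result;
    },
    {out: {inline: 1}}
)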
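One thing worth keeping in mind with mapReduce is that the reduce function has to return a value of the same shape as the values it receives, since it may be fed its own output again and is skipped for keys that have only one value. The following plain JavaScript sketch illustrates that contract on in-memory data. The names map, reduce, and runMapReduce are hypothetical stand-ins, not the MongoDB API, and the comma-joined string key is an assumption made to keep the sketch simple.

```javascript
// In-memory stand-in for the six documents from the question.
const students = [
  { _id: "1", courses: ["A", "B"] },
  { _id: "2", courses: ["A", "B", "C"] },
  { _id: "3", courses: ["A", "B", "D"] },
  { _id: "4", courses: ["A", "B", "C"] },
  { _id: "5", courses: ["A", "B"] },
  { _id: "6", courses: ["C", "A", "B"] },
];

// Stand-in for the map function: one (key, value) pair per document.
// The value already has the shape that reduce returns.
function map(doc) {
  const key = [...doc.courses].sort().join(",");
  return { key, value: { students: [doc._id], count: 1 } };
}

// Stand-in for the reduce function: merges partial results.
function reduce(key, values) {
  const merged = { students: [], count: 0 };
  for (const value of values) {
    merged.students.push(...value.students);
    merged.count += value.count;
  }
  return merged;
}

// Hypothetical runner: like MongoDB, it skips reduce for single-value keys.
function runMapReduce(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const { key, value } = map(doc);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(value);
  }
  const output = {};
  for (const [key, values] of buckets) {
    output[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return output;
}

const output = runMapReduce(students);
console.log(JSON.stringify(output, null, 2));
```

Because map and reduce agree on the value shape, the single-student set "A,B,D" comes out in the same format as the sets that went through reduce.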