Group Elements after unwind and ignore duplication - MongoDB Aggregation
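I'm using an aggregation to unwind an array and group by each element of the array in order to sum some corresponding values.
My collection looks like this: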
/* 1 */
{
    "_id" : ObjectId("59ce411c2708c97154d1319b"),
    "sourceMediumPath" : [
        {
            "nodeValue" : "(direct) / (none)"
        },
        {
            "nodeValue" : "(direct) / (none)"
        }
    ],
    "totalConversions" : 1,
    "totalConversionValue" : 171.6
}
/* 2 */
{
    "_id" : ObjectId("59ce411c2708c97154d136a0"),
    "sourceMediumPath" : [
        {
            "nodeValue" : "google / cpc"
        },
        {
            "nodeValue" : "(direct) / (none)"
        },
        {
            "nodeValue" : "google / cpc"
        }
    ],
    "totalConversions" : 1,
    "totalConversionValue" : 151.8
}
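I want to group by sourceMediumPath.nodeValue and sum 'totalConversions' and 'totalConversionValue' without counting duplicate elements within the same document.
For example, using unwind, group and sum: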
aggregation = Aggregation.newAggregation(
    // Every array element becomes its own document, so repeated elements repeat the parent's totals
    Aggregation.unwind("sourceMediumPath"),
    Aggregation.group("sourceMediumPath.nodeValue")
        .sum("totalConversions").as(Variables.TOTAL_CONNVERSIONS)
        // Field names are case-sensitive: the documents use "totalConversionValue"
        .sum("totalConversionValue").as(Variables.TOTAL_CONVERSION_VALUE),
    Aggregation.project("sourceMediumPath.nodeValue")
        .andInclude(Variables.TOTAL_CONNVERSIONS, Variables.TOTAL_CONVERSION_VALUE)
);
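With this pipeline, the sum of totalConversions is 3 for "nodeValue" : "(direct) / (none)" and 2 for "google / cpc". This happens because $unwind repeats each document's totalConversions and totalConversionValue for every array element, including the duplicated ones.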
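The following plain JavaScript sketch shows where the inflated totals come from. It runs in Node.js with no MongoDB involved and imitates what $unwind followed by $group does to the two sample documents. It is only an approximation of the server's behavior, written for illustration, and the variable names are hypothetical. Each unwound element carries a full copy of its parent's totals, so "(direct) / (none)" collects 3 conversions and "google / cpc" collects 2.

```javascript
// Sample documents from the question (ObjectIds omitted).
const docs = [
  {
    sourceMediumPath: [
      { nodeValue: "(direct) / (none)" },
      { nodeValue: "(direct) / (none)" },
    ],
    totalConversions: 1,
    totalConversionValue: 171.6,
  },
  {
    sourceMediumPath: [
      { nodeValue: "google / cpc" },
      { nodeValue: "(direct) / (none)" },
      { nodeValue: "google / cpc" },
    ],
    totalConversions: 1,
    totalConversionValue: 151.8,
  },
];

// Imitates { $unwind: "$sourceMediumPath" }: one output document per array
// element, each carrying a copy of the parent's totals.
const unwound = docs.flatMap((doc) =>
  doc.sourceMediumPath.map((element) => ({ ...doc, sourceMediumPath: element }))
);

// Imitates $group on "$sourceMediumPath.nodeValue" with two $sum accumulators.
const grouped = {};
for (const doc of unwound) {
  const key = doc.sourceMediumPath.nodeValue;
  if (!grouped[key]) grouped[key] = { totalConversions: 0, totalConversionValue: 0 };
  grouped[key].totalConversions += doc.totalConversions;
  grouped[key].totalConversionValue += doc.totalConversionValue;
}

console.log(grouped);
```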
Is there a way to ignore the duplicates so that each document contributes only one value per distinct element?
How can I do this?
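You can add an $addFields stage before the $unwind stage. That stage uses $setIntersection, intersecting the array with itself, to create an array of unique values. Then unwind that new field instead, as follows: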
db.collection.aggregate( [
    { $addFields: {
        // Intersecting the array with itself keeps each distinct subdocument once
        sourceMediumUniquePath: { $setIntersection: [
            "$sourceMediumPath",
            "$sourceMediumPath"
        ] }
    } },
    { $unwind: "$sourceMediumUniquePath" },
    // ... rest of the pipeline ...
])
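Below is a plain JavaScript sketch of what this pipeline computes for the two sample documents. The answer elides the rest of the pipeline, so this sketch assumes that it groups on the nodeValue of each unwound unique element and sums both totals. The helper setIntersectionWithSelf is hypothetical. It approximates MongoDB's by-value comparison of embedded documents with JSON.stringify, which is adequate only for these single-field subdocuments. With the duplicates removed, "(direct) / (none)" collects 2 conversions and "google / cpc" collects 1.

```javascript
// Sample documents from the question (ObjectIds omitted).
const docs = [
  {
    sourceMediumPath: [
      { nodeValue: "(direct) / (none)" },
      { nodeValue: "(direct) / (none)" },
    ],
    totalConversions: 1,
    totalConversionValue: 171.6,
  },
  {
    sourceMediumPath: [
      { nodeValue: "google / cpc" },
      { nodeValue: "(direct) / (none)" },
      { nodeValue: "google / cpc" },
    ],
    totalConversions: 1,
    totalConversionValue: 151.8,
  },
];

// Hypothetical stand-in for { $setIntersection: [arr, arr] }: intersecting an
// array with itself yields each distinct element once. JSON.stringify is a
// rough proxy for MongoDB's by-value comparison of these flat subdocuments.
function setIntersectionWithSelf(arr) {
  const distinct = new Map();
  for (const element of arr) {
    const key = JSON.stringify(element);
    if (!distinct.has(key)) distinct.set(key, element);
  }
  return [...distinct.values()];
}

// $addFields: keep every original field and add the de-duplicated array.
const withUnique = docs.map((doc) => ({
  ...doc,
  sourceMediumUniquePath: setIntersectionWithSelf(doc.sourceMediumPath),
}));

// $unwind: one document per distinct element.
const unwound = withUnique.flatMap((doc) =>
  doc.sourceMediumUniquePath.map((element) => ({ ...doc, sourceMediumUniquePath: element }))
);

// Assumed $group stage: key on the unique element's nodeValue, sum both totals.
const grouped = {};
for (const doc of unwound) {
  const key = doc.sourceMediumUniquePath.nodeValue;
  if (!grouped[key]) grouped[key] = { totalConversions: 0, totalConversionValue: 0 };
  grouped[key].totalConversions += doc.totalConversions;
  grouped[key].totalConversionValue += doc.totalConversionValue;
}

console.log(grouped);
```

MongoDB does not specify the order of the array that $setIntersection returns. This sketch keeps first-seen order, which does not affect the grouped sums.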
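Update:
If your MongoDB version does not support $addFields (it was introduced in 3.4), you can use a $project stage instead. The only drawback is that you have to list every field that later stages need. For example: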
db.collection.aggregate( [
    { $project: {
        sourceMediumUniquePath: { $setIntersection: [
            "$sourceMediumPath",
            "$sourceMediumPath"
        ] },
        totalConversions: 1,
        totalConversionValue: 1
    } },
    { $unwind: "$sourceMediumUniquePath" },
    // ... rest of the pipeline ...
])
In Spring Data, intersectsArrays in the aggregation projection did the job: intersecting the array with itself removes the duplicates.
aggregation = Aggregation.newAggregation(
    // Intersect the nodeValue array with itself to drop repeated values within each document
    Aggregation.project()
        .and("sourceMediumPath.nodeValue").intersectsArrays("sourceMediumPath.nodeValue").as("Intersection")
        .andInclude(...),
    Aggregation.unwind("Intersection"),
    Aggregation.group("Intersection")
        .sum("totalConversions").as("totalConversions")
        .sum("totalConversionValue").as("totalConversionValue"),
    Aggregation.project("sourceMediumPath.nodeValue")
        .andInclude("totalConversions", "totalConversionValue"),
    // Write the grouped results to a collection
    new OutOperation("OutputCollection")
).withOptions(Aggregation.newAggregationOptions().allowDiskUse(true).build());
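This Spring Data variant intersects an array of nodeValue strings rather than the array of subdocuments. The JavaScript sketch below assumes that the projected field is sourceMediumPath.nodeValue and shows the equivalent computation. A field path through an array of subdocuments resolves to the array of nodeValue strings, and a Set stands in for the self-intersection. This is an illustration only, not Spring Data or MongoDB code.

```javascript
// Sample documents from the question (ObjectIds omitted).
const docs = [
  {
    sourceMediumPath: [
      { nodeValue: "(direct) / (none)" },
      { nodeValue: "(direct) / (none)" },
    ],
    totalConversions: 1,
    totalConversionValue: 171.6,
  },
  {
    sourceMediumPath: [
      { nodeValue: "google / cpc" },
      { nodeValue: "(direct) / (none)" },
      { nodeValue: "google / cpc" },
    ],
    totalConversions: 1,
    totalConversionValue: 151.8,
  },
];

// "sourceMediumPath.nodeValue" resolves to an array of strings; intersecting
// it with itself leaves each string once, which a Set reproduces.
function uniqueNodeValues(doc) {
  return [...new Set(doc.sourceMediumPath.map((p) => p.nodeValue))];
}

// project + unwind("Intersection") + group("Intersection") in one pass:
// each document adds its totals once per distinct nodeValue.
const totals = {};
for (const doc of docs) {
  for (const nodeValue of uniqueNodeValues(doc)) {
    if (!totals[nodeValue]) totals[nodeValue] = { totalConversions: 0, totalConversionValue: 0 };
    totals[nodeValue].totalConversions += doc.totalConversions;
    totals[nodeValue].totalConversionValue += doc.totalConversionValue;
  }
}

console.log(totals);
```

Working on plain strings avoids depending on how embedded documents compare. For single-field subdocuments like these, both approaches should give the same totals.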
Thank you very much for your help.