Group elements after unwind and ignore duplicates - MongoDB Aggregation

I am using an aggregation to unwind an array and group by each element of the array, summing some associated values.

My collection looks like this:

/* 1 */
{
    "_id" : ObjectId("59ce411c2708c97154d1319b"),
    "sourceMediumPath" : [ 
        {
            "nodeValue" : "(direct) / (none)"
        }, 
        {
            "nodeValue" : "(direct) / (none)"
        }
    ],
    "totalConversions" : 1,
    "totalConversionValue" : 171.6,
}

/* 2 */
{
    "_id" : ObjectId("59ce411c2708c97154d136a0"),
    "sourceMediumPath" : [ 
        {
            "nodeValue" : "google / cpc"
        }, 
        {
            "nodeValue" : "(direct) / (none)"
        }, 
        {
            "nodeValue" : "google / cpc"
        }
    ],
    "totalConversions" : 1,
    "totalConversionValue" : 151.8,
}

I want to group by sourceMediumPath.nodeValue and sum totalConversions and totalConversionValue, counting each distinct element only once per document.

For example, using unwind, group, and sum:

aggregation = Aggregation.newAggregation(
        // One output document per array element, duplicates included
        Aggregation.unwind("sourceMediumPath"),
        // Field names are case-sensitive: the documents store "totalConversionValue"
        Aggregation.group("sourceMediumPath.nodeValue")
                .sum("totalConversions").as(Variables.TOTAL_CONNVERSIONS)
                .sum("totalConversionValue").as(Variables.TOTAL_CONVERSION_VALUE),
        Aggregation.project("sourceMediumPath.nodeValue")
                .andInclude(Variables.TOTAL_CONNVERSIONS, Variables.TOTAL_CONVERSION_VALUE)
);

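For "nodeValue" : "(direct) / (none)" I get a totalConversions sum of 3, and for "google / cpc" a sum of 1, because the unwind operation repeats totalConversions and totalConversionValue once for every occurrence of an element, duplicates included.

Is there a way to ignore the duplicates so that each document contributes only one value per element?

How can I do that?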

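You can add an $addFields stage before the $unwind stage to create an array of unique values using $setIntersection. Set operators ignore duplicate entries, so intersecting the array with itself yields each distinct element once. Then unwind that new field, as follows: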

db.collection.aggregate( [
    { $addFields: {
        sourceMediumUniquePath: { $setIntersection: [ 
            "$sourceMediumPath", 
            "$sourceMediumPath" 
        ] }
    } },
    { $unwind: "$sourceMediumUniquePath" },
    ... rest of the pipeline ...
])
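To see why removing duplicates before the unwind changes the sums, here is a minimal in-memory sketch in plain JavaScript. It does not use MongoDB at all: `sumByNodeValue` is a hypothetical helper that imitates $unwind followed by $group on the two sample documents from the question, and it deduplicates by nodeValue, which matches $setIntersection here only because each array element holds that single field.

```javascript
// The two sample documents from the question, reduced to the fields used here.
const docs = [
  {
    sourceMediumPath: [
      { nodeValue: "(direct) / (none)" },
      { nodeValue: "(direct) / (none)" }
    ],
    totalConversions: 1,
    totalConversionValue: 171.6
  },
  {
    sourceMediumPath: [
      { nodeValue: "google / cpc" },
      { nodeValue: "(direct) / (none)" },
      { nodeValue: "google / cpc" }
    ],
    totalConversions: 1,
    totalConversionValue: 151.8
  }
];

// Hypothetical helper (not a MongoDB API): imitates $unwind + $group in memory.
// With dedupe = true, each document contributes at most once per nodeValue,
// which is the effect of intersecting the array with itself before $unwind.
function sumByNodeValue(documents, dedupe) {
  const totals = new Map();
  for (const doc of documents) {
    let values = doc.sourceMediumPath.map((node) => node.nodeValue);
    if (dedupe) {
      values = [...new Set(values)]; // set semantics drop repeated entries
    }
    for (const value of values) {
      const entry = totals.get(value) || { totalConversions: 0, totalConversionValue: 0 };
      entry.totalConversions += doc.totalConversions;
      entry.totalConversionValue += doc.totalConversionValue;
      totals.set(value, entry);
    }
  }
  return totals;
}

const naive = sumByNodeValue(docs, false);
const unique = sumByNodeValue(docs, true);

console.log(naive.get("(direct) / (none)").totalConversions);  // 3
console.log(unique.get("(direct) / (none)").totalConversions); // 2
console.log(unique.get("google / cpc").totalConversions);      // 1
```

With deduplication, each document is counted once for every distinct source/medium it contains, which is the result the question asks for.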

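Update:

If $addFields is not supported (it was introduced in MongoDB 3.4), you can use a $project stage instead. The only drawback is that you must list every field that later stages need. For example: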

db.collection.aggregate( [
    { $project: {
        sourceMediumUniquePath: { $setIntersection: [ 
            "$sourceMediumPath", 
            "$sourceMediumPath" 
        ] },
        totalConversions: 1,
        totalConversionValue: 1
    } },
    { $unwind: "$sourceMediumUniquePath" },
    ... rest of the pipeline ...
])
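For reference, one way the elided stages could look is sketched below. The $group and final $project stages are assumptions based on the question (group by nodeValue, sum both totals) and are not part of the original answer; the output field name `nodeValue` is likewise only illustrative.

```javascript
// Sketch only: the $group and final $project stages are assumptions,
// not part of the original answer.
const pipeline = [
  // Build a duplicate-free copy of the array and keep the fields needed later.
  { $project: {
      sourceMediumUniquePath: { $setIntersection: [
          "$sourceMediumPath",
          "$sourceMediumPath"
      ] },
      totalConversions: 1,
      totalConversionValue: 1
  } },
  // One output document per distinct element of each input document.
  { $unwind: "$sourceMediumUniquePath" },
  // Sum the per-document totals for every distinct nodeValue.
  { $group: {
      _id: "$sourceMediumUniquePath.nodeValue",
      totalConversions: { $sum: "$totalConversions" },
      totalConversionValue: { $sum: "$totalConversionValue" }
  } },
  // Expose the grouping key under a readable (illustrative) name.
  { $project: {
      _id: 0,
      nodeValue: "$_id",
      totalConversions: 1,
      totalConversionValue: 1
  } }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```

Keeping the pipeline in a variable makes it easy to inspect or reuse before passing it to aggregate.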

In Spring Data, intersecting the array with itself in the projection stage of the aggregation does the job:

aggregation = Aggregation.newAggregation(
        // Intersect the nodeValue array with itself to drop duplicates
        Aggregation.project()
                .and("sourceMediumPath.nodeValue").intersectsArrays("sourceMediumPath.nodeValue").as("Intersection")
                .andInclude(...),
        Aggregation.unwind("Intersection"),
        // Field names are case-sensitive: the documents store "totalConversionValue"
        Aggregation.group("Intersection")
                .sum("totalConversions").as("totalConversions")
                .sum("totalConversionValue").as("TotalConversionValue"),
        Aggregation.project("sourceMediumPath.nodeValue")
                .andInclude("totalConversions", "TotalConversionValue"),
        // Write the result to a separate collection
        new OutOperation("OutputCollection")
).withOptions(Aggregation.newAggregationOptions().allowDiskUse(true).build());

Thank you very much for your help.