子文档中行的 Mongoose 聚合“$sum”

Question

我相当擅长 sql 查询，但我似乎无法理解分组和获取 mongo 数据库文档的总和，

考虑到，我有一个具有如下架构的工作模型：

    {
        name: {
            type: String,
            required: true
        },
        info: String,
        active: {
            type: Boolean,
            default: true
        },
        all_service: [

            price: {
                type: Number,
                min: 0,
                required: true
            },
            all_sub_item: [{
                name: String,
                price:{ // << -- this is the price I want to calculate
                    type: Number,
                    min: 0
                },
                owner: {
                    user_id: {  //  <<-- here is the filter I want to put
                        type: Schema.Types.ObjectId,
                        required: true
                    },
                    name: String,
                    ...
                }
            }]

        ],
        date_create: {
            type: Date,
            default : Date.now
        },
        date_update: {
            type: Date,
            default : Date.now
        }
    }

我想要 price 列的总和，其中存在 owner，我在下面尝试但没有成功

 Job.aggregate(
        [
            {
                $group: {
                    _id: {}, // not sure what to put here
                    amount: { $sum: '$all_service.all_sub_item.price' }
                },
                $match: {'not sure how to limit the user': given_user_id}
            }
        ],
        //{ $project: { _id: 1, expense: 1 }}, // you can only project fields from 'group'
        function(err, summary) {
            console.log(err);
            console.log(summary);
        }
    );

有人可以指导我正确的方向吗？提前谢谢你

Answer 1

听起来你想SQL等效地做"sum (prices) WHERE owner IS NOT NULL"。

根据该假设，您需要先进行 $match，以将输入集缩减为您的总和。所以你的第一阶段应该是这样的

$match: { all_service.all_sub_items.owner : { $exists: true } }

将此视为将所有匹配文档传递到第二阶段。

现在，因为您要对一个数组求和，所以您必须执行另一个步骤。聚合运算符在 documents 上工作——实际上没有办法对数组求和。因此，我们想扩展您的数组，以便将数组中的每个元素都拉出，以在其自己的文档中将数组字段表示为一个值。将此视为交叉连接。这将是 $unwind.

$unwind: { "$all_service.all_sub_items" }

现在您刚刚制作了更多的文档，但我们可以将它们加在一起。现在我们可以执行 $group。在您的 $group 中，您指定一个转换。该行：

_id: {}, // not sure what to put here

正在输出文档中创建一个字段，它与输入文档不同。因此，您可以在此处随意设置 _id，但请将其视为等同于 sql 中的 "GROUP BY"。 $sum 运算符实质上将为您在此处创建的与该 _id 匹配的每组文档创建总和 - 因此实质上我们将 "re-collapsing" 您刚刚使用 $group 对 $unwind 所做的操作。但这将允许 $sum 工作。

我认为您正在寻找仅基于主文档 ID 的分组，因此我认为您问题中的 $sum 语句是正确的。

$group : { _id : $_id, totalAmount : { $sum : '$all_service.all_sub_item.price' } }

这将输出文档，其 _id 字段等于您的原始文档 ID 和您的总和。

我让你把它放在一起，我对节点不是很熟悉。你很接近，但我认为将你的 $match 移到前面并使用 $unwind 阶段将使你到达你需要的位置。祝你好运！

Answer 2

入门

如前所述，将聚合 "pipeline" 视为 Unix 和其他系统 shell 中的 "pipe" | 运算符确实有帮助。一个 "stage" 将输入馈送到 "next" 阶段，依此类推。

这里你需要注意的是你有 "nested" 个数组，一个数组在另一个数组中，如果你不小心，这可能会对你的预期结果产生巨大的影响。

您的文档在顶层包含一个 "all_service" 数组。据推测，这里经常有 "multiple" 个条目，所有条目都包含您的 "price" 属性以及 "all_sub_item"。那么当然 "all_sub_item" 本身就是一个数组，也包含它自己的许多项。

您可以将这些数组视为 SQL 中表格之间的 "relations"，在每种情况下都是 "one-to-many"。但是数据是 "pre-joined" 形式，您可以在其中一次获取所有数据而无需执行连接。这么多你应该已经熟悉了。

但是，当您想要 "aggregate" 跨文档时，您需要 "de-normalize" 这与 SQL 通过 "defining" [=111] =].这是为了"transform"数据进入适合聚合的非规范化状态。

所以同样的可视化适用。主文档的条目根据子文档的数量进行复制，"join" 到 "inner-child" 将相应地复制主文档和初始 "child"。在"nutshell"中，这个：

{
    "a": 1,
    "b": [
        { 
            "c": 1,
            "d": [
                { "e": 1 }, { "e": 2 }
            ]
        },
        { 
            "c": 2,
            "d": [
                { "e": 1 }, { "e": 2 }
            ]
        }
    ]
}

变成这样：

{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 1, "d" : { "e" : 2 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 1 } } }
{ "a" : 1, "b" : { "c" : 2, "d" : { "e" : 2 } } }

而执行此操作的操作是 $unwind，并且由于有多个数组，因此在继续任何处理之前，您需要 $unwind 两个数组：

db.collection.aggregate([
    { "$unwind": "$b" },
    { "$unwind": "$b.d" }
])

所以有 "pipe" 来自 "$b" 的第一个数组，如下所示：

{ "a" : 1, "b" : { "c" : 1, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }
{ "a" : 1, "b" : { "c" : 2, "d" : [ { "e" : 1 }, { "e" : 2 } ] } }

留下由“$b.d”引用的第二个数组进一步去规范化为最终的去规范化结果"without any arrays"。这允许其他操作进行处理。

求解

使用大约 "every" 聚合管道，"first" 您要做的是 "filter" 仅将文档 "filter" 包含您的结果的文档。这是一个好主意，尤其是在执行 $unwind 等操作时，您不希望在甚至不匹配目标数据的文档上执行此操作。

所以你需要在数组深度匹配你的"user_id"。但这只是获得结果的一部分，因为您应该知道在文档中查询数组中的匹配值时会发生什么。

当然，"whole"文件还是会返回的，因为这才是你真正要的。数据已经是 "joined" 并且我们没有要求 "un-join" 它在任何 way.You 中看起来就像 "first" 文档选择一样，但是当 "de-normalized"，每个数组元素现在实际上代表一个 "document" 本身。

所以不是"only"你$match开头的"pipeline"，你也$match处理完"all"$unwind 语句，向下到您希望匹配的元素的级别。

Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": { 
            "all_service.all_sub_item.owner": given_user_id
        }},

        // De-normalize arrays
        { "$unwind": "$all_service" },
        { "$unwind": "$all_service.all_subitem" },

        // Match again to filter the array elements
        { "$match": { 
            "all_service.all_sub_item.owner": given_user_id
        }},

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_service.all_sub_item.price" }
        }}

    ],
    function(err,results) {

    }
)

或者，自 2.6 以来的现代 MongoDB 版本也支持 $redact 运算符。在这种情况下，这可以用于 "pre-filter" 在使用 $unwind:

处理之前的数组内容

Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": { 
            "all_service.all_sub_item.owner": given_user_id
        }},

        // Filter arrays for matches in document
        { "$redact": {
            "$cond": {
                "if": { 
                    "$eq": [ 
                        { "$ifNull": [ "$owner", given_user_id ] },
                        given_user_id
                    ]
                },
                "then": "$$DESCEND",
                "else": "$$PRUNE"
            }
        }},

        // De-normalize arrays
        { "$unwind": "$all_service" },
        { "$unwind": "$all_service.all_subitem" },

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_service.all_sub_item.price" }
        }}

    ],
    function(err,results) {

    }
)

这可以 "recursively" 遍历文档并测试条件，有效地删除甚至 $unwind 之前的任何 "un-matched" 数组元素。这可以加快速度，因为不匹配的项目不需要 "un-wound"。然而，有一个 "catch"，如果由于某种原因 "owner" 根本不存在于数组元素中，那么此处所需的逻辑会将其视为另一个 "match"。你总是可以再次 $match 来确定，但还有一个更有效的方法：

Job.aggregate(
    [
        // Match to filter possible "documents"
        { "$match": { 
            "all_service.all_sub_item.owner": given_user_id
        }},

        // Filter arrays for matches in document
        { "$project": {
            "all_items": {
              "$setDifference": [
                { "$map": {
                  "input": "$all_service",
                  "as": "A",
                  "in": {
                    "$setDifference": [
                      { "$map": {
                        "input": "$$A.all_sub_item",
                        "as": "B",
                        "in": {
                          "$cond": {
                            "if": { "$eq": [ "$$B.owner", given_user_id ] },
                            "then": "$$B",
                            "else": false
                          }
                        }
                      }},
                      false
                    ]          
                  }
                }},
                [[]]
              ]
            }
        }},


        // De-normalize the "two" level array. "Double" $unwind
        { "$unwind": "$all_items" },
        { "$unwind": "$all_items" },

        // Group on the "_id" for the "key" you want, or "null" for all
        { "$group": {
            "_id": null,
            "total": { "$sum": "$all_items.price" }
        }}

    ],
    function(err,results) {

    }
)

与 $redact 相比，该过程减少了两个数组中项目的大小 "drastically"。 $map 运算符将数组的每个元素处理到 "in" 中的给定语句。在这种情况下，每个 "outer" 数组元素被发送到另一个 $map 以处理 "inner" 元素。

此处使用 $cond 执行逻辑测试，如果满足 "condiition" 则返回 "inner" 数组元素，否则返回 false 值。

$setDifference 用于过滤返回的任何 false 值。或者在 "outer" 的情况下，所有 false 值产生的任何 "blank" 数组从 "inner" 中过滤，那里没有匹配项。这只留下匹配项，封装在 "double" 数组中，例如：

[[{ "_id": 1, "price": 1, "owner": "b" },{..}],[{..},{..}]]

因为 "all" 数组元素默认有一个 _id 与猫鼬（这是你保留它的一个很好的理由）那么每个项目都是 "distinct" 并且不受"set" 运算符，除了删除不匹配的值。

处理$unwind"twice"将这些转化为自己文档中的普通对象，适合聚合。

所以这些就是您需要知道的事情。正如我之前所说，"aware" 数据如何 "de-normalizes" 以及这对您的最终总数意味着什么。

子文档中行的 Mongoose 聚合“$sum”

Mongoose aggregation "$sum" of rows in sub document

mongoose

mongodb

node.js

mongodb-query

aggregation-framework

入门

求解