Group multi-dimensional array after unwinding elements
MongoDB again. I love aggregation, but I still can't quite "get it".
This is my array:
{
"_id" : ObjectId("55951b2bf41edfc80b00002a"),
"orders" : [
{
"id" : "55929142f41edfdc0f00002f",
"name" : "XYZ",
"id_basket" : 1,
"card" : [
{
"id" : "250",
"serial" : "B",
"type" : "9cf4161002b9eda349bb9c5ae64b9f4a",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : {
"name" : "Normal",
"price" : "10",
"price_disp" : "10 €",
}
},
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : {
"name" : "Normal",
"price" : "10",
"price_disp" : "10 €",
}
}
]
},
{
"id" : "250",
"serial" : "B",
"type" : "9cf4161002b9eda349bb9c5ae64b9f4a",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : {
"name" : "Normal",
"price" : "10",
"price_disp" : "10 €",
}
},
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : {
"name" : "Normal",
"price" : "10",
"price_disp" : "10 €",
}
}
]
}
],
"full_amount" : "40",
},
{
"id" : "55929142f41edfdc0f00002f",
"name" : "XYZ",
"id_basket" : 1,
"card" : [
{
"id" : "250",
"serial" : "B",
"type" : "9cf4161002b9eda349bb9c5ae64b9f4a",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : {
"name" : "Normal",
"price" : "10",
"price_disp" : "10 €",
}
},
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : {
"name" : "Normal",
"price" : "10",
"price_disp" : "10 €",
}
}
]
},
{
"id" : "250",
"serial" : "B",
"type" : "9cf4161002b9eda349bb9c5ae64b9f4a",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : {
"name" : "Normal",
"price" : "10",
"price_disp" : "10 €",
}
},
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : {
"name" : "Normal",
"price" : "10",
"price_disp" : "10 €",
}
}
]
}
],
"full_amount" : "40",
},
],
"rate" : "0.23",
"date" : "2015-07-02 13:04:34",
"id_user" : 97,
}
I want to output something like this:
{
"_id" : ObjectId("55951b2bf41edfc80b00002a"),
"orders" : [
{
"id" : "55929142f41edfdc0f00002f",
"name" : "XYZ",
"card" : [
{
"id" : "250",
"serial" : "B",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : "10 €"
},
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : "10 €"
}
]
},
{
"id" : "250",
"serial" : "B",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : "10 €"
},
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : "10 €"
}
]
}
],
"full_amount" : "40",
},
{
"id" : "55929142f41edfdc0f00002f",
"name" : "XYZ",
"card" : [
{
"id" : "250",
"serial" : "B",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : "10 €"
},
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : "10 €"
}
]
},
{
"id" : "250",
"serial" : "B",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : "10 €"
},
{
"id" : "55927d41f41edfd00f000030",
"name" : "ZZZ",
"price" : "10 €"
}
]
}
],
"full_amount" : "40",
},
],
"rate" : "0.23",
"date" : "2015-07-02 13:04:34",
}
I have tried many combinations of unwinding, projecting and grouping, but none of them gave me what I want. Can anybody help me with this?
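You probably should not be using the aggregation framework for a task like this, which does not actually "aggregate" anything between documents. This is really a "projection" task: all you are asking for is to alter the structure of the document, and that is likely better done in client code after retrieving the document.
A good reason for this is that operations like $unwind are very costly in terms of performance. What $unwind does is produce a copy of the document content for every array member present, which results in many more documents to process.
Think of it as an "SQL join" with a "one to many" relationship, the only difference being that the data is self-contained in a single document. Processing $unwind emulates the join result: the "master" (one) document content is replicated for every "child" (many) document.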
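To make the copying concrete, here is a minimal sketch in plain JavaScript that imitates $unwind on a trimmed-down version of the document. The `unwind` helper and the shortened sample are illustrative assumptions, not MongoDB's implementation; the point is only how quickly the document count grows.

```javascript
// Imitation of $unwind in plain JavaScript (hypothetical helper, not a driver API).
// For each document, emit one copy per member of the array found at `path`.
function unwind(docs, path) {
  const keys = path.split(".");
  const getAt = (obj, ks) => ks.reduce((o, k) => (o == null ? undefined : o[k]), obj);
  // Return a shallow copy of `obj` with the value at `ks` replaced by `value`.
  const setAt = (obj, ks, value) =>
    ks.length === 0 ? value : { ...obj, [ks[0]]: setAt(obj[ks[0]], ks.slice(1), value) };
  return docs.flatMap(doc => {
    const members = getAt(doc, keys);
    return Array.isArray(members) ? members.map(m => setAt(doc, keys, m)) : [];
  });
}

// Trimmed-down sample (assumed shape): 2 orders x 2 cards x 2 tickets.
const ticket = id => ({ id, name: "ZZZ" });
const sampleDoc = {
  _id: 1,
  rate: "0.23",
  orders: [
    { id: "o1", card: [
      { id: "250", ticket: [ticket("t1"), ticket("t2")] },
      { id: "251", ticket: [ticket("t3"), ticket("t4")] }
    ]},
    { id: "o2", card: [
      { id: "252", ticket: [ticket("t5"), ticket("t6")] },
      { id: "253", ticket: [ticket("t7"), ticket("t8")] }
    ]}
  ]
};

const afterOrders = unwind([sampleDoc], "orders");
const afterCards = unwind(afterOrders, "orders.card");
const afterTickets = unwind(afterCards, "orders.card.ticket");

// One input document becomes 2, then 4, then 8 pipeline documents,
// and each of them carries its own copy of the top-level fields.
console.log(afterOrders.length, afterCards.length, afterTickets.length);
```

Every one of the eight resulting documents repeats `_id` and `rate`, which is the replication described above.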
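To counter people doing this sort of manipulation, MongoDB 2.6 introduced the $map operator, which processes the elements of an array within the document itself.
So rather than performing multiple (or any) $unwind operations, you can process the arrays in place using $map within a $project stage: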
db.collection.aggregate([
    { "$project": {
        "orders": { "$map": {
            "input": "$orders",
            "as": "o",
            "in": {
                "id": "$$o.id",
                "name": "$$o.name",
                "card": { "$map": {
                    "input": "$$o.card",
                    "as": "c",
                    "in": {
                        "id": "$$c.id",
                        "serial": "$$c.serial",
                        "name": "$$c.name",
                        "ticket": { "$map": {
                            "input": "$$c.ticket",
                            "as": "t",
                            "in": {
                                "id": "$$t.id",
                                "name": "$$t.name",
                                "price": "$$t.price.price_disp"
                            }
                        }}
                    }
                }},
                "full_amount": "$$o.full_amount"
            }
        }},
        "rate": 1,
        "date": 1
    }}
])
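The operation there is fairly simple: each array is assigned its own variable name, and all that really remains for a simple projection like this is to select the fields you want.
In earlier versions, processing with $unwind is much more difficult: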
db.collection.aggregate([
    { "$unwind": "$orders" },
    { "$unwind": "$orders.card" },
    { "$unwind": "$orders.card.ticket" },
    { "$group": {
        "_id": {
            "_id": "$_id",
            "orders": {
                "id": "$orders.id",
                "name": "$orders.name",
                "card": {
                    "id": "$orders.card.id",
                    "serial": "$orders.card.serial",
                    "name": "$orders.card.name"
                },
                "full_amount": "$orders.full_amount"
            },
            "rate": "$rate",
            "date": "$date"
        },
        "ticket": {
            "$push": {
                "id": "$orders.card.ticket.id",
                "name": "$orders.card.ticket.name",
                "price": "$orders.card.ticket.price.price_disp"
            }
        }
    }},
    { "$group": {
        "_id": {
            "_id": "$_id._id",
            "orders": {
                "id": "$_id.orders.id",
                "name": "$_id.orders.name",
                "full_amount": "$_id.orders.full_amount"
            },
            "rate": "$_id.rate",
            "date": "$_id.date"
        },
        "card": {
            "$push": {
                "id": "$_id.orders.card.id",
                "serial": "$_id.orders.card.serial",
                "name": "$_id.orders.card.name",
                "ticket": "$ticket"
            }
        }
    }},
    { "$group": {
        "_id": "$_id._id",
        "orders": {
            "$push": {
                "id": "$_id.orders.id",
                "name": "$_id.orders.name",
                "card": "$card",
                "full_amount": "$_id.orders.full_amount"
            }
        },
        "rate": { "$first": "$_id.rate" },
        "date": { "$first": "$_id.date" }
    }}
])
So upon careful reading you should see that since you $unwind three times, it is necessary to $group three times as well, carefully grouping all the distinct values at each "level" and reconstructing the arrays via $push.
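As mentioned earlier, this is not recommended at all:
You are not grouping or aggregating anything, and every sub-document must contain a unique identifier, because grouping operations are required to reconstruct the arrays. (See: Note)
The $unwind operations here are very costly. All the document information is reproduced by a factor of "n" array elements times "n" array elements, and so on. So there is far more data in your aggregation pipeline than your collection or query selection actually contains.
So in conclusion, for general "reformatting" of your data, you should process each document in code rather than throwing it at the aggregation pipeline.
Only if your document data needs enough manipulation to make a substantial difference to the returned result size, such that you deem it more efficient than pulling the whole document and manipulating it in the client, should you use the $project form as shown with the $map operations.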
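As a rough illustration of the client-side route recommended above, the following sketch reshapes one retrieved document with `Array.prototype.map`, mirroring the field selection of the $map projection. The function name `reshapeDocument` and the shortened `fetchedDoc` sample are assumptions for the example; how the document is fetched depends on your driver and is left out.

```javascript
// Client-side reshaping of one already-retrieved document.
// reshapeDocument is a hypothetical helper; it keeps only the wanted fields
// and replaces each ticket's price object with its display string.
function reshapeDocument(doc) {
  return {
    _id: doc._id,
    orders: doc.orders.map(o => ({
      id: o.id,
      name: o.name,
      card: o.card.map(c => ({
        id: c.id,
        serial: c.serial,
        name: c.name,
        ticket: c.ticket.map(t => ({
          id: t.id,
          name: t.name,
          price: t.price.price_disp
        }))
      })),
      full_amount: o.full_amount
    })),
    rate: doc.rate,
    date: doc.date
  };
}

// Shortened stand-in for a document returned by a query (assumed shape).
const fetchedDoc = {
  _id: "55951b2bf41edfc80b00002a",
  orders: [{
    id: "55929142f41edfdc0f00002f",
    name: "XYZ",
    id_basket: 1,
    card: [{
      id: "250",
      serial: "B",
      type: "9cf4161002b9eda349bb9c5ae64b9f4a",
      name: "Eco",
      ticket: [{
        id: "55927d41f41edfd00f000030",
        name: "ZZZ",
        price: { name: "Normal", price: "10", price_disp: "10 €" }
      }]
    }],
    full_amount: "40"
  }],
  rate: "0.23",
  date: "2015-07-02 13:04:34",
  id_user: 97
};

const reshaped = reshapeDocument(fetchedDoc);
console.log(JSON.stringify(reshaped, null, 2));
```

Nothing is copied per array member here, and no identifiers need to be unique, which is why this form avoids both problems listed above.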
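Side Bar
Your original tag here mentions "PHP".
All MongoDB queries, including the aggregation pipeline, have nothing language-specific about them. They are merely data structures, mostly represented in the native form of the language (PHP, JavaScript, Python, etc.), with "builder methods" for those languages that have no native free-form structure notation (C, C#, Java).
In all cases there are simple parsers available for JSON, which is the common "lingua franca" here, since the MongoDB shell itself is JavaScript-based and understands JSON structures natively (as actual JavaScript objects).
So use tools like these when working with such examples:
json_decode: to get a deeper understanding of how your native data structure is constructed.
json_encode: to check your native data structure against any JSON sample.
Everything here is simple "key/value" array() notation, albeit nested. But it is probably good practice to know about these tools and use them regularly.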
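The "pipeline is just data" point can be seen from the JavaScript side as well. In this small sketch, a cut-down pipeline (the name `pipelineSketch` and its reduced field list are assumptions for illustration) is serialized to JSON text and parsed back unchanged; the same text is what you would hand to PHP's json_decode to inspect the equivalent nested array() form.

```javascript
// A cut-down pipeline, written as an ordinary array of plain objects.
const pipelineSketch = [
  { "$project": {
    "orders": { "$map": {
      "input": "$orders",
      "as": "o",
      "in": { "id": "$$o.id", "name": "$$o.name" }
    }},
    "rate": 1
  }}
];

// Serialize to JSON text, then parse it back into a native structure.
const pipelineJson = JSON.stringify(pipelineSketch, null, 2);
const roundTripped = JSON.parse(pipelineJson);

console.log(pipelineJson);
```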
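NOTE:
The data sample you provided looks very much like you cut and pasted data to create multiple items, since the various sub-items all share the same "id" values.
Your real data should not do this! I hope it does not, but if it does, then fix it.
To make the second example workable (the first is perfectly fine), the data needs to be altered to include unique "id" values for each sub-element.
As I have used here: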
{
"_id" : ObjectId("55951b2bf41edfc80b00002a"),
"orders" : [
{
"id" : "55929142f41edfdc0f00002a",
"name" : "XYZ",
"card" : [
{
"id" : "250",
"serial" : "B",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000031",
"name" : "ZZZ",
"price" : "10 €"
},
{
"id" : "55927d41f41edfd00f000032",
"name" : "ZZZ",
"price" : "10 €"
}
]
},
{
"id" : "251",
"serial" : "B",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000033",
"name" : "ZZZ",
"price" : "10 €"
},
{
"id" : "55927d41f41edfd00f000034",
"name" : "ZZZ",
"price" : "10 €"
}
]
}
],
"full_amount" : "40",
},
{
"id" : "55929142f41edfdc0f00002b",
"name" : "XYZ",
"card" : [
{
"id" : "252",
"serial" : "B",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000035",
"name" : "ZZZ",
"price" : "10 €"
},
{
"id" : "55927d41f41edfd00f000036",
"name" : "ZZZ",
"price" : "10 €"
}
]
},
{
"id" : "253",
"serial" : "B",
"name" : "Eco",
"ticket" : [
{
"id" : "55927d41f41edfd00f000037",
"name" : "ZZZ",
"price" : "10 €"
},
{
"id" : "55927d41f41edfd00f000038",
"name" : "ZZZ",
"price" : "10 €"
}
]
}
],
"full_amount" : "40",
}
],
"rate" : "0.23",
"date" : "2015-07-02 13:04:34",
}
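Why the identifiers matter for the $unwind and $group version can be checked with a short sketch. It counts how many distinct keys the first $group stage would see for the cards of one document; the helper names `countCardGroups` and `makeDoc` are hypothetical, and the key is built from the same order and card fields that stage groups on.

```javascript
// Count the distinct grouping keys the first $group stage would produce
// for the cards of a single document (hypothetical helper).
function countCardGroups(doc) {
  const keys = new Set();
  for (const o of doc.orders) {
    for (const c of o.card) {
      keys.add(JSON.stringify([doc._id, o.id, o.name, o.full_amount, c.id, c.serial, c.name]));
    }
  }
  return keys.size;
}

// Build a two-order document with the given order ids and card ids.
const makeDoc = (orderIds, cardIds) => ({
  _id: "55951b2bf41edfc80b00002a",
  orders: orderIds.map((orderId, i) => ({
    id: orderId,
    name: "XYZ",
    full_amount: "40",
    card: cardIds[i].map(cardId => ({ id: cardId, serial: "B", name: "Eco" }))
  }))
});

// Ids as in the question: every order and card repeats the same values.
const pastedDoc = makeDoc(
  ["55929142f41edfdc0f00002f", "55929142f41edfdc0f00002f"],
  [["250", "250"], ["250", "250"]]
);

// Ids as in the altered sample: unique per order and per card.
const uniqueDoc = makeDoc(
  ["55929142f41edfdc0f00002a", "55929142f41edfdc0f00002b"],
  [["250", "251"], ["252", "253"]]
);

console.log(countCardGroups(pastedDoc)); // 1
console.log(countCardGroups(uniqueDoc)); // 4
```

With the pasted ids all four cards fall into a single group, so their tickets would be pushed into one array and the original structure could not be rebuilt; with unique ids each card keeps its own group.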