Deduping results returned by a Mongo aggregate query
Some background:
There are 3 collections involved:
- posts
- postsubcategories
- postsupercategories
An example document from posts:
{
    "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"),
    "__v" : 6,
    "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"),
    "postSubCategories" : [
        ObjectId("5806344baa0bbf284a2316e4") // reference to document in postsubcategories collection
    ],
    "postSuperCategories" : [
        ObjectId("580679958a5f5f448ba5aae9"),
        ObjectId("580679958a5f5f448ba5aaf2") // references to documents in postsupercategories collection
    ],
    "publishedDate" : ISODate("2016-10-10T04:00:00.000Z"),
    "state" : "published",
    "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"),
    "title" : "My title",
    "topics" : []
}
My query is:
db.posts.aggregate([
    // One output document per referenced sub-category
    { '$unwind': { 'path': "$postSubCategories" } },
    { '$lookup': {
        'from': "postsubcategories",
        'localField': "postSubCategories",
        'foreignField': "_id",
        'as': "subObject"
    }},
    // One output document per referenced super-category
    { '$unwind': { 'path': "$postSuperCategories" } },
    { '$lookup': {
        'from': "postsupercategories",
        'localField': "postSuperCategories",
        'foreignField': "_id",
        'as': "superObject"
    }},
    // Keep rows whose sub- or super-category carries the keyword
    { '$match': {
        '$or': [
            { "subObject.searchKeywords": "home monitor" },
            { "superObject.searchKeywords": "home monitor" }
        ]
    }},
    { '$match': { "state": "published" } }
])
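The postsubcategories and postsupercategories collections both contain a field named searchKeywords, which is an array of text strings in each document. I want to query those searchKeywords fields and return the matching posts documents. I need a deduplicated set of returned _ids.
The query is returning four documents. Example: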
ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf40b7ccbc906ed87cef7")
ObjectId("57fbf40b7ccbc906ed87cef7")
I understand why it is returning 4. One document contains postSubCategories object 5806344baa0bbf284a2316e4 and postSuperCategories id 580679958a5f5f448ba5aae9.
The second document contains postSubCategories object 5806344baa0bbf284a2316e4 and postSuperCategories 580679958a5f5f448ba5aaf2. The same thing repeats for the second post.
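The fan-out can be reproduced outside MongoDB. Below is a minimal sketch in plain JavaScript. The ids are shortened placeholders, and it assumes the second post references the same two super-categories as the first. The unwind helper is a hypothetical stand-in for the $unwind stage, not a MongoDB API:

```javascript
// Hypothetical stand-in for $unwind: emit one copy of the document
// per element of the given array field.
function unwind(docs, field) {
  return docs.flatMap(doc =>
    doc[field].map(value => ({ ...doc, [field]: value }))
  );
}

// Placeholder ids standing in for the ObjectIds above (assumed data).
const posts = [
  { _id: "post1", postSubCategories: ["sub1"], postSuperCategories: ["superA", "superB"] },
  { _id: "post2", postSubCategories: ["sub1"], postSuperCategories: ["superA", "superB"] },
];

// 1 sub-category x 2 super-categories = 2 rows per post.
const rows = unwind(unwind(posts, "postSubCategories"), "postSuperCategories");
const ids = rows.map(row => row._id);
console.log(ids); // [ 'post1', 'post1', 'post2', 'post2' ]
```

Each post comes back once per super-category, which matches the four _ids listed above.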
Is there a way to "dedupe" based on the returned _id?
The end result I want is:
ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf40b7ccbc906ed87cef7")
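I know that, technically, the 2 matching _ids in the list of 4 are not exactly identical, since each carries a different postSuperCategories object. At this point I no longer care about that field. I just need one posts document per _id, because I need access to its other fields.
Any help would be appreciated. I have tried looking into $group, $addToSet, and $setUnion, but have had no success so far.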
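You can add a $group stage that retrieves the distinct _id values, together with the first value found for each field you want to extract per _id.
With the $group stage: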
{
    '$group': {
        _id: '$_id',
        item: { $first: "$$ROOT" }
    }
}
This gives you, in the item field, the first root document found for each _id:
{ "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "item" : { "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-12-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef4"), "title" : "My title2", "topics" : [ "a", "b" ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
{ "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "item" : { "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
Otherwise, pick the individual fields you want in the response:
{
    '$group': {
        _id: '$_id',
        author: { $first: '$author' },
        publishedDate: { $first: '$publishedDate' },
        state: { $first: '$state' },
        templateName: { $first: '$templateName' },
        title: { $first: '$title' },
        topics: { $first: '$topics' }
    }
}
You will get something like:
{ "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ] }
{ "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ] }
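Putting it together, the whole pipeline might look like the sketch below. It assumes the question's stages are kept as they are, with the two $match stages merged into one, and the $group stage appended at the end. The final $replaceRoot stage is an optional addition that is not part of the answer above and needs MongoDB 3.4 or later. It lifts each item back to the top level so the result has the shape of a posts document again:

```javascript
// Sketch of the full pipeline: the question's stages plus the dedupe step.
const pipeline = [
  { $unwind: { path: "$postSubCategories" } },
  { $lookup: {
      from: "postsubcategories",
      localField: "postSubCategories",
      foreignField: "_id",
      as: "subObject"
  }},
  { $unwind: { path: "$postSuperCategories" } },
  { $lookup: {
      from: "postsupercategories",
      localField: "postSuperCategories",
      foreignField: "_id",
      as: "superObject"
  }},
  // The question's two $match stages, merged into one.
  { $match: {
      state: "published",
      $or: [
        { "subObject.searchKeywords": "home monitor" },
        { "superObject.searchKeywords": "home monitor" }
      ]
  }},
  // Collapse the fanned-out rows: one group per post _id,
  // keeping the first full row seen for that _id.
  { $group: { _id: "$_id", item: { $first: "$$ROOT" } } },
  // Optional (MongoDB 3.4+): promote the kept row back to the top level.
  { $replaceRoot: { newRoot: "$item" } }
];

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.posts.aggregate(pipeline).forEach(doc => printjson(doc));
}
```

Note that the kept row is an unwound one, so its postSubCategories and postSuperCategories fields each hold a single ObjectId rather than the original arrays. The question states that those fields no longer matter at this point.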