Deduping results returned by a Mongo aggregate query

Some background:

There are 3 collections involved:

  1. posts
  2. postsubcategories
  3. postsupercategories


An example document from posts:

{
    "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"),
    "__v" : 6,
    "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"),
    "postSubCategories" : [ 
        ObjectId("5806344baa0bbf284a2316e4")//reference to document in postsubcategories collection
    ],
    "postSuperCategories" : [ 
        ObjectId("580679958a5f5f448ba5aae9"), 
        ObjectId("580679958a5f5f448ba5aaf2")//references to documents in postsupercategories collection
    ],
    "publishedDate" : ISODate("2016-10-10T04:00:00.000Z"),
    "state" : "published",
    "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"),
    "title" : "My title",
    "topics" : []
}

My query is:

db.posts.aggregate([
{'$unwind': 
    {'path':"$postSubCategories"}
},
{'$lookup': {
  'from':"postsubcategories",
  'localField': "postSubCategories",
  'foreignField': "_id",
  'as': "subObject"
}},
{'$unwind': 
    {'path':"$postSuperCategories"}
},
{'$lookup': {
  'from':"postsupercategories",
  'localField': "postSuperCategories",
  'foreignField': "_id",
  'as': "superObject"
}},
{'$match': {
    '$or':
        [{ "subObject.searchKeywords": "home monitor" }, 
        { "superObject.searchKeywords": "home monitor" }]
    }
},
{'$match': {
    "state": "published"
}}
])


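The postsubcategories and postsupercategories collections both contain a field called searchKeywords, which is an array of text strings in each document. I want to query those searchKeywords fields and return the matching posts documents. I need a deduplicated set of the returned _ids.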

The query is returning four documents, for example:

ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf40b7ccbc906ed87cef7")
ObjectId("57fbf40b7ccbc906ed87cef7") 


I understand why it's returning 4. One document contains postSubCategories object 5806344baa0bbf284a2316e4 and postSuperCategories id 580679958a5f5f448ba5aae9.

The second document contains postSubCategories object 5806344baa0bbf284a2316e4 and postSuperCategories id 580679958a5f5f448ba5aaf2. The same pair of documents is produced for the second post, hence the duplicates.
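To see where the duplicates come from, here is a minimal plain-JavaScript sketch (runs in Node, no MongoDB server needed) of what the two $unwind stages do to the example post. The unwind helper is hypothetical, written only to mimic the stage, and it assumes each unwound field is an array or missing; ObjectIds are shortened to plain strings.

```javascript
// Trimmed copy of the example post from the question (ObjectIds as strings).
const post = {
  _id: "57fbf3ce7ccbc906ed87cef6",
  postSubCategories: ["5806344baa0bbf284a2316e4"],
  postSuperCategories: ["580679958a5f5f448ba5aae9", "580679958a5f5f448ba5aaf2"],
  state: "published",
};

// Hypothetical helper mimicking { $unwind: "$<field>" }: emits one copy of the
// document per array element, with the array replaced by that single element.
// A missing or empty array produces no output, like $unwind's default behavior.
// Assumes the field is an array (or missing/null).
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] ?? []).map((value) => ({ ...doc, [field]: value }))
  );
}

// Same order as the question's pipeline: sub-categories first, then super-categories.
const unwound = unwind(unwind([post], "postSubCategories"), "postSuperCategories");

// 1 sub-category x 2 super-categories = 2 documents sharing the same _id.
console.log(unwound.map((d) => d._id));
```

The number of copies per post is the product of the two array lengths, so any post with more than one category on either side will come out of the $match more than once.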

Is there a way to "dedupe" based on the returned _id?

My desired end result is:

ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf40b7ccbc906ed87cef7")

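I know that, technically, the 2 matching _ids in the list of 4 are not exactly the same documents, because each one contains a different postSuperCategories object. At this point, though, I no longer care about that field; I just need one posts document per _id, because I need access to its other fields.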

Any help would be greatly appreciated. I've tried looking into $group, $addToSet and $setUnion, but so far without success.

You can add a $group stage that retrieves the distinct _id values, together with the $first value found for each property you want to extract.

To keep the whole document, take the $first of $$ROOT in the $group stage:

{
    '$group': {
        _id: '$_id',
        item: { $first: "$$ROOT" } 
    }
}

This gives you, in the item field, the first root document found for each _id:

{ "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "item" : { "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-12-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef4"), "title" : "My title2", "topics" : [ "a", "b" ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
{ "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "item" : { "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
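As a rough mental model, the $group stage above behaves like the following plain-JavaScript sketch, which keeps the first document seen for each _id and drops the rest. The groupFirstRoot helper and the trimmed input rows are hypothetical, made up for illustration. Two caveats about the real stage: $group does not guarantee the order of its output, and "first" means first in whatever order documents reach the stage, so put a $sort before the $group if it matters which copy is kept.

```javascript
// Hypothetical, trimmed rows shaped like the output of the unwind/lookup stages:
// each post appears once per super-category.
const rows = [
  { _id: "57fbf3ce7ccbc906ed87cef6", postSuperCategories: "580679958a5f5f448ba5aae9", title: "My title" },
  { _id: "57fbf3ce7ccbc906ed87cef6", postSuperCategories: "580679958a5f5f448ba5aaf2", title: "My title" },
  { _id: "57fbf40b7ccbc906ed87cef7", postSuperCategories: "580679958a5f5f448ba5aae9", title: "My title2" },
  { _id: "57fbf40b7ccbc906ed87cef7", postSuperCategories: "580679958a5f5f448ba5aaf2", title: "My title2" },
];

// Mimics { $group: { _id: "$_id", item: { $first: "$$ROOT" } } } in memory.
function groupFirstRoot(docs) {
  const groups = new Map(); // _id -> { _id, item }
  for (const doc of docs) {
    // Only the first document per _id is kept, as with $first.
    if (!groups.has(doc._id)) {
      groups.set(doc._id, { _id: doc._id, item: doc });
    }
  }
  // A Map iterates in insertion order; the real $group makes no such promise.
  return [...groups.values()];
}

const deduped = groupFirstRoot(rows);
console.log(deduped.map((g) => g._id));
```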

Otherwise, pick only the fields you want in the response:

{
    '$group': {
        _id: '$_id',
        author: {
            $first: '$author'
        },
        publishedDate: {
            $first: '$publishedDate'
        },
        state: {
            $first: '$state'
        },
        templateName: {
            $first: '$templateName'
        },
        title: {
            $first: '$title'
        },
        topics: {
            $first: '$topics'
        }
    }
}

You'll get something like:

{ "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ] }
{ "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ] }
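Putting the pieces together, here is an untested sketch of the question's pipeline with the field-picking $group appended, written as a plain array so it can be passed to db.posts.aggregate(pipeline) in the mongo shell. The trailing $sort on _id is an assumption added here, not part of the answer above: since $group does not guarantee output order, it makes the result order stable.

```javascript
// The question's pipeline, unchanged, followed by the $group from the answer.
const pipeline = [
  { $unwind: { path: "$postSubCategories" } },
  { $lookup: { from: "postsubcategories", localField: "postSubCategories", foreignField: "_id", as: "subObject" } },
  { $unwind: { path: "$postSuperCategories" } },
  { $lookup: { from: "postsupercategories", localField: "postSuperCategories", foreignField: "_id", as: "superObject" } },
  { $match: { $or: [
      { "subObject.searchKeywords": "home monitor" },
      { "superObject.searchKeywords": "home monitor" },
  ] } },
  { $match: { state: "published" } },
  // Collapse the per-category copies back into one document per post.
  { $group: {
      _id: "$_id",
      author: { $first: "$author" },
      publishedDate: { $first: "$publishedDate" },
      state: { $first: "$state" },
      templateName: { $first: "$templateName" },
      title: { $first: "$title" },
      topics: { $first: "$topics" },
  } },
  // Assumed addition: give the deduplicated results a deterministic order.
  { $sort: { _id: 1 } },
];

// In the mongo shell (requires a running server with these collections):
// db.posts.aggregate(pipeline)
```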