Unsorted search of combined MongoDB columns

Question

Is it possible to search a virtual column made up of two columns?

Suppose I have the following MongoDB collection:

db.collection = 
[
    { book : 'The Stand',   author : 'Stephen King'},
    { book : 'The Dead Zone',   author : 'Stephen King'},
    { book : 'Hamlet',   author : 'William Shakespeare'},
    { book : 'The Tragedy of Othello',   author : 'William Shakespeare'},
    { book : 'Danse Macabre',   author : 'Stephen King'},
]

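I want to run a search that takes both the book and author columns into account. Specifically, I will have a query string made up of several space-separated terms, and I want to return the documents whose combined book+author column contains all of the query terms, regardless of their order.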

Examples:

Query: "King The"

{ book : 'The Stand',   author : 'Stephen King'},
{ book : 'The Dead Zone',   author : 'Stephen King'}

Query: "Tragedy Shakespeare"

{ book : 'The Tragedy of Othello',   author : 'William Shakespeare'}

Query: "The"

{ book : 'The Stand',   author : 'Stephen King'},
{ book : 'The Dead Zone',   author : 'Stephen King'},
{ book : 'The Tragedy of Othello',   author : 'William Shakespeare'},

Is this kind of search possible in MongoDB? Is there a $regex expression that would make it work?

Thanks!

Answer 1

Here is an aggregation that I think might help...

db.collection.aggregate([
    { $project: { book: 1, author: 1, "book_words": { $split: [ "$book", " " ] }, "author_words": { $split: [ "$author", " " ] } } },
    { $project: { book:1, author: 1, "search_words": { $concatArrays: [ "$book_words", "$author_words" ] } } },
    { $match: { "search_words": { $all: [ "The", "King" ] } } },
    { $project: { "search_words": 0} }
]).pretty()
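The question's input is one space-separated string, while the pipeline above hard-codes `[ "The", "King" ]`. Below is a minimal sketch, assuming the pipeline is built in JavaScript (for example in mongosh or with the Node.js driver), that splits the query on whitespace and passes the terms to `$all`. `buildSearchPipeline` is a hypothetical helper name, not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): build the four-stage
// pipeline above from a raw, space-separated query string.
function buildSearchPipeline(queryString) {
  // Split on runs of whitespace and drop empties: "  King   The " -> ["King", "The"].
  const terms = queryString.trim().split(/\s+/).filter(Boolean);

  return [
    // Stage 1: split each field into an array of words.
    { $project: {
        book: 1,
        author: 1,
        book_words: { $split: ["$book", " "] },
        author_words: { $split: ["$author", " "] },
    } },
    // Stage 2: merge both word arrays into one "virtual column".
    { $project: {
        book: 1,
        author: 1,
        search_words: { $concatArrays: ["$book_words", "$author_words"] },
    } },
    // Stage 3: $all ignores order, so "King The" and "The King" behave the same.
    { $match: { search_words: { $all: terms } } },
    // Stage 4: hide the temporary array from the results.
    { $project: { search_words: 0 } },
  ];
}
```

In mongosh this would be used as `db.collection.aggregate(buildSearchPipeline("King The"))`.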

Explanation:

This aggregation has 4 stages...

$project
$project
$match
$project

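The first $project splits the string value of the "book" field into an array of words named "book_words", and splits the string value of the "author" field into an array of words named "author_words".

The second $project concatenates the two new arrays into a single array named "search_words".

The $match stage uses $all to keep only the documents whose "search_words" array contains every search term, in any order; all other records are filtered out.

The final $project stage removes the temporary array field "search_words".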
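To check the stage-by-stage logic without a server, here is a plain-JavaScript emulation of the four stages on the question's sample data. It is only an illustration and assumes MongoDB's default (binary, case-sensitive) string comparison; `books` and `runSearch` are hypothetical names, and the real results come from MongoDB itself.

```javascript
// Sample documents from the question (no _id, for brevity).
const books = [
  { book: "The Stand", author: "Stephen King" },
  { book: "The Dead Zone", author: "Stephen King" },
  { book: "Hamlet", author: "William Shakespeare" },
  { book: "The Tragedy of Othello", author: "William Shakespeare" },
  { book: "Danse Macabre", author: "Stephen King" },
];

// Hypothetical in-memory emulation of the four pipeline stages.
function runSearch(docs, terms) {
  return docs
    // Stages 1 and 2: $split both fields on " " and $concatArrays the results.
    .map((d) => ({ ...d, search_words: [...d.book.split(" "), ...d.author.split(" ")] }))
    // Stage 3: $all keeps a document only if every term is in search_words (any order).
    .filter((d) => terms.every((t) => d.search_words.includes(t)))
    // Stage 4: drop the temporary search_words field.
    .map(({ search_words, ...rest }) => rest);
}

console.log(runSearch(books, ["King", "The"]).map((d) => d.book));
// logs: [ 'The Stand', 'The Dead Zone' ]
```

Because the comparison is case-sensitive here, `["the", "king"]` returns nothing; that is the gap the collation section addresses.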

The documents returned by this aggregation look like...

{
    "_id" : ObjectId("60d6139a9148371ae7d2b343"),
    "book" : "The Stand",
    "author" : "Stephen King"
}
{
    "_id" : ObjectId("60d6139a9148371ae7d2b344"),
    "book" : "The Dead Zone",
    "author" : "Stephen King"
}

Case-insensitive matching

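To match case-insensitively, MongoDB has to know what "case-insensitive" means, and case rules in English differ from those in other languages. For this reason we specify a collation with English ('en') as the locale and a strength of 2, which compares base letters and accents but ignores case. The collation is passed as an option to the aggregation. I also created an index with the same collation; strictly speaking, the collation option on the operation is what makes the comparison case-insensitive, and a matching index only helps performance for queries that are able to use it.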

Create the index

db.collection.createIndex( { book: 1, author: 1 }, { collation: { locale: 'en', strength: 2 } } )

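This is a compound index on two fields, 'book' and 'author'. Note the collation option on this index: an index is used for string comparisons only by operations that specify the same collation. Also, because the $match stage in this pipeline filters on the computed "search_words" field, that stage cannot use this index.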

Aggregation with a collation

With the collation option supplied, MongoDB performs the string comparisons case-insensitively...

db.collection.aggregate([
    { $project: { book: 1, author: 1, "book_words": { $split: [ "$book", " " ] }, "author_words": { $split: [ "$author", " " ] } } },
    { $project: { book:1, author: 1, "search_words": { $concatArrays: [ "$book_words", "$author_words" ] } } },
    { $match: { "search_words": { $all: [ "the", "king" ] } } },
    { $project: { "search_words": 0} }
],
{ collation: { locale: "en", strength: 2 } }).pretty()

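Note that the collation option is applied to the aggregation. Also, the $match stage now uses all-lowercase terms; under the strength-2 collation, "the" and "king" still match "The" and "King".

Here is the output...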
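Strength 2 compares base letters and accents but ignores case. As a rough in-memory illustration (not how MongoDB implements it), JavaScript's `Intl.Collator` with `sensitivity: "accent"` behaves similarly, because that setting corresponds to ICU's secondary strength. `equalsCI` and `matchesAllCI` are hypothetical helpers used only for this sketch.

```javascript
// Roughly mirrors { locale: "en", strength: 2 }: "accent" sensitivity corresponds
// to ICU secondary strength (letters and accents matter, case does not).
const collator = new Intl.Collator("en", { sensitivity: "accent" });

// Hypothetical helper: two words are equal under the collator.
function equalsCI(a, b) {
  return collator.compare(a, b) === 0;
}

// Hypothetical helper: emulates $all under that collation, i.e. every term
// must equal at least one word in the array.
function matchesAllCI(searchWords, terms) {
  return terms.every((t) => searchWords.some((w) => equalsCI(w, t)));
}

console.log(matchesAllCI(["The", "Stand", "Stephen", "King"], ["the", "king"])); // true
console.log(equalsCI("resume", "résumé")); // false: accents still count at strength 2
```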

{
    "_id" : ObjectId("60d6139a9148371ae7d2b343"),
    "book" : "The Stand",
    "author" : "Stephen King"
}
{
    "_id" : ObjectId("60d6139a9148371ae7d2b344"),
    "book" : "The Dead Zone",
    "author" : "Stephen King"
}

Caution

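Using regular expressions together with the collation option may not work as expected, at least from an indexing standpoint: $regex is not collation-aware, so it cannot take advantage of a case-insensitive collation index. My example does not use any regular expressions ($regex), so it works as expected. But again, it handles exact word matches only, not partial matches (a.k.a. range queries) such as "starts with 'ki*'".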
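Since the question asks about `$regex`, here is a hedged sketch of the partial-match alternative mentioned above. It is a different technique from the collation approach: each term becomes a case-insensitive word-prefix pattern (`\b` plus the escaped term, with `$options: "i"`), and every term must match either `book` or `author`. `buildPrefixFilter` and `escapeRegex` are hypothetical helpers, and, per the caution above, such queries generally cannot use the collation index efficiently.

```javascript
// Hypothetical helper: escape regex metacharacters so user input is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: every term must start a word in book OR author (case-insensitive).
function buildPrefixFilter(queryString) {
  const terms = queryString.trim().split(/\s+/).filter(Boolean);
  // MongoDB rejects an empty $and array, so refuse an empty query up front.
  if (terms.length === 0) {
    throw new Error("query string contains no terms");
  }
  return {
    $and: terms.map((term) => {
      // "\b" anchors the term at a word start, so "the" does not hit "Othello".
      const pattern = "\\b" + escapeRegex(term);
      return {
        $or: [
          { book: { $regex: pattern, $options: "i" } },
          { author: { $regex: pattern, $options: "i" } },
        ],
      };
    }),
  };
}
```

In mongosh this would be used as `db.collection.find(buildPrefixFilter("ki the"))`, which should return the two Stephen King titles that start with "The".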

MongoDB Atlas Search

If you are using MongoDB Atlas, Atlas Search can solve this problem directly, except that common words such as 'the' are left out.
