为什么 $nin 比 $in 慢，Mongodb

Question

我收集了 500 万个具有正确索引的文档。$ 工作完美，但与 $nin 的相同查询超级慢...这是什么原因？

超级快：

{'tech': {'$in': ['Wordpress', 'wordpress', 'WORDPRESS']}}

超级慢..

{'tech': {'$nin': ['Wordpress', 'wordpress', 'WORDPRESS']}}

Answer 1

当你有索引时（无论你谈论 MongoDB 还是任何其他数据库），搜索某个值总是比搜索 non-existing 值更快。

数据库必须扫描整个索引，当您查找“不在”或“不等于”时，通常甚至没有使用索引。使用 explain()

查看执行计划

一些数据库（例如 Oracle）提供所谓的位图索引。它们的工作方式不同，通常 IN 操作与 NOT IN 操作一样快。但是，和往常一样，与 B*Tree 索引相比，它们还有其他缺点。据我所知，Oracle 数据库是唯一支持位图索引的主要 RDBMS。

Answer 2

以下解释仅适用于 Mongo 3.2

之前的版本

Mongo v3.2 对存储引擎进行了各种更改，改进了此问题的性能。

现在 $nin 哈希一个重要的质量，它不是 selective query, First let's understand what selectivity 意味着：

Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query.

现在他们自己也说了:

For instance, the inequality operators $nin and $ne are not very selective since they often match a large portion of the index. As a result, in many cases, a $nin or $ne query with an index may perform no better than a $nin or $ne query that must scan all documents in a collection.

那时候 selectivity 在性能方面很重要。这一切都引出了您的问题，为什么没有使用索引？

好吧，当 Mongo 被要求创建一个查询计划时，他会在所有可用的查询计划之间进行“竞赛”，其中一个是 COLSCAN 即集合扫描，其中第一个计划找到101 份文件获胜。由于 non-selective 查询的效率低下（实际上通常更快，具体取决于查询中的索引和值）是 COLSCAN，请进一步阅读 here

为什么 $nin 比 $in 慢，Mongodb

Why $nin is slower than $in, Mongodb

performance

mongodb

pymongo

mongodb-query

以下解释仅适用于 Mongo 3.2