MongoDB queries (aggregation framework) are extremely slow

Question

My dataset is not very large (>100,000 records), yet aggregation queries on it take a very long time.

I have an index on the _type field. When I run

db.getCollection('product').find({_type:"healthcare"}).count()

I get a response in 0.015 seconds...

But when I run, for example,

db.getCollection('product').aggregate([{$group:{_id:"$_type",sum:{$sum:1}}}])

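I wait 40 seconds for a response.

What could be wrong with an index-only aggregation query like this? Where should I look for the problem?

The MongoDB version is 3.0+, with the WiredTiger storage engine. Here is the db.stats() output: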

{
    "collections" : 3,
    "objects" : 113090,
    "avgObjSize" : 259186.2551949774497189,
    "dataSize" : 29311373600.0000000000000000,
    "storageSize" : 29317480288.0000000000000000,
    "numExtents" : 36,
    "indexes" : 4,
    "indexSize" : 15379056.0000000000000000,
    "fileSize" : 32130465792.0000000000000000,
    "nsSizeMB" : 16,
    "extentFreeList" : {
        "num" : 2,
        "totalSize" : 9.85907e+06
    },
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 22
    },
    "ok" : 1.0000000000000000
}

Answer 1

According to the documentation on pipeline operators and indexes, the $group pipeline operator cannot use an index:

The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.

New in version 2.4: The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear, the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.

Even when the pipeline uses an index, aggregation still requires access to the actual documents; i.e. indexes cannot fully cover an aggregation pipeline.

Changed in version 2.6: In previous versions, for very select use cases, an index could cover a pipeline.

So your $group aggregation is slow because it performs a full collection scan.

However, note that the aggregate equivalent of the find query you are comparing it with is:
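
For a rough sense of scale, the figures in the db.stats() output from the question can be compared directly. This is only a back-of-the-envelope sketch: it assumes most of the database's data belongs to the product collection (db.stats() covers all 3 collections), and the variable names are made up for illustration.

```javascript
// Figures copied from the db.stats() output in the question (all in bytes,
// except `objects`). Assumption: most of this data is in `product`.
var stats = {
    objects: 113090,
    dataSize: 29311373600,
    indexSize: 15379056
};

var MiB = 1024 * 1024;
var GiB = 1024 * MiB;

// Average document size implied by the totals.
var avgObjBytes = stats.dataSize / stats.objects;
// Data a full scan may have to read, versus the size of all indexes combined.
var dataGiB = stats.dataSize / GiB;
var indexMiB = stats.indexSize / MiB;
var ratio = stats.dataSize / stats.indexSize;

// print() exists in the mongo shell; fall back to console.log elsewhere.
var log = (typeof print === "function") ? print : console.log;
log("average document size (KiB): " + (avgObjBytes / 1024).toFixed(1));
log("data size (GiB): " + dataGiB.toFixed(1));
log("index size (MiB): " + indexMiB.toFixed(1));
log("data / index ratio: " + ratio.toFixed(0));
```

With these figures the data comes to roughly 27 GiB against about 15 MiB of indexes, so a stage that has to touch every document does far more I/O than a lookup on the _type index.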

db.getCollection('product').aggregate([
    {$match: {_type: 'healthcare'}},
    {$group: {_id: null, sum: {$sum: 1}}}
])

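Because that pipeline starts with a $match operator, it will use the index, and its performance should be much closer to that of the find query.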
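
One way to check this is to ask the server for the plan of each pipeline through the explain option of aggregate. A minimal sketch, assuming the collection and index from the question; the variable names groupOnly and matchFirst are only for illustration, and the exact shape of the explain output varies between MongoDB versions.

```javascript
// The two pipelines under comparison, kept as plain arrays.
var groupOnly = [
    {$group: {_id: "$_type", sum: {$sum: 1}}}
];
var matchFirst = [
    {$match: {_type: "healthcare"}},
    {$group: {_id: null, sum: {$sum: 1}}}
];

// `db` is only defined inside the mongo shell; outside it this step is skipped.
if (typeof db !== "undefined") {
    // {explain: true} returns information about how the pipeline is processed
    // instead of the aggregation results.
    printjson(db.getCollection("product").aggregate(groupOnly, {explain: true}));
    printjson(db.getCollection("product").aggregate(matchFirst, {explain: true}));
}
```

If the index is being used, the plan for the second pipeline should show an index scan on _type, while the plan for the first should show a collection scan.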

Tags: performance, mongodb, nosql, aggregation-framework