How can I write an aggregation without exceeding the maximum document size?

Question

I got an "exceeds maximum document size" exception with the following query:

import datetime

pipe = [
    {"$match": {"birthday": {"$gte": datetime.datetime(1987, 1, 1, 0, 0)}}}
]
res = db.patients.aggregate(pipe, allowDiskUse=True)

I fixed it by adding a $project operator.

However, what if the result still exceeds 16MB even when I use $project?

What can I do? Any ideas? Thanks.

pipe = [
    {"$project": {"birthday": 1, "id": 1}},
    {"$match": {"birthday": {"$gte": datetime.datetime(1987, 1, 1, 0, 0)}}}
]
res = db.patients.aggregate(pipe, allowDiskUse=True)

Exception

OperationFailure: command SON([('aggregate', 'patients'), ('pipeline', [{'$match': {'birthday': {'$gte': datetime.datetime(1987, 1, 1, 0, 0)}}}]), ('allowDiskUse', True)]) on namespace tw_insurance_security_development.$cmd failed: exception: aggregation result exceeds maximum document size (16MB)

Answer 1

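By default, the result of an aggregation is returned to you in a single BSON document, which is where the size limit comes from. If you need to return more than that, you can do one of two things:

- Output the results to a collection. You do this by ending your pipeline with

{"$out": "some-collection-name"}

  You can then query that collection as normal (you will need to delete it yourself when you are done with it).

- Have the results returned as a cursor, by specifying useCursor=True when calling aggregate.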

Both of these options require mongodb 2.6. If you are still running mongodb 2.4, then this is simply a fundamental limit of aggregation.
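Both options can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a recent PyMongo (3.0 or later), where aggregate() always returns a cursor and no useCursor argument is needed. The helper names build_pipeline, iter_patients and export_patients, and the output collection name patients_since_1987, are made up for illustration. Here $match is placed before $project so that later stages only see matching documents.

```python
import datetime


def build_pipeline(since, out_collection=None):
    """Build the question's pipeline; optionally end it with a $out stage."""
    pipeline = [
        # Filter first so later stages only see matching documents.
        {"$match": {"birthday": {"$gte": since}}},
        {"$project": {"birthday": 1, "id": 1}},
    ]
    if out_collection is not None:
        # $out has to be the last stage; it writes the results to a collection.
        pipeline.append({"$out": out_collection})
    return pipeline


def iter_patients(collection, since):
    """Cursor option: yield result documents one at a time."""
    # `collection` is assumed to be a pymongo Collection (or anything with
    # a compatible aggregate() method).
    cursor = collection.aggregate(build_pipeline(since), allowDiskUse=True)
    for doc in cursor:
        yield doc


def export_patients(db, since, out_name="patients_since_1987"):
    """$out option: write the results to a collection and return it."""
    # `db` is assumed to be a pymongo Database; `out_name` is a hypothetical name.
    db.patients.aggregate(build_pipeline(since, out_name), allowDiskUse=True)
    return db[out_name]
```

With a real connection you would call something like `for doc in iter_patients(db.patients, datetime.datetime(1987, 1, 1)): ...`. If you use the $out variant, remember to drop the output collection yourself once you are done, for example with `db["patients_since_1987"].drop()`.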

Answer 2

Use the following snippet:

db.patients.runCommand('aggregate', {
    pipeline: [
        {"$project": {"birthday": 1, "id": 1}},
        {"$match": {"birthday": {"$gte": ISODate("1987-01-01T00:00:00Z")}}}
    ],
    allowDiskUse: true
})

Here allowDiskUse will help with retrieving data larger than 16 MB.

Answer 3

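As @Frederick said, you need at least mongo 2.6. For further reference, here is the link from the mongo docs; it works similarly to the runCommand approach but uses db.collection.aggregate. Note that for the document size limit you should use the "cursor" option, and for the sort limit you should use the "allowDiskUse" option.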
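To make the difference between the two options concrete, here is a rough sketch of the raw aggregate command document with both set. The function name build_aggregate_command and the default batch size of 100 are assumptions for illustration only. "cursor" addresses the single-document result limit, while "allowDiskUse" lets stages such as $sort spill to disk.

```python
import datetime


def build_aggregate_command(collection_name, pipeline, batch_size=100):
    """Build the raw `aggregate` command document.

    The command name has to be the first key; Python 3.7+ dicts keep
    insertion order, so a plain dict is enough here.
    """
    return {
        "aggregate": collection_name,
        "pipeline": pipeline,
        # Return results through a cursor, in batches, instead of packing
        # everything into one 16MB-limited document.
        "cursor": {"batchSize": batch_size},
        # Allow stages such as $sort to write temporary data to disk.
        "allowDiskUse": True,
    }


pipe = [
    {"$match": {"birthday": {"$gte": datetime.datetime(1987, 1, 1, 0, 0)}}}
]
command = build_aggregate_command("patients", pipe)
```

With PyMongo you could pass this to `db.command(command)`, but the reply then only carries the first batch (under `cursor.firstBatch`). In practice `db.patients.aggregate(pipe, allowDiskUse=True)` is simpler, since the driver fetches the remaining batches for you.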

Answer 4

You can use aggregateCursor(collection_name, $pipeLine, ["useCursor" => true]).

pipe = [
    {"$match": { "birthday":{"$gte":datetime.datetime(1987, 1, 1, 0, 0)} }}
    ]
res =db.patients.aggregateCursor(collection_name, pipe, ["useCursor" => true]);
        
$ret = [];

foreach ($taskList as $task){
  array_push($ret, $task);
}
        
return $ret;

mongodb

pymongo

exception