Find documents that have no reference from another collection on a large DB

Question

I'm using MongoDB 3.2.5. I have 2 collections, each containing about 2 million documents.

Devices
{
    _id: xx
}

Interactions
{
    _id: yy
    StartDateTime: 2016-10-24 17:21:30.989Z
    DeviceId: xx
}

I want to find all devices that are not referenced by any interaction. I tried the code below; it works on a small DB but fails on a large one.

var matches = db.Interactions.find({}, { DeviceId: 1 });
var devicesIds = [];
matches.forEach(function(match) { devicesIds.push(match.DeviceId) });
var count = db.Devices.find({ "_id": { $nin : devicesIds } } ).count();
print(count);

It throws this error message:

[thread1] Error: BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit.

I also tried:

db.Devices.aggregate([
    {
      $lookup:
        {
          from: "Interactions",
          localField: "_id",
          foreignField: "DeviceId",
          as: "matched_docs"
        }
   },
   {
      $match: { "matched_docs": { $eq: [] } }
   },
   {
      $out: "TempDevicesNoInteraction"
   }
]);

The query ran for 3 hours without finishing, so I had to cancel it. The same happened with the following query:

var count = 0;

db.Devices.find().forEach(function(myDoc) {
    var cursor = db.Interactions.find({DeviceId: myDoc._id});
    if(!cursor.hasNext()) {
        count = count + 1;
    }
});

print(count);

I'm new to MongoDB, please guide me.

Answer 1

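Do you have an index on the "DeviceId" field of the Interactions collection? If not, the aggregation with the $lookup operator performs a collection scan (over 2 million documents) for each of the 2 million documents in Devices.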
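One way to answer that question from the mongo shell is to list the existing indexes on the collection. The following is only a minimal sketch: `hasIndexOn` is a hypothetical helper name, not part of MongoDB, and it only checks whether some index starts with the given field.

```javascript
// Hypothetical helper (not a MongoDB API): returns true if the collection
// has an index whose leading key is `field`.
function hasIndexOn(collection, field) {
    // getIndexes() returns an array of index documents, each with a `key` object.
    return collection.getIndexes().some(function (idx) {
        return Object.keys(idx.key)[0] === field;
    });
}

// In the mongo shell: hasIndexOn(db.Interactions, "DeviceId")
```

If this returns false, only the default `_id` index (or other unrelated indexes) is present, and each lookup by DeviceId has to scan the whole collection.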

So make sure there is an index on "DeviceId".
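Creating that index could look roughly like this in the mongo shell. This is a sketch under the assumption that a plain ascending single-field index is enough; `ensureDeviceIdIndex` is a hypothetical wrapper name used only for illustration.

```javascript
// Hypothetical wrapper around the shell's createIndex call.
function ensureDeviceIdIndex(db) {
    // Ascending single-field index on Interactions.DeviceId.
    // Building it on roughly 2 million documents may take a while.
    return db.Interactions.createIndex({ DeviceId: 1 });
}

// In the mongo shell: ensureDeviceIdIndex(db)
```

With the index in place, both the $lookup stage and the per-device `find({DeviceId: ...})` loop from the question can resolve each device through the index instead of scanning the collection.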

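Do you intend to run this query often? If so, you can use the $out operator to store the aggregation result in a new collection. The initial population may take some time, but every later query against that data performs well. Whether that trade-off is worth it is something you have to decide.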
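As a rough sketch of that approach, the pipeline from the question can be wrapped so that it rebuilds the result collection and later reads hit only the stored data. The function name `refreshDevicesWithoutInteractions` is hypothetical; the collection and field names are the ones from the question.

```javascript
// Hypothetical helper: rebuilds TempDevicesNoInteraction and returns its size.
function refreshDevicesWithoutInteractions(db) {
    db.Devices.aggregate([
        // Attach the matching Interactions documents to each device.
        { $lookup: {
            from: "Interactions",
            localField: "_id",
            foreignField: "DeviceId",
            as: "matched_docs"
        } },
        // Keep only devices for which nothing matched.
        { $match: { matched_docs: { $eq: [] } } },
        // Replace the contents of the target collection with the result.
        { $out: "TempDevicesNoInteraction" }
    ]);
    // Later queries read the stored result instead of re-running the lookup.
    return db.TempDevicesNoInteraction.count();
}

// In the mongo shell: print(refreshDevicesWithoutInteractions(db));
```

Note that the stored result is a snapshot: it does not change when Devices or Interactions change, so it has to be rebuilt whenever fresh numbers are needed.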
