How to use $lookup to resolve a reference to a nested document?

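I have energy_carrier documents that are nested in a collection energy_carrier_groups. I reference those energy_carrier documents from another collection, tech, and would like to resolve the references with a $lookup aggregation.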

=> How can I define a sub-query inside $lookup that preprocesses/unwinds the energy carriers before the actual join/lookup is performed?

My first approach was to specify a path for the from or foreignField option, to target the nested documents of the energy_carrier_groups collection:

"from": "energy_carrier_groups.energy_carriers"

or

"from": "energy_carrier_groups"
"foreignField": "energy_carriers._id"

However, that does not seem to work.

I found that $lookup supports the parameters let and pipeline as an alternative to the options localField and foreignField (since version 3.6), and that this might be the way to go.

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#join-conditions-and-subqueries-on-a-joined-collection

Starting with version 5.0, all four options can also be combined:

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#correlated-subqueries-using-concise-syntax

(MongoDB 5.0 requires a CPU with AVX support, which mine does not have.)

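All those options give me a headache! Could you tell me how to formulate the pipeline so that it resolves my references from tech to energy carriers? It can't be that hard, can it?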

Code to create some example data:

import bson
from pymongo import MongoClient


def unique_id():
    return bson.objectid.ObjectId()


client = MongoClient(host='localhost', port=27017)
database = client.forecast

referenced_id = unique_id()

# create referenced collection
energy_carrier_groups = database.energy_carrier_groups
energy_carrier_groups.delete_many({})
energy_carrier_groups.insert_many([
    {
        '_id': unique_id(),
        'name': 'fuels',
        'energy_carriers': [
            {
                '_id': referenced_id,
                'name': 'oil'
            },
            {
                '_id': unique_id(),
                'name': 'gas'
            }
        ]
    },
    {
        '_id': unique_id(),
        'name': 'electricity',
        'energy_carriers': [
            {
                '_id': unique_id(),
                'name': 'green electricity'
            },
            {
                '_id': unique_id(),
                'name': 'conventional electricity'
            }
        ]
    },

])

# create referencing collection
tech = database.tech
tech.delete_many({})
tech.insert_many([
    {
        '_id': unique_id(),
        'name': 'qux',
        'energy_carrier': referenced_id
    },

])

Expected result of my aggregation:

{
    '_id': ObjectId('6183de1b5dd889cfcdeaa711'), 
    'name': 'qux', 
    'energy_carrier': {
        '_id': ObjectId('6183de1b5dd889cfcdeaa70b'), 
        'name': 'oil'
    }
}

First attempt, using a path to the nested documents:

pipeline = [
    {"$match": {"name": 'qux'}},
    {"$lookup": {
       "from": "$energy_carrier_groups.energy_carriers", # <= does not work 
       "localField": "energy_carrier",
       "foreignField": "_id",
       "as": "energy_carrier"
      }
    },
    {"$unwind": "$energy_carrier"},  # transforms lookup result array to a single entry
]
results = tech.aggregate(pipeline)

for result in results:
    print(result)

print('finished')

Another attempt, using let and pipeline instead of localField and foreignField:

pipeline = [
    {"$match": {"name": 'qux'}},
    {"$lookup": {
       "from": "energy_carrier_groups",
       "let": {"tech_energy_carrier_id": "$energy_carrier"},
       "pipeline": [
           {"$unwind": "$energy_carriers"},
           {"$match": {"$expr": {"$eq": ["$$tech_energy_carrier_id", "$energy_carriers._id"]}}}
       ],
       "as": "energy_carrier"  # overrides id field with an array wrapping the resolved reference
      }
    },
    {"$unwind": "$energy_carrier"},  # transforms array to a single entry
]
results = tech.aggregate(pipeline)

for result in results:
    print(result)

print('finished')

This gives some result, but it resolves the reference with a "filtered energy carrier group" instead of only the energy carrier.
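
To see where the "filtered group" comes from, the sub-pipeline can be replayed by hand. The following is a plain-Python sketch, not MongoDB code: the helper names unwind and match_carrier are made up for illustration, and string ids stand in for the ObjectId values. It mimics $unwind followed by the $match on the group documents.

```python
# Plain-Python emulation of the sub-pipeline above (illustration only).
# String ids stand in for ObjectId values; helper names are hypothetical.

def unwind(docs, field):
    """Mimic $unwind: emit one copy of each document per array element."""
    for doc in docs:
        for element in doc.get(field, []):
            yield {**doc, field: element}


def match_carrier(docs, carrier_id):
    """Mimic the $match/$expr stage: keep documents whose carrier id matches."""
    return [doc for doc in docs if doc["energy_carriers"]["_id"] == carrier_id]


groups = [
    {"_id": "g1", "name": "fuels", "energy_carriers": [
        {"_id": "c1", "name": "oil"},
        {"_id": "c2", "name": "gas"},
    ]},
    {"_id": "g2", "name": "electricity", "energy_carriers": [
        {"_id": "c3", "name": "green electricity"},
    ]},
]

lookup_result = match_carrier(unwind(groups, "energy_carriers"), "c1")
print(lookup_result)
# [{'_id': 'g1', 'name': 'fuels', 'energy_carriers': {'_id': 'c1', 'name': 'oil'}}]
```

The matching carrier is there, but it is still wrapped in the fields of its group (_id and name of "fuels"), and that wrapper is what ends up in energy_carrier after the final $unwind.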

=> What is the recommended way to resolve the energy carrier references of tech?

=> If there is a NoSQL database that is better suited for this purpose than MongoDB, please let me know as well.

Related:

https://softwarerecs.stackexchange.com/questions/81175/is-there-an-alternative-to-mongodb-that-allows-to-easily-resolve-document-refere

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#join-conditions-and-subqueries-on-a-joined-collection

https://www.stackchief.com/tutorials/%24lookup%20Examples%20%7C%20MongoDB

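You can try,

  • let to pass the energy_carrier id to the pipeline
  • $match to check the expression condition with the $in operator: is energy_carrier in energy_carriers._id
  • $project to show the required fields
  • $filter to iterate over the energy_carriers array and filter by the energy_carrier variable passed in let
  • $first to get the first element from the above filtered result
  • $addFields with $first to get the first element from the above lookup result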

pipeline = [
    {"$match": {"name": "qux"}},
    {"$lookup": {
        "from": "energy_carrier_groups",
        "let": {"energy_carrier": "$energy_carrier"},
        "pipeline": [
            # keep only the groups that contain the referenced energy carrier
            {"$match": {"$expr": {"$in": ["$$energy_carrier", "$energy_carriers._id"]}}},
            {"$project": {
                "_id": 0,
                # $filter keeps the matching carrier, $first unwraps it (MongoDB 4.4+)
                "energy_carriers": {
                    "$first": {
                        "$filter": {
                            "input": "$energy_carriers",
                            "cond": {"$eq": ["$$energy_carrier", "$$this._id"]}
                        }
                    }
                }
            }}
        ],
        "as": "energy_carrier"
    }},
    # replace the lookup array with the single resolved energy carrier
    {"$addFields": {"energy_carrier": {"$first": "$energy_carrier.energy_carriers"}}}
]

results = tech.aggregate(pipeline)
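
To make the individual stages easier to follow, here is a rough plain-Python walk-through of what they do to the documents. It is only an illustration, not MongoDB code: the function names lookup_carrier and resolve are made up, string ids stand in for ObjectId values, and returning None when nothing matches is a simplification.

```python
# Plain-Python walk-through of the stages above (illustration only).
# String ids stand in for ObjectId values; function names are hypothetical.

def lookup_carrier(groups, carrier_id):
    """Mimic the $lookup sub-pipeline: $match with $in, then $project with $filter/$first."""
    result = []
    for group in groups:
        carriers = group["energy_carriers"]
        # $match: is carrier_id in "$energy_carriers._id"?
        if carrier_id not in [c["_id"] for c in carriers]:
            continue
        # $filter keeps the matching array elements, $first takes the first of them
        filtered = [c for c in carriers if c["_id"] == carrier_id]
        result.append({"energy_carriers": filtered[0]})
    return result


def resolve(tech_doc, groups):
    """Mimic $lookup followed by $addFields with $first."""
    looked_up = lookup_carrier(groups, tech_doc["energy_carrier"])
    # "$energy_carrier.energy_carriers" maps over the lookup array; $first picks element 0
    carriers = [entry["energy_carriers"] for entry in looked_up]
    # None for "no match" is a simplification of this sketch
    return {**tech_doc, "energy_carrier": carriers[0] if carriers else None}


groups = [
    {"_id": "g1", "name": "fuels", "energy_carriers": [
        {"_id": "c1", "name": "oil"},
        {"_id": "c2", "name": "gas"},
    ]},
    {"_id": "g2", "name": "electricity", "energy_carriers": [
        {"_id": "c3", "name": "green electricity"},
    ]},
]
tech_doc = {"_id": "t1", "name": "qux", "energy_carrier": "c1"}

print(resolve(tech_doc, groups))
# {'_id': 't1', 'name': 'qux', 'energy_carrier': {'_id': 'c1', 'name': 'oil'}}
```

The intermediate lookup result is a one-element array of the form [{'energy_carriers': {...}}], which is why the last stage needs both the path energy_carrier.energy_carriers and $first.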

A. Here is another version, based on my original attempt with lookup, combined with turivishal's addFields trick to override/correct the resulting property energy_carrier.

pipeline = [
    {"$match": {"name": 'qux'}},
    {"$lookup": {
       "from": "energy_carrier_groups",
       "let": {"energy_carrier_id": "$energy_carrier"},  # executed on tech
       "pipeline": [  # executed on energy_carrier_groups, with the knowledge of let definition
           {"$unwind": "$energy_carriers"},
           {"$match": {"$expr": {"$eq": ["$$energy_carrier_id", "$energy_carriers._id"]}}}
       ],
       "as": "energy_carrier"  # already includes what we want but also extra fields
      }
    },
    {"$addFields": {  # overrides/corrects the result of the previous stage with parts of it
        "energy_carrier": {"$first": "$energy_carrier.energy_carriers"}
    }
  }
]
results = tech.aggregate(pipeline)
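
A possible variation, as an untested sketch: adding a $replaceRoot stage at the end of the sub-pipeline promotes the unwound carrier to the root of each looked-up document, so the lookup array should contain plain energy carriers and a final $unwind is enough. This avoids the $first array operator, which needs MongoDB 4.4 or later. The helper name build_pipeline is an assumption for illustration only.

```python
def build_pipeline(tech_name):
    """Build an aggregation pipeline for the tech collection (hypothetical helper)."""
    return [
        {"$match": {"name": tech_name}},
        {"$lookup": {
            "from": "energy_carrier_groups",
            "let": {"energy_carrier_id": "$energy_carrier"},
            "pipeline": [
                {"$unwind": "$energy_carriers"},
                {"$match": {"$expr": {"$eq": ["$$energy_carrier_id", "$energy_carriers._id"]}}},
                # promote the nested carrier to the root, dropping the group fields
                {"$replaceRoot": {"newRoot": "$energy_carriers"}},
            ],
            "as": "energy_carrier",
        }},
        # the lookup array now holds plain carriers; unwrap the single entry
        {"$unwind": "$energy_carrier"},
    ]


pipeline = build_pipeline("qux")
print([list(stage)[0] for stage in pipeline])
# ['$match', '$lookup', '$unwind']
```

With the sample data from the question, it would be used as results = tech.aggregate(build_pipeline("qux")).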

B. Alternatives to MongoDB (e.g. RethinkDB) might be better suited for complex queries.

See also https://softwarerecs.stackexchange.com/questions/81175/is-there-an-alternative-to-mongodb-that-allows-to-easily-resolve-document-refere