Azure Search within unbounded array of child objects sourced from Cosmos

Question

Requirement

I have a data model in which each "thing" has multiple children and can be represented in JSON as shown below.

{
    "id": "1",
    "name": "parent_1",
    ... other parent fields ...
    "children": [
        {
            "id": "1_a",
            "name": "child_1_a"
            ... other child fields ...
        },
        {
            "id": "1_b",
            "name": "child_1_b"
            ... other child fields ...
        }
    ]
}

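The requirement is to find all parent things that contain children whose name matches a specific pattern.

Constraints

We have various constraints:

We must store the data in Cosmos using the SQL API.
We can only use Azure Search.

Problem

Ideally we would store each parent "thing" as one complete document in Cosmos, containing all of its children. However, there can be a very large number of children, which means the document size will sometimes exceed the 2MB limit on Cosmos documents.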
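To make the size problem concrete, here is a minimal sketch that estimates whether a nested parent would still fit in one document. It assumes the limit applies to the UTF-8 length of the serialized JSON; the exact accounting Cosmos uses may differ, and the names COSMOS_MAX_DOC_BYTES, approx_doc_size and fits_in_one_document are made up for illustration.

```python
import json

# Assumption: the 2MB limit quoted above, applied to the UTF-8 JSON length.
COSMOS_MAX_DOC_BYTES = 2 * 1024 * 1024


def approx_doc_size(doc):
    """Approximate the stored size of a document as compact UTF-8 JSON."""
    text = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def fits_in_one_document(doc, limit=COSMOS_MAX_DOC_BYTES):
    """Return True if the document is estimated to be within the limit."""
    return approx_doc_size(doc) <= limit


small_parent = {
    "id": "1",
    "name": "parent_1",
    "children": [{"id": "1_a", "name": "child_1_a"}],
}

# A parent with many children, generated only to exercise the check.
big_parent = {
    "id": "2",
    "name": "parent_2",
    "children": [
        {"id": f"2_{i}", "name": f"child_2_{i}"} for i in range(100_000)
    ],
}
```

A check like this could run before each write to decide whether a parent needs to be split up.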

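Things I've tried

Attempt 1

An alternative is to store separate parent and child documents in the same Azure Cosmos collection, distinguishing parents from children with a type field and referencing the parent with a parentId field. For example: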

Parent

{
    "id": "1",
    "name": "parent_1",
    "type": "parent"
}

Child 1

{
    "id": "1_a",
    "name": "child_1_a",
    "type": "child",
    "parentId": "1"
}

Child 2

{
    "id": "1_b",
    "name": "child_1_b",
    "type": "child",
    "parentId": "1"
}
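For reference, the split described in Attempt 1 could be sketched roughly as below. The helper name split_parent is hypothetical; the type and parentId fields follow the example documents above.

```python
def split_parent(parent):
    """Split one nested parent into a parent document plus one document per child."""
    # Copy every parent field except the potentially huge children array.
    parent_doc = {k: v for k, v in parent.items() if k != "children"}
    parent_doc["type"] = "parent"

    docs = [parent_doc]
    for child in parent.get("children", []):
        child_doc = dict(child)              # shallow copy, input is not mutated
        child_doc["type"] = "child"
        child_doc["parentId"] = parent["id"]  # reference back to the parent
        docs.append(child_doc)
    return docs


nested = {
    "id": "1",
    "name": "parent_1",
    "children": [
        {"id": "1_a", "name": "child_1_a"},
        {"id": "1_b", "name": "child_1_b"},
    ],
}

docs = split_parent(nested)
```

Each resulting document is small, so the per-document size limit no longer depends on the number of children.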

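However, a search on child name can then return many pages of results for the same parent, so getting just a few parents might require thousands of pages of matching children to be brought back, which is not ideal from a performance perspective.

Attempt 2

I thought I could use a JOIN in Cosmos to populate Azure Search. However, that would require a cross-document join, which is not supported.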

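Other options

Other suggestions I've seen are:

Facet on the parent's id, but I've read that this performs poorly.
Split the children into batches (e.g. 500 children) and attach each batch to a parent. If a single parent has multiple batches, denormalize the parent's fields across them. This is the only option that works with the current data, although it seems to only delay the problem - e.g. at some point the number of batches could become large enough to degrade search performance again.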
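The batching option could look something like the sketch below. The function name batch_children, the "_batch_N" id scheme and the parentId field on each batch are assumptions made for illustration; only the batch size of 500 comes from the suggestion above.

```python
def batch_children(parent, batch_size=500):
    """Split a parent into batch documents, each repeating the parent's fields."""
    parent_fields = {k: v for k, v in parent.items() if k != "children"}
    children = parent.get("children", [])

    batches = []
    # max(..., 1) ensures a parent with no children still yields one document.
    for start in range(0, max(len(children), 1), batch_size):
        doc = dict(parent_fields)                       # denormalized parent fields
        doc["id"] = f"{parent['id']}_batch_{start // batch_size}"  # hypothetical id scheme
        doc["parentId"] = parent["id"]                  # lets results be grouped by parent
        doc["children"] = children[start:start + batch_size]
        batches.append(doc)
    return batches


# A parent with 1200 generated children, used only to exercise the function.
parent = {
    "id": "1",
    "name": "parent_1",
    "children": [{"id": f"1_{i}", "name": f"child_1_{i}"} for i in range(1200)],
}

batches = batch_children(parent)
```

With this shape a name search returns at most one hit per batch rather than one per child, which is why it only delays the problem as the number of batches grows.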

Question

Is it possible to satisfy this requirement using Cosmos (SQL API) and Azure Search? If so, how?

Answer 1

Why not store all the children as separate documents in Azure Search, and simply add another property holding whatever parent information you need:

{
            "id": "1_a",
            "name": "child_1_a"
            ... other child fields ...,
            "parent": {
                 "parentId":123,
                 "parentName":"x"
                  ... other parent fields ...,
            }    
}

Your queries should also become simpler (in my opinion).
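A rough sketch of that shape, in case it helps. The helper names are hypothetical, and parents_of_matching_children is only a local stand-in that uses a regular expression to show how hits on child documents map back to distinct parents; it is not an Azure Search query.

```python
import re


def to_child_documents(parent):
    """Emit one document per child, each carrying a copy of the parent info."""
    parent_info = {"parentId": parent["id"], "parentName": parent["name"]}
    docs = []
    for child in parent.get("children", []):
        doc = dict(child)
        doc["parent"] = dict(parent_info)  # denormalized parent fields
        docs.append(doc)
    return docs


def parents_of_matching_children(child_docs, pattern):
    """Local stand-in for a search: distinct parents of children whose name matches."""
    regex = re.compile(pattern)
    seen = {}
    for doc in child_docs:
        if regex.search(doc["name"]):
            # Keep only the first occurrence of each parent.
            seen.setdefault(doc["parent"]["parentId"], doc["parent"])
    return list(seen.values())


nested = {
    "id": "1",
    "name": "parent_1",
    "children": [
        {"id": "1_a", "name": "child_1_a"},
        {"id": "1_b", "name": "child_1_b"},
    ],
}

child_docs = to_child_documents(nested)
matches = parents_of_matching_children(child_docs, r"^child_1_")
```

Because every hit already carries the parent fields, no second lookup is needed to display the parent, although several hits can still point at the same parent and need de-duplicating as above.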


azure

azure-cognitive-search

azure-cosmosdb

azure-cosmosdb-sqlapi