Function querying CosmosDB multiple times

Question

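We have a Python function (HTTP-triggered) that fetches data from CosmosDB through an input binding. The data fetched is about 24 MB, all from a single partition. It takes a long time to complete, more than 1 minute. When we look in Application Insights, we see that multiple queries (roughly 30-50) are actually being made to CosmosDB. Each of them completes within milliseconds, but the number of queries adds up in the total time taken.
Can anyone help explain why there are multiple queries, and whether there is a way to reduce their number? I have seen documentation mentioning a size limit (CosmosDB limit) and I am not sure whether it comes into play here, but that still would not explain the many queries happening between the function and Cosmos.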

Answer 1

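I'm not sure about the Azure Functions Python SDK, but I think this is a maximum response size issue in the query output. The Cosmos DB SDK internally fetches the data in batches and returns it as a single collection. To confirm this behavior, you can try fetching the same number of records with only 1 or 2 fields.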
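As a rough illustration of why selecting fewer fields should shrink the response, the sketch below compares the serialized size of full documents against a two-field projection. The documents and the field names `id`, `name`, and `payload` are hypothetical, and the code only measures JSON size locally; it does not call Cosmos DB.

```python
import json

def serialized_size(docs):
    """Return the total UTF-8 size in bytes of the documents serialized as JSON."""
    return sum(len(json.dumps(d).encode("utf-8")) for d in docs)

def project(docs, fields):
    """Keep only the given fields, similar in spirit to SELECT c.id, c.name FROM c."""
    return [{f: d[f] for f in fields} for d in docs]

# Hypothetical data: 1,000 records, each with a payload field of about 1 KB.
docs = [{"id": str(i), "name": f"item-{i}", "payload": "x" * 1024} for i in range(1000)]

full_bytes = serialized_size(docs)
slim_bytes = serialized_size(project(docs, ["id", "name"]))

print(f"full: {full_bytes} bytes, projected: {slim_bytes} bytes")
```

If the number of requests seen in Application Insights drops noticeably with the projected query, response size is likely what drives the paging.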

Answer 2

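Your query is probably too large (in terms of size or resource usage) to be served in a single result. The SDK will then issue follow-up requests to retrieve the rest of the results using a continuation token. A good way to check whether the query is the cause is to open the browser developer tools and monitor the network requests while triggering the query from the Azure portal.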

Cosmos DB query executions are stateless at the server side, and can be resumed at any time using the x-ms-continuation header. The x-ms-continuation value uses the last processed document resource ID (_rid) to track progress of execution. [source]
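To make the continuation mechanism concrete, here is a minimal in-memory sketch of a client draining a paged query. `FakeQueryService` and its integer-offset token are invented for illustration; the real service returns an opaque token in the x-ms-continuation header, and this is not the Cosmos DB SDK.

```python
from typing import List, Optional, Tuple

class FakeQueryService:
    """In-memory stand-in for a paged query endpoint (not the Cosmos DB SDK)."""

    def __init__(self, items: List[dict], page_size: int) -> None:
        self.items = items
        self.page_size = page_size
        self.requests = 0  # counts round trips made by the client

    def execute_next(self, continuation: Optional[str]) -> Tuple[List[dict], Optional[str]]:
        """Return one page plus a token for the next page, or None when done."""
        self.requests += 1
        start = int(continuation) if continuation else 0
        end = start + self.page_size
        page = self.items[start:end]
        # The token here is just the next offset; a real token is opaque.
        next_token = str(end) if end < len(self.items) else None
        return page, next_token

def drain(service: FakeQueryService) -> List[dict]:
    """Request pages until the service stops returning a continuation token."""
    results: List[dict] = []
    continuation: Optional[str] = None
    while True:
        page, continuation = service.execute_next(continuation)
        results.extend(page)
        if not continuation:
            return results

service = FakeQueryService([{"id": i} for i in range(95)], page_size=10)
all_items = drain(service)
print(len(all_items), "items in", service.requests, "requests")
```

Each page is fast on its own, but the caller pays one round trip per page, which matches the pattern of many short queries described in the question.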

You may need to tune your indexing/partitioning policy to make the query more efficient, or increase the RUs of the database/container.

Answer 3

The input binding implementation drains the query of all results: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/dev/src/WebJobs.Extensions.CosmosDB/Bindings/CosmosDBEnumerableBuilder.cs#L39-L46

do
{
    DocumentQueryResponse<T> response = await context.Service.ExecuteNextAsync<T>(collectionUri, sqlSpec, continuation);

    finalResults.AddRange(response.Results);
    continuation = response.ResponseContinuation;
}
while (!string.IsNullOrEmpty(continuation));

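The fact that you are seeing a lot of small executions means that your query needs multiple round trips to complete. It might be that the query is cross-partition, or that the volume of data requires multiple round trips (the service has a 4 MB limit on responses).

So the number of round trips is determined by the query you are executing.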
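A back-of-the-envelope check using the figures from this thread (about 24 MB of data, a 4 MB response limit, 30-50 observed requests) can be sketched as follows. The helper names are made up, and the average page size is plain arithmetic rather than a measurement.

```python
import math

def min_round_trips(total_mb: float, max_response_mb: float) -> int:
    """Lower bound on requests if every response were filled to the size limit."""
    return math.ceil(total_mb / max_response_mb)

def avg_page_mb(total_mb: float, observed_requests: int) -> float:
    """Average payload per request implied by the observed request count."""
    return total_mb / observed_requests

total_mb = 24.0
print("minimum round trips:", min_round_trips(total_mb, 4.0))
for n in (30, 50):
    print(n, "requests imply about", round(avg_page_mb(total_mb, n), 2), "MB per page")
```

The size limit alone would account for about 6 requests, so 30-50 requests suggests each page carries well under 4 MB. One possible reason, stated here as an assumption, is that a per-page item count cap is reached before the size limit.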

azure-functions

azure-cosmosdb