Does STARTSWITH on Cosmos partition keys optimize "fan-out" of cross-partition queries?

Question

Microsoft states explicitly that a cross-partition query "fans out" to every partition (link):

The following query doesn't have a filter on the partition key (DeviceId). Therefore, it must fan-out to all physical partitions where it is run against each partition's index:

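So I'm curious whether the "fan-out" can be optimized by running a range query on the partition key, such as STARTSWITH.

To test it, I created a small Cosmos DB container with seven documents: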

{
    "partitionKey": "prefix1:",
    "id": "item1a"
},
{
    "partitionKey": "prefix1:",
    "id": "item1b"
},
{
    "partitionKey": "prefix1:",
    "id": "item1c"
},
{
    "partitionKey": "prefix1X:",
    "id": "item1d"
},
{
    "partitionKey": "prefix2:",
    "id": "item2a"
},
{
    "partitionKey": "prefix2:",
    "id": "item2b"
},
{
    "partitionKey": "prefix3:",
    "id": "item3a"
}

It has the default indexing policy, with a partition key of "/partitionKey". Then I ran a bunch of queries:

SELECT * FROM c WHERE STARTSWITH(c.partitionKey, 'prefix1')
-- Actual Request Charge: 2.92 RUs

SELECT * FROM c WHERE c.partitionKey = 'prefix1:' OR c.partitionKey = 'prefix1X:'
-- Actual Request Charge: 3.02 RUs

SELECT * FROM c WHERE STARTSWITH(c.partitionKey, 'prefix1:')
SELECT * FROM c WHERE c.partitionKey = 'prefix1:'
-- Each Query Has Actual Request Charge: 2.89 RUs

SELECT * FROM c WHERE STARTSWITH(c.partitionKey, 'prefix2')
SELECT * FROM c WHERE c.partitionKey = 'prefix2:'
-- Each Query Has Actual Request Charge: 2.86 RUs

SELECT * FROM c WHERE STARTSWITH(c.partitionKey, 'prefix3')
SELECT * FROM c WHERE c.partitionKey = 'prefix3:'
-- Each Query Has Actual Request Charge: 2.83 RUs

SELECT * FROM c WHERE c.partitionKey = 'prefix2:' OR c.partitionKey = 'prefix3:'
-- Actual Request Charge: 2.99 RUs
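To make explicit which documents each predicate selects, here is a minimal Python sketch that replays the filters over the seven sample documents in memory. It does not talk to Cosmos DB and says nothing about request charges; `ids_where` is a hypothetical helper, and Python's `str.startswith` is assumed to behave like the case-sensitive form of STARTSWITH.

```python
# The seven sample documents from the question, held in memory.
docs = [
    {"partitionKey": "prefix1:", "id": "item1a"},
    {"partitionKey": "prefix1:", "id": "item1b"},
    {"partitionKey": "prefix1:", "id": "item1c"},
    {"partitionKey": "prefix1X:", "id": "item1d"},
    {"partitionKey": "prefix2:", "id": "item2a"},
    {"partitionKey": "prefix2:", "id": "item2b"},
    {"partitionKey": "prefix3:", "id": "item3a"},
]


def ids_where(predicate):
    """Hypothetical helper: ids of documents whose partition key satisfies predicate."""
    return [d["id"] for d in docs if predicate(d["partitionKey"])]


# STARTSWITH(c.partitionKey, 'prefix1') spans two logical partitions.
starts_prefix1 = ids_where(lambda pk: pk.startswith("prefix1"))

# The OR of the two equality filters selects the same documents.
either_prefix1 = ids_where(lambda pk: pk in ("prefix1:", "prefix1X:"))

# STARTSWITH(c.partitionKey, 'prefix1:') excludes 'prefix1X:'.
starts_prefix1_colon = ids_where(lambda pk: pk.startswith("prefix1:"))

print(starts_prefix1)
print(either_prefix1)
print(starts_prefix1_colon)
```

The first two filters return the same four documents, which is why their request charges are being compared with each other.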

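When I re-ran the queries, the request charges were consistent. The pattern of charge growth seemed consistent with result set size and query complexity, with the possible exception of the 'OR' queries. However, then I tried this: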

SELECT * FROM c
-- Actual Request Charge: 2.35 RUs

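The basic fan-out to all partitions is even cheaper, in RUs, than targeting a specific partition, even with the equality operator. I don't understand how that can be.

That said, my sample database is very small, with only seven documents. The data set is probably not large enough to trust the results.

So, if I had millions of documents, would STARTSWITH(c.partitionKey, 'prefix') be more optimized than fanning out to all partitions?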

Answer 1

The docs indicate that there is some efficiency gain:

With Azure Cosmos DB, typically queries perform in the following order from fastest/most efficient to slower/less efficient.

GET on a single partition key and item key

Query with a filter clause on a single partition key

Query without an equality or range filter clause on any property

Query without filters

Answer 2

As you scale, there will be fewer and fewer "logical partitions" per "physical partition", until eventually each partition key value has its own physical partition.

So:

if I had millions of documents, would STARTSWITH(c.partitionKey, 'prefix') be more optimized than fanning out to all partitions?

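Both queries will fan out across multiple partitions.

And I'm pretty sure that, since "Azure Cosmos DB uses hash-based partitioning to spread logical partitions across physical partitions", there is no locality between partition keys that share a common prefix, so every STARTSWITH query has to fan out across all physical partitions.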
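The lack of locality can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: MD5 and a simple modulo stand in for Cosmos DB's real hash function and its mapping of hash ranges to physical partitions, neither of which is reproduced here, and `physical_partition` and `partitions_for_prefix` are hypothetical names.

```python
import hashlib


def physical_partition(partition_key, partition_count):
    """Map a key to a bucket by hashing it.

    Assumption: MD5 plus modulo is a stand-in, not Cosmos DB's actual scheme.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partitions_for_prefix(keys, prefix, partition_count):
    """Buckets that hold at least one key starting with the given prefix."""
    return {
        physical_partition(key, partition_count)
        for key in keys
        if key.startswith(prefix)
    }


# 1000 distinct partition key values that all share the prefix 'prefix1'.
keys = ["prefix1-{}:".format(i) for i in range(1000)]
touched = partitions_for_prefix(keys, "prefix1", partition_count=10)

print(sorted(touched))
```

Because the hash destroys the ordering of the keys, values with a shared prefix land in buckets spread across the whole range, so a prefix filter cannot be routed to a small subset of buckets in this model.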

Answer 3

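I was trying to determine for myself whether this approach has any benefit, but based on the answers it seems not.

I just learned about the new hierarchical partition keys feature, currently in private preview, which seems to solve the problem we are struggling with: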

https://devblogs.microsoft.com/cosmosdb/hierarchical-partition-keys-private-preview/

Hierarchical partition keys are now available in private preview for the Azure Cosmos DB Core (SQL) API. With hierarchical partition keys, also known as sub-partitioning, you can now natively partition your container with up to three levels of partition keys. This enables more optimal partitioning strategies for multi-tenant scenarios or workloads that would otherwise use synthetic partition keys. Instead of having to choose a single partition key – which often leads to performance trade-offs – you can now use up to three keys to further sub-partition your data, enabling more optimal data distribution and higher scale.

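Because this allows up to 3 keys, it could solve the problem by breaking the prefix out into separate keys or, if there are more than 3 parts, at least optimize it further.

Example (usage example from the link): https://github.com/AzureCosmosDB/HierarchicalPartitionKeysFeedbackGroup#net-v3-sdk-2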
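As a rough illustration of breaking a prefixed key into separate levels, the Python sketch below splits a synthetic key into at most three parts. The ":" separator, the sample values, and the `split_synthetic_key` helper are assumptions made for illustration; this is plain string handling, not part of any Cosmos DB SDK.

```python
def split_synthetic_key(synthetic_key, separator=":", max_levels=3):
    """Split a synthetic key into at most max_levels parts.

    Any parts beyond the limit stay joined in the last level.
    """
    return synthetic_key.split(separator, max_levels - 1)


# Two parts map onto two levels, for example tenant and user.
two_levels = split_synthetic_key("Contoso:Alice")

# With more than three parts, the remainder is kept in the third level.
overflow = split_synthetic_key("Contoso:Alice:2023:orders")

print(two_levels)
print(overflow)
```

Each element of the returned list would then be supplied as one level of the hierarchical partition key, in order.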

// Get the full partition key path
var id = "0a70accf-ec5d-4c2b-99a7-af6e2ea33d3d"; 
var fullPartitionkeyPath = new PartitionKeyBuilder()
        .Add("Contoso") //TenantId
        .Add("Alice") //UserId
        .Build();
var itemResponse = await containerSubpartitionByTenantId_UserId.ReadItemAsync<dynamic>(id, fullPartitionkeyPath);

Caveats

According to the preview link, it looks like you need to opt in to the preview and create a new container:

New containers only – all keys must be specified upon container creation
