How to index for missing values in cosmosDB?

Question

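Suppose I need to verify that every document in a CosmosDB collection has AnImportantProperty set (= the property exists, possibly with an explicit null value). Most of them do, but for "reasons" some of them may not.

I can include the new property in the indexing policy, so that I can easily find which documents have it using an index-covered query: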

select * from c where is_defined(c.AnImportantProperty)
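
For reference, an indexing policy that includes only this property might look like the sketch below. It is written as a Python dict that serializes to the policy JSON. The includedPaths/excludedPaths layout and the "/?" path suffix follow the documented policy format; the helper name build_indexing_policy and the decision to exclude every other path are assumptions made purely for illustration.

```python
import json


def build_indexing_policy(property_name):
    """Return an indexing policy dict that indexes only one top-level property.

    Hypothetical helper: excluding every other path is an assumption made
    to keep the example small, not a recommendation.
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/?" addresses the scalar value stored at this path.
        "includedPaths": [{"path": f"/{property_name}/?"}],
        # Everything not explicitly included is left out of the index.
        "excludedPaths": [{"path": "/*"}],
    }


policy = build_indexing_policy("AnImportantProperty")
print(json.dumps(policy, indent=2))
```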

But the opposite query (which is the one I am really interested in) does not seem to benefit from the index:

select * from c where NOT is_defined(c.AnImportantProperty)
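
To make the intended semantics concrete, here is a small in-memory model in Python of what the two filters select. It is not Cosmos DB code: the is_defined function below is a hypothetical stand-in that only mirrors the rule stated above, namely that a property with an explicit null still counts as set.

```python
def is_defined(document, property_name):
    """Model of is_defined: true when the key exists, even if its value is None (JSON null)."""
    return property_name in document


documents = [
    {"id": "1", "AnImportantProperty": "value"},
    {"id": "2", "AnImportantProperty": None},  # explicit null still counts as set
    {"id": "3"},                               # property missing entirely
]

# Equivalent of the NOT is_defined filter, evaluated by scanning every document.
missing = [d["id"] for d in documents if not is_defined(d, "AnImportantProperty")]
print(missing)
```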

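Is there a way to write an index/query that finds the documents lacking a property without a full scan?

Edit: For example, I have heard second-hand rumors about negated indexes and a [...]. Either could suggest that a solution for this case exists (or will exist).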

Answer 1

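Pretty much no. I cannot think of anything, because you are effectively listing everything other than a known value. However, I would suggest a somewhat different approach.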

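Introduce a type property in all your documents
Classes/objects that differ in some properties can (I won't say should, although I mean it) be considered different types (even if one merely inherits from the other)
Store the documents without the property as type="someType" and the ones with it as type="someOtherType"
Query by the type you need
Introduce a subtype if that is not enough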

I would probably try something like that. Anything to avoid scanning too much.
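
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names tag_document and query_by_type are hypothetical, the type values are the placeholders used above, and the query is expressed as a plain parameterized query spec rather than a call into any particular SDK.

```python
def tag_document(document, property_name="AnImportantProperty"):
    """Return a copy of the document with a 'type' marker derived from the property's presence."""
    tagged = dict(document)
    # Presence is tested by key, so an explicit null (None) still counts as set.
    tagged["type"] = "someOtherType" if property_name in document else "someType"
    return tagged


def query_by_type(type_name):
    """Build a parameterized query spec that uses an equality filter on the type marker."""
    return {
        "query": "SELECT * FROM c WHERE c.type = @type",
        "parameters": [{"name": "@type", "value": type_name}],
    }


# Documents lacking the property are tagged "someType", so this finds them by equality.
print(tag_document({"id": "3"}))
print(query_by_type("someType"))
```

The point of the design is that the filter becomes an equality on a value that is always present, which an ordinary index on type can serve.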

Answer 2

Currently (July 2019) there does NOT seem to be a way to write such a query.

Not all hope is lost, though: according to the Azure Cosmos DB Team comment for the feature request, it is in the planning stage.

Update: According to https://devblogs.microsoft.com/cosmosdb/april-query-improvements/, the feature is now complete:

Queries with inequality filters or filters on undefined values can now be run more efficiently. Previously, these filters did not utilize the index. When executing a query, Azure Cosmos DB would first evaluate other less expensive filters (such as =, >, or <) in the query. If there were inequality filters or filters on undefined values remaining, the query engine would be required to load each of these documents. Since inequality filters and filters on undefined values now utilize the index, we can avoid loading these documents and see a significant improvement in RU charge.

Here’s a full list of query filters with improvements:

Inequality comparison expression (e.g. c.age != 4)

NOT IN expression (e.g. c.name NOT IN ('Luis', 'Andrew', 'Deborah'))

NOT IsDefined

Is expressions (e.g. NOT IsDefined(c.age), NOT IsString(c.name))

Coalesce operator expression (e.g. (c.name ?? 'N/A') = 'Thomas')

Ternary operator expression (e.g. c.name = null ? 'N/A' : c.name)

If you have queries with these filters, you should add an index for the relevant properties.

Tags: indexing, performance, azure-cosmosdb, azure-cosmosdb-sqlapi