C# Azure CosmosDb and Mongo - how to know if Find is hitting an index, and which are the best indexing recommendations for this scenario?
I have an ASP.NET Core 3.1 API that uses the MongoDB driver NuGet package v2.11 to store documents in Azure Cosmos DB.
First, the class for my document:
public class Customer
{
    public Guid CustomerId { get; set; }
    public string Email { get; set; }
    public int Channel { get; set; }

    // Derived from CustomerId; the setter is a deliberate no-op.
    public string PartitionKey
    {
        get { return GetPartitionKey(CustomerId); }
        set { }
    }

    // The partition key is the first two characters of the GUID string.
    public static string GetPartitionKey(Guid id)
    {
        return id.ToString().Substring(0, 2);
    }
}
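Before sharing my repository class, here are some details about my scenario. I have a partitioned collection (partitioned on the PartitionKey property of my Customer class), and I have two requirements for find operations:
- Find by CustomerId and Channel (the same CustomerId can exist in different channels).
- Check whether a customer exists. A customer exists if the CustomerId or the Email is already present in the same channel (again, the same CustomerId or Email can exist in different channels).
My question is about choosing appropriate indexes so that finds on something other than the partition key can use them. Here is the repository class, followed by the indexes: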
using System;
using System.Security.Authentication;
using MongoDB.Driver;

public class MyRepository
{
    private IMongoCollection<Customer> Collection;

    public MyRepository()
    {
        MongoClientSettings settings = MongoClientSettings.FromUrl(new MongoUrl("The connection string"));
        settings.SslSettings = new SslSettings() { EnabledSslProtocols = SslProtocols.Tls12 };
        var mongoClient = new MongoClient(settings);
        var database = mongoClient.GetDatabase("db-customer");
        this.Collection = database.GetCollection<Customer>("col-customer");
        // What indexes here ?!?
    }

    // Returns the customer matching both CustomerId and Channel, or null.
    public Customer GetByKey(Guid customerId, int channel)
    {
        var channelFilter = Builders<Customer>.Filter.Eq(x => x.Channel, channel);
        var idFilter = Builders<Customer>.Filter.Eq(x => x.CustomerId, customerId);
        var filter = channelFilter & idFilter;
        Customer result = this.Collection.Find(filter).FirstOrDefault();
        return result;
    }

    // True if the same channel already holds this CustomerId or this Email.
    public bool Exists(Customer customer)
    {
        var channelFilter = Builders<Customer>.Filter.Eq(x => x.Channel, customer.Channel);
        var emailFilter = Builders<Customer>.Filter.Eq(x => x.Email, customer.Email);
        var idFilter = Builders<Customer>.Filter.Eq(x => x.CustomerId, customer.CustomerId);
        var filter = channelFilter & (emailFilter | idFilter);
        bool found = this.Collection.Find(filter).FirstOrDefault() != null;
        return found;
    }
}
So my question is: what is the best index setup for this repository? Should I create one index for each field I search on, like this:
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.CustomerId)));
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.Channel)));
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.Email)));
Or should I create compound indexes that match the searches I run, like this?
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.CustomerId).Ascending(i => i.Channel)));
this.Collection.Indexes.CreateOne(new CreateIndexModel<Customer>(Builders<Customer>.IndexKeys.Ascending(i => i.CustomerId).Ascending(i => i.Email).Ascending(i => i.Channel)));
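Checking the metrics in Azure Monitor, I consistently see low RU consumption and low overall response times, but my collection holds only a few records at this stage. I am afraid that as the number of records grows (there will be millions), RU consumption will become too high, response times too long, or, in the worst case, both.
Can I get your two cents on this?
Thanks.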
You should create a compound index only if your query needs to sort efficiently on multiple fields at once. For queries with multiple filters that don't need to sort, create multiple single field indexes instead of a single compound index. One query uses multiple single field indexes where available.
In your case, I see that you have multiple filters and no sort, so create multiple single-field indexes.
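As a rough sketch of that recommendation, the helper below creates the three single-field indexes in one call, lists what the server reports, and asks the server to explain a find. It relies on the Customer class from the question. The class and method names (CustomerIndexSetup, EnsureSingleFieldIndexes, PrintIndexes, ExplainFind) are made up for illustration. Whether the explain command is accepted, and what its output looks like, depends on the server version of your Cosmos DB account, so treat that part as an assumption to verify.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Driver;

// Hypothetical helper; uses the Customer class defined in the question.
public static class CustomerIndexSetup
{
    // Creates one single-field index per filtered field in a single call.
    public static void EnsureSingleFieldIndexes(IMongoCollection<Customer> collection)
    {
        var keys = Builders<Customer>.IndexKeys;
        var models = new List<CreateIndexModel<Customer>>
        {
            new CreateIndexModel<Customer>(keys.Ascending(c => c.CustomerId)),
            new CreateIndexModel<Customer>(keys.Ascending(c => c.Channel)),
            new CreateIndexModel<Customer>(keys.Ascending(c => c.Email)),
        };
        collection.Indexes.CreateMany(models);
    }

    // Prints the index definitions the server reports for the collection.
    public static void PrintIndexes(IMongoCollection<Customer> collection)
    {
        foreach (BsonDocument index in collection.Indexes.List().ToList())
        {
            Console.WriteLine(index.ToJson());
        }
    }

    // Asks the server to explain a find that uses the given filter.
    // Assumption: the account's server version accepts the "explain" command.
    public static BsonDocument ExplainFind(
        IMongoCollection<Customer> collection,
        FilterDefinition<Customer> filter)
    {
        // Render the typed filter to the BSON the driver would send (driver 2.x API).
        var registry = BsonSerializer.SerializerRegistry;
        BsonDocument renderedFilter = filter.Render(registry.GetSerializer<Customer>(), registry);

        var command = new BsonDocument
        {
            { "explain", new BsonDocument
                {
                    { "find", collection.CollectionNamespace.CollectionName },
                    { "filter", renderedFilter }
                }
            }
        };
        return collection.Database.RunCommand(new BsonDocumentCommand<BsonDocument>(command));
    }
}
```

You could call EnsureSingleFieldIndexes from the repository constructor where the "What indexes here" comment sits, then pass the filter built in GetByKey or Exists to ExplainFind and inspect the returned document together with the RU charge in Azure Monitor as the collection grows.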