Using Select instead of Project with MongoDBDriver

Question

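I want to return a minimal version of my Book entity, called BookInfo. I have a Mongo collection of books, and I want to retrieve a list of BookInfo from that collection by publisher ID. Below are two implementations of the same repository method. One uses the MongoDriver's Project method; the other converts the IFindFluent to an enumerable and then calls Select.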

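I would like to know whether the two implementations differ in what happens behind the scenes (in particular, whether one runs on the database side and the other in memory):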

Using Project:

 public IEnumerable<BookInfo> GetPublisherBooks(Guid publisherId)
 {
     var collection = _dbContext.GetCollection<Book>(BOOKS_COLLECTION_NAME);

     var bookInfos = collection.Find(book => book.PublisherId == publisherId)
     .Project(book => new BookInfo()
     {
         Id = book.Id,
         Name = book.Name,
         Description = book.Description
     });

     return bookInfos.ToEnumerable();
 }

Using IEnumerable's Select:

 public IEnumerable<BookInfo> GetPublisherBooks(Guid publisherId)
 {
     var collection = _dbContext.GetCollection<Book>(BOOKS_COLLECTION_NAME);

     var bookInfos = collection.Find(book => book.PublisherId == publisherId)
     .ToEnumerable()
     .Select((Book book) =>
     {
         return new BookInfo()
         {
             Id = book.Id,
             Name = book.Name,
             Description = book.Description
         };
     });

     return bookInfos;
 }

Thanks!

Answer 1

Let me break the question into parts. Yes, there is a difference between the two queries, and several factors are at play.

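The basic difference between the two queries is that the first one has the advantage of projection. The first query asks the database for only Id, Name and Description, whereas the second query first fetches the complete Book documents and then performs the select on the code side, in memory. If you come from a relational database background, the first query is equivalent to: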

In RelationalDB:

Select Id, Name, Description from Book book where book.PublisherId = <<publisherId>>

In MongoDB:

db.BOOKS_COLLECTION_NAME.find({ "PublisherId" : <<publisherId>> }, {"_id": 1, "Name": 1, "Description": 1});
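If you would rather spell the projection out than rely on the lambda being translated, the same query can be sketched with the driver's projection builder. Treat this as a sketch only: the `Book` and `BookInfo` shapes are guessed from the question, `BookQueries` is a made-up helper class, and it needs the MongoDB.Driver package plus a reachable server to actually run.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Driver;

// Assumed entity shapes, reconstructed from the question.
public class Book
{
    public Guid Id { get; set; }
    public Guid PublisherId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}

public class BookInfo
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class BookQueries
{
    public static List<BookInfo> GetPublisherBooks(
        IMongoCollection<Book> collection, Guid publisherId)
    {
        // Only these three fields are requested from the server.
        var projection = Builders<Book>.Projection
            .Include(b => b.Id)
            .Include(b => b.Name)
            .Include(b => b.Description);

        var query = collection
            .Find(b => b.PublisherId == publisherId)
            .Project<BookInfo>(projection);

        // ToString() on the fluent query renders a shell-like form of the
        // find command, a quick way to check what will be sent.
        Console.WriteLine(query.ToString());

        // The query is executed here; documents arrive already trimmed.
        return query.ToList();
    }
}
```

Printing the fluent query for both variants from the question is an easy way to see the difference: the Select variant should show a find with no projection document at all.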

As I understand it, the question can be split into two parts:

How projection helps us.

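Projection almost always helps, because we select only what we need. In a large database, or in a query that returns many records, less data is transferred over the network, which improves performance. If you also have an index that covers the filtered and projected fields, that is the best case, since the query can then potentially be answered from the index alone.
If you run a heavy query without projection, that is, one that reads a larger volume of data, IO increases and performance degrades.
Queries without projection also matter when there are many active connections, because heavier queries hold their connections longer. The MongoDB driver opens up to maxPoolSize connections (100 by default) to each replica set node. Further attempts to acquire a connection block in the pool's wait queue, up to maxPoolSize x waitQueueMultiple. waitQueueMultiple defaults to 5, so with the defaults up to 500 threads can be blocked waiting for an available connection from the pool. If more threads try to queue, a MongoWaitQueueFullException is thrown.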
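For reference, the pool limits discussed above can be set on the client. The following is a minimal sketch, assuming the MongoDB.Driver package; `MongoClientFactory` is a hypothetical helper and the values are illustrative, not recommendations. Which wait queue settings are available depends on the driver version, so only the pool size and the wait timeout are shown.

```csharp
using System;
using MongoDB.Driver;

// Hypothetical factory, for illustration only.
public static class MongoClientFactory
{
    public static MongoClient Create(string connectionString)
    {
        var settings = MongoClientSettings.FromConnectionString(connectionString);

        // Upper bound on pooled connections per server (illustrative value).
        settings.MaxConnectionPoolSize = 100;

        // How long a thread may wait for a free connection before failing
        // (illustrative value).
        settings.WaitQueueTimeout = TimeSpan.FromSeconds(30);

        return new MongoClient(settings);
    }
}
```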

Should we compare and then fetch the data, or fetch the data and then compare?

What matters here is indexes. If you have indexes on the fields that you are comparing, you can do the comparison on the DB side.
For large documents, however, indexing every field is not good practice. You can create indexes for the fields most commonly used in queries, but for queries that are rare or only used internally, think twice before creating an index. Indexes increase the size of the database and therefore the cost.
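As an illustration of indexing a commonly filtered field, here is a minimal sketch that creates an ascending index on PublisherId through the driver. The `Book` shape is assumed from the question and `BookIndexes` is a hypothetical helper; it needs the MongoDB.Driver package and a reachable server.

```csharp
using System;
using MongoDB.Driver;

// Assumed entity shape, reconstructed from the question.
public class Book
{
    public Guid Id { get; set; }
    public Guid PublisherId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class BookIndexes
{
    // Creates an ascending index on PublisherId and returns the index name
    // reported by the server.
    public static string EnsurePublisherIndex(IMongoCollection<Book> collection)
    {
        var keys = Builders<Book>.IndexKeys.Ascending(b => b.PublisherId);
        return collection.Indexes.CreateOne(new CreateIndexModel<Book>(keys));
    }
}
```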
So, where should we compare: on the DB side or on the code side?

--> Analyze your queries properly, create indexes for the most common ones, and use the comparison on the DB side for those fields.
--> For uncommon queries, or queries without an index, it is better to filter on the code side. But be cautious: if your non-indexed query is heavy, it may time out. So for non-indexed queries, try to use pagination in MongoDB.
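The pagination idea mentioned above can be sketched with the fluent Skip and Limit methods. This is one possible approach under stated assumptions: the `Book` shape comes from the question, `BookPaging` and its 1-based page numbering are made up for the example, and sorting by Id is only there to keep pages stable.

```csharp
using System;
using System.Collections.Generic;
using MongoDB.Driver;

// Assumed entity shape, reconstructed from the question.
public class Book
{
    public Guid Id { get; set; }
    public Guid PublisherId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class BookPaging
{
    // Number of documents to skip for a 1-based page number.
    public static int SkipCount(int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (page - 1) * pageSize;
    }

    // Fetches one page of a publisher's books instead of the whole result set.
    public static List<Book> GetPage(
        IMongoCollection<Book> collection, Guid publisherId, int page, int pageSize)
    {
        return collection
            .Find(b => b.PublisherId == publisherId)
            .SortBy(b => b.Id)               // stable order so pages do not overlap
            .Skip(SkipCount(page, pageSize))
            .Limit(pageSize)
            .ToList();
    }
}
```

Skip still makes the server walk past the skipped documents, so for deep pages a range filter on the sort key may be a better fit.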
