Does "where" position in LINQ query matter when joining in-memory?

Question

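Situation: Say we are executing a LINQ query that joins two in-memory lists (so no DbSets or SQL query generation are involved), and this query also has a where clause. This where only filters on properties of the original set (the from part of the query).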

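Question: Does the LINQ query interpreter optimize this query so that the where is executed before the join, regardless of whether I write the where before or after the join? That way it would not have to join elements that are filtered out later anyway.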

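Example: I have a categories list that I want to join with a products list. However, I am only interested in the category with ID 1. Does the LINQ interpreter internally perform exactly the same operations regardless of whether I write: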

from category in categories
join prod in products on category.ID equals prod.CategoryID
where category.ID == 1 // <------ below join
select new { Category = category.Name, Product = prod.Name };

or

from category in categories
where category.ID == 1 // <------ above join
join prod in products on category.ID equals prod.CategoryID
select new { Category = category.Name, Product = prod.Name };

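Previous research: I have already seen a related question, but its author was asking only about non-in-memory cases with generated SQL. I am explicitly interested in LINQ executing a join on two lists in memory.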

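Update: This is not a duplicate of the "Order execution of chain linq query" question, because the referenced question clearly refers to a DbSet, while my question explicitly addresses a non-database scenario. (Moreover, although similar, I am not asking about inclusions based on navigation properties here, but about "joins".)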

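Update 2: Although very similar, this is also not a duplicate of "Is order of the predicate important when using LINQ?", because I am asking explicitly about the in-memory situation and I cannot see the referenced question addressing that case explicitly. Moreover, that question is a bit old, and I am actually interested in LINQ in the context of .NET Core (which did not exist in 2012), so I updated the tags of this question to reflect this second point.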

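Please note: With this question I want to know whether the LINQ query interpreter somehow optimizes this query in the background, and I am hoping for a reference to a piece of documentation or source code that shows how LINQ does this. I am not interested in answers such as "it does not matter because the performance of both queries is roughly the same".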

Answer 1

LINQ query syntax is compiled to a method chain. For details, see e.g. this question.

The first LINQ query is compiled to the following method chain:

categories
    .Join(
        products,
        category => category.ID,
        prod => prod.CategoryID,
        (category, prod) => new { category, prod })
    .Where(t => t.category.ID == 1)
    .Select(t => new { Category = t.category.Name, Product = t.prod.Name });

The second one:

categories
    .Where(category => category.ID == 1)
    .Join(
        products,
        category => category.ID,
        prod => prod.CategoryID,
        (category, prod) => new { Category = category.Name, Product = prod.Name });

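As you can see, the second query causes fewer allocations (note that it uses only one anonymous type versus two in the first query, and consider how many instances of these anonymous types are created when the query executes).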

Moreover, the first query obviously performs the join on much more data than the second one, whose input is already filtered.
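To make both points measurable, here is a minimal sketch that counts how often the outer key selector and the result selector run in each variant. The sample data, the tuple shapes, and the counter names are assumptions made up for illustration; they are not part of the original question.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: 3 categories, 6 products (2 per category).
var categories = new[]
{
    (ID: 1, Name: "Books"),
    (ID: 2, Name: "Games"),
    (ID: 3, Name: "Music"),
};
var products = new[]
{
    (CategoryID: 1, Name: "Novel"),
    (CategoryID: 1, Name: "Atlas"),
    (CategoryID: 2, Name: "Chess"),
    (CategoryID: 2, Name: "Go"),
    (CategoryID: 3, Name: "Vinyl"),
    (CategoryID: 3, Name: "CD"),
};

// Variant 1: where after join. Counters record work done inside Join.
int outerKeysLate = 0, pairsLate = 0;
var whereAfterJoin = categories
    .Join(
        products,
        category => { outerKeysLate++; return category.ID; },
        prod => prod.CategoryID,
        (category, prod) => { pairsLate++; return new { category, prod }; })
    .Where(t => t.category.ID == 1)
    .Select(t => new { Category = t.category.Name, Product = t.prod.Name })
    .ToList();

// Variant 2: where before join. Only the filtered categories reach Join.
int outerKeysEarly = 0, pairsEarly = 0;
var whereBeforeJoin = categories
    .Where(category => category.ID == 1)
    .Join(
        products,
        category => { outerKeysEarly++; return category.ID; },
        prod => prod.CategoryID,
        (category, prod) =>
        {
            pairsEarly++;
            return new { Category = category.Name, Product = prod.Name };
        })
    .ToList();

Console.WriteLine($"where after join:  outer keys = {outerKeysLate}, joined pairs = {pairsLate}");
// where after join:  outer keys = 3, joined pairs = 6
Console.WriteLine($"where before join: outer keys = {outerKeysEarly}, joined pairs = {pairsEarly}");
// where before join: outer keys = 1, joined pairs = 2
```

Both variants return the same two rows, but the first one builds six intermediate pair objects and then discards four of them, whereas the second one only ever creates the two objects that end up in the result.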

For LINQ-to-objects queries, no additional query optimization takes place.

So the second version is preferable.

Answer 2

For in-memory lists (IEnumerables), no optimization is applied; the query is executed in the order in which the operators are chained.
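A rough way to see why nothing gets reordered is to write down what a join operator over IEnumerable has to do. The following is a simplified sketch, not the actual framework source: SketchJoin is a hypothetical name, and it leaves out the custom comparers and null-key handling that Enumerable.Join has. The idea is the same, though: hash the inner sequence once, then stream whatever outer sequence was handed in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for Enumerable.Join (hypothetical helper, for illustration only).
static IEnumerable<TResult> SketchJoin<TOuter, TInner, TKey, TResult>(
    IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKey,
    Func<TInner, TKey> innerKey,
    Func<TOuter, TInner, TResult> resultSelector)
{
    // The whole inner sequence is grouped by key once, on first enumeration.
    ILookup<TKey, TInner> lookup = inner.ToLookup(innerKey);

    // The outer sequence is consumed exactly as it was passed in; the operator
    // has no knowledge of any Where that comes later in the chain.
    foreach (TOuter o in outer)
        foreach (TInner i in lookup[outerKey(o)])
            yield return resultSelector(o, i);
}

// Hypothetical sample data.
var categories = new[] { (ID: 1, Name: "Books"), (ID: 2, Name: "Games") };
var products = new[]
{
    (CategoryID: 1, Name: "Novel"),
    (CategoryID: 2, Name: "Chess"),
    (CategoryID: 1, Name: "Atlas"),
};

// Records which outer elements the join actually looked at.
var seen = new List<int>();
var joined = SketchJoin(
        categories.Where(c => c.ID == 1),
        products,
        c => { seen.Add(c.ID); return c.ID; },
        p => p.CategoryID,
        (c, p) => $"{c.Name}: {p.Name}")
    .ToList();

// The built-in operator produces the same sequence for this input.
var builtIn = categories
    .Where(c => c.ID == 1)
    .Join(products, c => c.ID, p => p.CategoryID, (c, p) => $"{c.Name}: {p.Name}")
    .ToList();

Console.WriteLine(string.Join(", ", joined));    // Books: Novel, Books: Atlas
Console.WriteLine(string.Join(", ", seen));      // 1
Console.WriteLine(joined.SequenceEqual(builtIn)); // True
```

Because each operator only wraps the sequence it receives, a filter placed after the join can only ever see the already joined pairs.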

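I also tried to get the result by first converting the list to IQueryable and then applying the filter, but evidently the conversion time is quite high for such a large table.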
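As a hedged illustration of what AsQueryable changes for an in-memory list: the operators are recorded as an expression tree and, as far as I understand the in-memory provider, only turned into executable delegates when the query is enumerated. That extra step is a plausible source of the overhead, but the sketch below only shows the mechanism, not the cost. The sample list is an assumption.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical sample data.
var list = Enumerable.Range(1, 10).ToList();

// With AsQueryable, the lambda is captured as an expression tree, not a delegate.
IQueryable<int> query = list.AsQueryable().Where(n => n % 2 == 0);

// The root of the tree is a method call node (the call to Queryable.Where).
Console.WriteLine(query.Expression.NodeType); // Call

// Enumerating the query executes it against the in-memory list.
var viaQueryable = query.ToList();
var viaEnumerable = list.Where(n => n % 2 == 0).ToList();

// Same results as plain LINQ-to-objects; only the execution path differs.
Console.WriteLine(viaQueryable.SequenceEqual(viaEnumerable)); // True
```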

I did a quick test for this case.

Console.WriteLine($"List Row Count = {list.Count()}");
Console.WriteLine($"JoinList Row Count = {joinList.Count()}");

// Variant 1: join first, filter afterwards.
var watch = Stopwatch.StartNew();
var result = list.Join(joinList, l => l.Prop3, i => i.Prop3, (lst, inner) => new { lst, inner })
   .Where(t => t.inner.Prop3 == "Prop13")
   .Select(t => new { t.inner.Prop4, t.lst.Prop2 });
result.Dump(); // LINQPad's Dump() enumerates the query, so execution is part of the timing
watch.Stop();

Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}");

// Variant 2: filter first, join afterwards.
watch.Restart();
var result2 = list
   .Where(t => t.Prop3 == "Prop13")
   .Join(joinList, l => l.Prop3, i => i.Prop3, (lst, inner) => new { lst, inner })
   .Select(t => new { t.inner.Prop4, t.lst.Prop2 });

result2.Dump();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}");

// Variant 3: same as variant 1, but going through IQueryable.
watch.Restart();
var result3 = list.AsQueryable().Join(joinList, l => l.Prop3, i => i.Prop3, (lst, inner) => new { lst, inner })
   .Where(t => t.inner.Prop3 == "Prop13")
   .Select(t => new { t.inner.Prop4, t.lst.Prop2 });
result3.Dump();
watch.Stop();
Console.WriteLine($"Result3 Elapsed = {watch.ElapsedTicks}");

Findings:

List Count = 100
JoinList Count = 10
Result1 Elapsed = 27
Result2 Elapsed = 17
Result3 Elapsed = 591

List Count = 1000
JoinList Count = 10
Result1 Elapsed = 20
Result2 Elapsed = 12
Result3 Elapsed = 586

List Count = 100000
JoinList Count = 10
Result1 Elapsed = 603
Result2 Elapsed = 19
Result3 Elapsed = 1277

List Count = 1000000
JoinList Count = 10
Result1 Elapsed = 1469
Result2 Elapsed = 88
Result3 Elapsed = 3219
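The test above depends on LINQPad's Dump() and on list and joinList definitions that are not shown. For anyone who wants to repeat it as a plain console program, here is a self-contained sketch of the first two variants. The element shapes, the generated values, and the list size are assumptions; the queries are materialized with ToList() instead of Dump(). Absolute tick counts will differ by machine and include JIT warm-up, so treat the numbers only as a rough comparison.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

const int listSize = 100_000; // hypothetical size, adjust as needed

// Hypothetical data: Prop3 cycles through "Prop0".."Prop19".
var list = Enumerable.Range(0, listSize)
    .Select(i => (Prop2: $"Item{i}", Prop3: $"Prop{i % 20}"))
    .ToList();

// Ten join rows with keys "Prop10".."Prop19".
var joinList = Enumerable.Range(10, 10)
    .Select(i => (Prop3: $"Prop{i}", Prop4: $"Value{i}"))
    .ToList();

// Variant 1: join first, filter afterwards.
var watch = Stopwatch.StartNew();
var result1 = list
    .Join(joinList, l => l.Prop3, i => i.Prop3, (lst, inner) => new { lst, inner })
    .Where(t => t.inner.Prop3 == "Prop13")
    .Select(t => new { t.inner.Prop4, t.lst.Prop2 })
    .ToList(); // forces execution inside the timed region
watch.Stop();
Console.WriteLine($"Result1 Elapsed = {watch.ElapsedTicks}, rows = {result1.Count}");

// Variant 2: filter first, join afterwards.
watch.Restart();
var result2 = list
    .Where(t => t.Prop3 == "Prop13")
    .Join(joinList, l => l.Prop3, i => i.Prop3, (lst, inner) => new { lst, inner })
    .Select(t => new { t.inner.Prop4, t.lst.Prop2 })
    .ToList();
watch.Stop();
Console.WriteLine($"Result2 Elapsed = {watch.ElapsedTicks}, rows = {result2.Count}");
```

For numbers that are more trustworthy than a single Stopwatch run, a dedicated benchmarking harness with warm-up and repeated iterations would be the better tool.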

Tags: c#, linq, join, where, .net-core