Is this closure combination behaviour a C# compiler bug?

Question

While investigating some strange object lifetime issues, I ran into this very puzzling behaviour of the C# compiler.

Consider the following test class:

class Test
{
    delegate Stream CreateStream();

    CreateStream TestMethod( IEnumerable<string> data )
    {
        string file = "dummy.txt";
        var hashSet = new HashSet<string>();

        var count = data.Count( s => hashSet.Add( s ) );

        CreateStream createStream = () => File.OpenRead( file );

        return createStream;
    }
}

The compiler generates the following (decompiled output):

internal class Test
{
  public Test()
  {
    base..ctor();
  }

  private Test.CreateStream TestMethod(IEnumerable<string> data)
  {
    Test.<>c__DisplayClass1_0 cDisplayClass10 = new Test.<>c__DisplayClass1_0();
    cDisplayClass10.file = "dummy.txt";
    cDisplayClass10.hashSet = new HashSet<string>();
    Enumerable.Count<string>(data, new Func<string, bool>((object) cDisplayClass10, __methodptr(<TestMethod>b__0)));
    return new Test.CreateStream((object) cDisplayClass10, __methodptr(<TestMethod>b__1));
  }

  private delegate Stream CreateStream();

  [CompilerGenerated]
  private sealed class <>c__DisplayClass1_0
  {
    public HashSet<string> hashSet;
    public string file;

    public <>c__DisplayClass1_0()
    {
      base..ctor();
    }

    internal bool <TestMethod>b__0(string s)
    {
      return this.hashSet.Add(s);
    }

    internal Stream <TestMethod>b__1()
    {
      return (Stream) File.OpenRead(this.file);
    }
  }
}

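The original class contains two lambdas: s => hashSet.Add( s ) and () => File.OpenRead( file ). The first closes over the local variable hashSet, the second over the local variable file. However, the compiler generates a single closure class, <>c__DisplayClass1_0, that contains both hashSet and file. As a consequence, the returned CreateStream delegate holds, and keeps alive, a reference to the hashSet object, which should have become eligible for GC as soon as TestMethod returned.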
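One way to observe this without a decompiler is to inspect the object the returned delegate is bound to. The following is a minimal sketch, not part of the original post: ClosureInspector, CapturedFieldNames, ClosureDemo and MakeOpener are hypothetical names, and Func<Stream> stands in for the CreateStream delegate. The field names it reports are a compiler implementation detail and may differ between compiler versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;

static class ClosureInspector
{
    // Hypothetical helper: lists the instance fields of the object a delegate
    // is bound to (the compiler-generated closure object, if there is one).
    public static string[] CapturedFieldNames(Delegate d)
    {
        object target = d.Target;
        if (target == null) return new string[0];

        return target.GetType()
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Select(f => f.Name)
            .ToArray();
    }
}

static class ClosureDemo
{
    // Same shape as TestMethod in the question: two lambdas in one scope,
    // each capturing a different local.
    public static Func<Stream> MakeOpener(IEnumerable<string> data)
    {
        string file = "dummy.txt";
        var hashSet = new HashSet<string>();

        var count = data.Count(s => hashSet.Add(s));

        return () => File.OpenRead(file);
    }
}
```

Calling ClosureInspector.CapturedFieldNames(ClosureDemo.MakeOpener(new[] { "a" })) and printing the result should, with a compiler that behaves as described above, list both file and hashSet, even though the returned lambda only uses file.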

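In the real-world scenario where I hit this, a very large (i.e. >100 MB) object was being kept alive incorrectly.

My specific questions are:

Is this a bug? If not, why is this behaviour considered desirable?

Update:

The C# 5 spec, section 7.15.5.1, says: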

When an outer variable is referenced by an anonymous function, the outer variable is said to have been captured by the anonymous function. Ordinarily, the lifetime of a local variable is limited to execution of the block or statement with which it is associated (§5.1.7). However, the lifetime of a captured outer variable is extended at least until the delegate or expression tree created from the anonymous function becomes eligible for garbage collection.

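This seems open to some degree of interpretation, and it does not explicitly prohibit a lambda from capturing variables that it does not reference. However, this question covers a related scenario, which @eric-lippert considered to be a bug. IMHO, the combined closure implementation produced by the compiler is a good optimization, but the optimization should not be applied to lambdas that the compiler can reasonably detect may have a lifetime beyond the current stack frame.

How do I code against this without abandoning the use of lambdas all together? Notably, how do I code against this defensively, so that future code changes don't suddenly cause some other unchanged lambda in the same method to start enclosing something that it shouldn't?

Update:

The code example I provided is contrived by necessity. Clearly, refactoring the lambda creation out into a separate method works around the problem. My question is not about design best practices (which @peter-duniho has also covered). Rather, given the content of TestMethod as it stands, I would like to know whether there is any way to force the compiler to exclude the createStream lambda from the combined closure implementation.

For the record, I am targeting .NET 4.6 with VS 2015.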

Answer 1

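I am not aware of anything in the C# language specification that dictates exactly how a compiler must implement anonymous methods and variable capture. This is an implementation detail.

What the specification does do is set some rules for how anonymous methods and their captured variables must behave. I don't have a copy of the C# 6 specification, but here is the relevant text from the C# 5 specification, under "7.15.5.1 Captured outer variables":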

…the lifetime of a captured outer variable is extended at least until the delegate or expression tree created from the anonymous function becomes eligible for garbage collection. [emphasis mine]

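Nothing in the specification puts an upper limit on the lifetime of the variable. The compiler is only required to make sure the variable lives long enough to remain valid for as long as the anonymous method may need it.

So…

1. Is this a bug? If not, why is this behaviour considered desirable?

Not a bug. The compiler is complying with the specification.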

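As for whether it is "desirable", that is a loaded word. What counts as "desirable" depends on your priorities. That said, one priority of a compiler author is to keep the compiler's job simple (which makes it run faster and reduces the chance of bugs). From that point of view, this particular implementation might be considered "desirable".

On the other hand, language designers and compiler authors also share the goal of helping programmers write working code. To the extent that an implementation detail can interfere with that, such an implementation detail might be considered "undesirable". In the end, it comes down to how each of those priorities is ranked against its potentially competing goals.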

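2. How do I code against this without abandoning the use of lambdas all together? Notably how do I code against this defensively, so that future code changes don't suddenly cause some other unchanged lambda in the same method to start enclosing something that it shouldn't?

It is hard to say without a less contrived example. In general, I would say the obvious answer is "don't mix your lambdas like that". In your particular (admittedly contrived) example, you have one method that appears to be doing two completely different things. That is generally frowned upon for a variety of reasons, and it seems to me that this example simply adds one more to that list.

I don't know what the best way to fix the "two different things" would be, but one obvious alternative is to at least refactor the method so that the "two different things" method delegates the work to two other methods, each named descriptively (which has the added benefit of helping the code be self-documenting).

For example: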

CreateStream TestMethod( IEnumerable<string> data )
{
    string file = "dummy.txt";
    var hashSet = new HashSet<string>();

    var count = AddAndCountNewItems(data, hashSet);

    CreateStream createStream = GetCreateStreamCallback(file);

    return createStream;
}

int AddAndCountNewItems(IEnumerable<string> data, HashSet<string> hashSet)
{
    return data.Count( s => hashSet.Add( s ) );
}

CreateStream GetCreateStreamCallback(string file)
{
    return () => File.OpenRead( file );
}

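This way, the captured variables remain independent. Even if the compiler, for some bizarre reason, still put them both into the same closure type, that should not result in the same instance of that type being shared between the two closures.

Your TestMethod() still does two different things, but at least it no longer contains the two unrelated implementations itself. The code is more readable and better compartmentalized, which is a good thing quite apart from the fact that it also fixes the variable lifetime issue.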

Answer 2

Is this a bug?

No. The compiler is complying with the specification here.

Why is this behaviour considered desirable?

It is not desirable. It is deeply unfortunate, as you have discovered here, and as I described back in 2007:

http://blogs.msdn.com/b/ericlippert/archive/2007/06/06/fyi-c-and-vb-closures-are-per-scope.aspx
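Because closures are generated per scope, one defensive option (not proposed in the original answers) is to declare the short-lived captured variable in its own nested block. The sketch below assumes the compiler emits a separate closure object for the inner block and does not merge it with the outer one; that is an implementation detail, not something the specification guarantees, so it is worth verifying against the compiler you actually use. ScopedClosureDemo and MakeOpener are hypothetical names, and Func<Stream> stands in for CreateStream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class ScopedClosureDemo
{
    public static Func<Stream> MakeOpener(IEnumerable<string> data)
    {
        // Outer scope: only 'file' is captured here.
        string file = "dummy.txt";

        {
            // Inner scope: 'hashSet' is declared, captured and used only
            // inside this block, so (assuming per-scope closures) it lives
            // in a different closure object from 'file'.
            var hashSet = new HashSet<string>();
            var count = data.Count(s => hashSet.Add(s));
        }

        // This lambda is bound to the outer scope's closure object.
        return () => File.OpenRead(file);
    }
}
```

The trade-off is that the protection depends on nobody later moving the hashSet declaration back into the outer scope, so the separate-method refactoring is probably the more robust choice where it is practical.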

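The C# compiler team has considered fixing this in every version since C# 3.0, and it has never been a high enough priority. Consider entering an issue on the Roslyn GitHub site (if there isn't one already; there may well be).

I would personally like to see this fixed; as it stands, it is a big "gotcha".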

How do I code against this without abandoning the use of lambdas all together?

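It is the variable that is captured, not the object it refers to. You can set the hashSet variable to null once you are done with it. Then the only memory still being consumed is the memory for the variable itself, the size of one reference (four bytes in a 32-bit process, eight in a 64-bit one), and not the memory for the thing it referred to, which will be collected.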
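Applied to the method from the question, that suggestion might look like the following minimal sketch. NullOutDemo and MakeOpener are hypothetical names, Func<Stream> stands in for the CreateStream delegate, and the out parameter is only there so that the count stays observable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class NullOutDemo
{
    public static Func<Stream> MakeOpener(IEnumerable<string> data, out int count)
    {
        string file = "dummy.txt";
        var hashSet = new HashSet<string>();

        // Count() enumerates eagerly, so the first lambda has finished
        // running by the time this statement completes.
        count = data.Count(s => hashSet.Add(s));

        // Clear the captured variable. Even if the closure object is shared,
        // it now holds a null reference instead of the set.
        hashSet = null;

        return () => File.OpenRead(file);
    }
}
```

This only helps if no deferred query or stored delegate still needs hashSet after it has been cleared; with a lazily evaluated LINQ operator in place of Count, nulling the variable early would cause a NullReferenceException when the query eventually runs.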

c#

lambda

closures

.net-4.6