Lambda 回归自身:这合法吗?

Lambda returning itself: is this legal?

考虑一下这个相当无用的程序:

#include <iostream>
int main(int argc, char* argv[]) {

  int a = 5;

  auto it = [&](auto self) {
      return [&](auto b) {
        std::cout << (a + b) << std::endl;
        return self(self);
      };
  };
  it(it)(4)(6)(42)(77)(999);
}

基本上我们正在尝试制作一个 return 本身的 lambda。

哪个编译器是正确的?是否存在静态约束违规、UB 或两者都没有?

更新 这个细微的修改被 clang 接受了:

  auto it = [&](auto& self, auto b) {
          std::cout << (a + b) << std::endl;
          return [&](auto p) { return self(self,p); };
  };
  it(it,4)(6)(42)(77)(999);

更新 2:我了解如何编写 return 本身的仿函数,或者如何使用 Y 组合器来实现此目的。这更像是一个语言律师问题。

更新 3:问题是 不是 一般来说,lambda 到 return 本身是否合法,但是关于这种特定方式的合法性。

相关问题:.

编辑根据 C++ 规范,此构造是否严格有效似乎存在一些争议。普遍的意见似乎是它是无效的。请参阅其他答案以获得更彻底的讨论。如果 构造有效,则此答案的其余部分适用;下面调整后的代码适用于 MSVC++ 和 gcc,并且 OP 发布了进一步修改后的代码,也适用于 clang。

这是未定义的行为,因为内部 lambda 通过引用捕获参数 self,但 self 在第 7 行的 return 之后超出范围。因此,当返回的 lambda 稍后执行,它正在访问对超出范围的变量的引用。

#include <iostream>
int main(int argc, char* argv[]) {

  int a = 5;

  auto it = [&](auto self) {
      return [&](auto b) {
        std::cout << (a + b) << std::endl;
        return self(self); // <-- using reference to 'self'
      };
  };
  it(it)(4)(6)(42)(77)(999); // <-- 'self' is now out of scope
}

运行 带有 valgrind 的程序说明了这一点:

==5485== Memcheck, a memory error detector
==5485== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5485== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5485== Command: ./test
==5485== 
9
==5485== Use of uninitialised value of size 8
==5485==    at 0x108A20: _ZZZ4mainENKUlT_E_clIS0_EEDaS_ENKUlS_E_clIiEEDaS_ (test.cpp:8)
==5485==    by 0x108AD8: main (test.cpp:12)
==5485== 
==5485== Invalid read of size 4
==5485==    at 0x108A20: _ZZZ4mainENKUlT_E_clIS0_EEDaS_ENKUlS_E_clIiEEDaS_ (test.cpp:8)
==5485==    by 0x108AD8: main (test.cpp:12)
==5485==  Address 0x4fefffdc4 is not stack'd, malloc'd or (recently) free'd
==5485== 
==5485== 
==5485== Process terminating with default action of signal 11 (SIGSEGV)
==5485==  Access not within mapped region at address 0x4FEFFFDC4
==5485==    at 0x108A20: _ZZZ4mainENKUlT_E_clIS0_EEDaS_ENKUlS_E_clIiEEDaS_ (test.cpp:8)
==5485==    by 0x108AD8: main (test.cpp:12)
==5485==  If you believe this happened as a result of a stack
==5485==  overflow in your program's main thread (unlikely but
==5485==  possible), you can try to increase the size of the
==5485==  main thread stack using the --main-stacksize= flag.
==5485==  The main thread stack size used in this run was 8388608.

相反,您可以将外部 lambda 更改为通过引用而不是通过值获取 self,从而避免一堆不必要的副本并解决问题:

#include <iostream>
int main(int argc, char* argv[]) {

  int a = 5;

  auto it = [&](auto& self) { // <-- self is now a reference
      return [&](auto b) {
        std::cout << (a + b) << std::endl;
        return self(self);
      };
  };
  it(it)(4)(6)(42)(77)(999);
}

这个有效:

==5492== Memcheck, a memory error detector
==5492== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5492== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==5492== Command: ./test
==5492== 
9
11
47
82
1004

好像clang是对的。考虑一个简化的例子:

auto it = [](auto& self) {
    return [&self]() {
      return self(self);
    };
};
it(it);

让我们像编译器一样(稍微)过一遍:

  • it 的类型是带有模板调用运算符的 Lambda1
  • it(it); 触发调用运算符的实例化
  • 模板调用运算符的return类型是auto,所以必须推导
  • 我们正在 return 捕获类型为 Lambda1 的第一个参数的 lambda。
  • 那个 lambda 也有一个调用运算符,return它是调用的类型 self(self)
  • 注意:self(self) 正是我们开始的地方!

因此无法推断类型。

根据 [dcl.spec.auto]/9:

,程序格式错误(clang 是正确的)

If the name of an entity with an undeduced placeholder type appears in an expression, the program is ill-formed. Once a non-discarded return statement has been seen in a function, however, the return type deduced from that statement can be used in the rest of the function, including in other return statements.

基本上,内层 lambda 的 return 类型的推导取决于它自己(这里命名的实体是调用运算符)——所以你必须显式提供一个 return 类型。在这种特殊情况下,这是不可能的,因为您需要内部 lambda 的类型但无法命名。但是在其他情况下,尝试像这样强制执行递归 lambda 是可行的。

即使没有它,您也有


在与更聪明的人(即T.C)讨论后,让我详细说明一下。原始代码(略有减少)和提议的新版本(同样减少)之间有一个重要的区别:

auto f1 = [&](auto& self) {
  return [&](auto) { return self(self); } /* #1 */ ; /* #2 */
};
f1(f1)(0);

auto f2 = [&](auto& self, auto) {
  return [&](auto p) { return self(self,p); };
};
f2(f2, 0);

也就是内部表达式self(self)不依赖于f1,但是self(self, p)依赖于f2。当表达式是非依赖的时,它们可以被使用......急切地使用([temp.res]/8,例如 static_assert(false) 是一个硬错误,无论它所在的模板是否被实例化)。

对于 f1,编译器(比如 clang)可以尝试立即实例化它。一旦你到达上面 #2 点的那个 ;,你就知道外层 lambda 的推导类型(它是内层 lambda 的类型),但我们试图早于它使用它(想想它在点 #1) - 我们试图在我们仍在解析内部 lambda 之前使用它,然后我们才知道它的实际类型。这与 dcl.spec.auto/9.

冲突

然而,对于f2,我们不能尝试急切地实例化,因为它是依赖的。我们只能在使用时实例化,到那时我们就知道了一切。


为了真正做到这一点,您需要 y-combinator。论文中的实现:

template<class Fun>
class y_combinator_result {
    Fun fun_;
public:
    template<class T>
    explicit y_combinator_result(T &&fun): fun_(std::forward<T>(fun)) {}

    template<class ...Args>
    decltype(auto) operator()(Args &&...args) {
        return fun_(std::ref(*this), std::forward<Args>(args)...);
    }
};

template<class Fun>
decltype(auto) y_combinator(Fun &&fun) {
    return y_combinator_result<std::decay_t<Fun>>(std::forward<Fun>(fun));
}

而你想要的是:

auto it = y_combinator([&](auto self, auto b){
    std::cout << (a + b) << std::endl;
    return self;
});

TL;DR;

clang 是正确的。

看起来标准中导致此格式错误的部分是 [dcl.spec.auto]p9:

If the name of an entity with an undeduced placeholder type appears in an expression, the program is ill-formed. Once a non-discarded return statement has been seen in a function, however, the return type deduced from that statement can be used in the rest of the function, including in other return statements. [ Example:

auto n = n; // error, n’s initializer refers to n
auto f();
void g() { &f; } // error, f’s return type is unknown

auto sum(int i) {
  if (i == 1)
    return i; // sum’s return type is int
  else
    return sum(i-1)+i; // OK, sum’s return type has been deduced
}

—end example ]

原创作品

如果我们查看提案 A Proposal to Add Y Combinator to the Standard Library,它提供了一个可行的解决方案:

template<class Fun>
class y_combinator_result {
    Fun fun_;
public:
    template<class T>
    explicit y_combinator_result(T &&fun): fun_(std::forward<T>(fun)) {}

    template<class ...Args>
    decltype(auto) operator()(Args &&...args) {
        return fun_(std::ref(*this), std::forward<Args>(args)...);
    }
};

template<class Fun>
decltype(auto) y_combinator(Fun &&fun) {
    return y_combinator_result<std::decay_t<Fun>>(std::forward<Fun>(fun));
}

它明确表示你的例子是不可能的:

C++11/14 lambdas do not encourage recursion: there is no way to reference the lambda object from the body of the lambda function.

它引用了一个 dicussion in which Richard Smith alludes to the error that clang is giving you:

I think this would be better as a first-class language feature. I ran out of time for the pre-Kona meeting, but I was intending on writing a paper to allow giving a lambda a name (scoped to its own body):

auto x = []fib(int a) { return a > 1 ? fib(a - 1) + fib(a - 2) : a; };

Here, 'fib' is the equivalent of the lambda's *this (with some annoying special rules to allow this to work despite the lambda's closure type being incomplete).

Barry 向我指出了后续提案 Recursive lambdas,它解释了为什么这是不可能的,并解决了 dcl.spec.auto#9 限制,还展示了今天没有它的方法:

Lambdas are a useful tool for local code refactoring. However, we sometimes want to use the lambda from within itself, either to permit direct recursion or to allow the closure to be registered as a continuation. This is surprisingly difficult to accomplish well in current C++.

Example:

  void read(Socket sock, OutputBuffer buff) {
  sock.readsome([&] (Data data) {
  buff.append(data);
  sock.readsome(/*current lambda*/);
}).get();

}

One natural attempt to reference a lambda from itself is to store it in a variable and capture that variable by reference:

 auto on_read = [&] (Data data) {
  buff.append(data);
  sock.readsome(on_read);
};

However, this is not possible due to a semantic circularity: the type of the auto variable is not deduced until after the lambda-expression is processed, which means the lambda-expression cannot reference the variable.

Another natural approach is to use a std::function:

 std::function on_read = [&] (Data data) {
  buff.append(data);
  sock.readsome(on_read);
};

This approach compiles, but typically introduces an abstraction penalty: the std::function may incur a memory allocation and the invocation of the lambda will typically require an indirect call.

For a zero-overhead solution, there is often no better approach than defining a local class type explicitly.

根据编译器会或应该为 lambda 表达式生成的 类 重写代码非常容易。

完成后很明显,主要问题只是悬空引用,不接受代码的编译器在 lambda 部门受到了一些挑战。

重写表明没有循环依赖。

#include <iostream>

struct Outer
{
    int& a;

    // Actually a templated argument, but always called with `Outer`.
    template< class Arg >
    auto operator()( Arg& self ) const
        //-> Inner
    {
        return Inner( a, self );    //! Original code has dangling ref here.
    }

    struct Inner
    {
        int& a;
        Outer& self;

        // Actually a templated argument, but always called with `int`.
        template< class Arg >
        auto operator()( Arg b ) const
            //-> Inner
        {
            std::cout << (a + b) << std::endl;
            return self( self );
        }

        Inner( int& an_a, Outer& a_self ): a( an_a ), self( a_self ) {}
    };

    Outer( int& ref ): a( ref ) {}
};

int main() {

  int a = 5;

  auto&& it = Outer( a );
  it(it)(4)(6)(42)(77)(999);
}

反映原始代码中内部 lambda 捕获模板类型项目的方式的完全模板化版本:

#include <iostream>

struct Outer
{
    int& a;

    template< class > class Inner;

    // Actually a templated argument, but always called with `Outer`.
    template< class Arg >
    auto operator()( Arg& self ) const
        //-> Inner
    {
        return Inner<Arg>( a, self );    //! Original code has dangling ref here.
    }

    template< class Self >
    struct Inner
    {
        int& a;
        Self& self;

        // Actually a templated argument, but always called with `int`.
        template< class Arg >
        auto operator()( Arg b ) const
            //-> Inner
        {
            std::cout << (a + b) << std::endl;
            return self( self );
        }

        Inner( int& an_a, Self& a_self ): a( an_a ), self( a_self ) {}
    };

    Outer( int& ref ): a( ref ) {}
};

int main() {

  int a = 5;

  auto&& it = Outer( a );
  it(it)(4)(6)(42)(77)(999);
}

我猜正式规则旨在禁止这种内部机制中的模板化。如果他们确实禁止原始结构。

好吧,您的代码不起作用。但这确实:

template<class F>
struct ycombinator {
  F f;
  template<class...Args>
  auto operator()(Args&&...args){
    return f(f, std::forward<Args>(args)...);
  }
};
template<class F>
ycombinator(F) -> ycombinator<F>;

测试代码:

ycombinator bob = {[x=0](auto&& self)mutable{
  std::cout << ++x << "\n";
  ycombinator ret = {self};
  return ret;
}};

bob()()(); // prints 1 2 3

您的代码是 UB 且格式错误,无需诊断。这很有趣;但两者都可以独立修复。

首先,UB:

auto it = [&](auto self) { // outer
  return [&](auto b) { // inner
    std::cout << (a + b) << std::endl;
    return self(self);
  };
};
it(it)(4)(5)(6);

这是 UB,因为外部通过值获取 self,然后内部通过引用捕获 self,然后在 outer 完成后继续 return 运行.所以段错误肯定没问题。

修复:

[&](auto self) {
  return [self,&a](auto b) {
    std::cout << (a + b) << std::endl;
    return self(self);
  };
};

代码仍然是错误的。要看到这一点,我们可以扩展 lambdas:

struct __outer_lambda__ {
  template<class T>
  auto operator()(T self) const {
    struct __inner_lambda__ {
      template<class B>
      auto operator()(B b) const {
        std::cout << (a + b) << std::endl;
        return self(self);
      }
      int& a;
      T self;
    };
    return __inner_lambda__{a, self};
  }
  int& a;
};
__outer_lambda__ it{a};
it(it);

这个实例化 __outer_lambda__::operator()<__outer_lambda__>:

  template<>
  auto __outer_lambda__::operator()(__outer_lambda__ self) const {
    struct __inner_lambda__ {
      template<class B>
      auto operator()(B b) const {
        std::cout << (a + b) << std::endl;
        return self(self);
      }
      int& a;
      __outer_lambda__ self;
    };
    return __inner_lambda__{a, self};
  }
  int& a;
};

所以我们接下来要判断__outer_lambda__::operator()的return类型。

我们逐行检查它。首先我们创建 __inner_lambda__ 类型:

    struct __inner_lambda__ {
      template<class B>
      auto operator()(B b) const {
        std::cout << (a + b) << std::endl;
        return self(self);
      }
      int& a;
      __outer_lambda__ self;
    };

现在,看那里 -- 它的 return 类型是 self(self)__outer_lambda__(__outer_lambda__ const&)。但是我们正在尝试推断 __outer_lambda__::operator()(__outer_lambda__) 的 return 类型。

不允许您这样做。

虽然实际上 __outer_lambda__::operator()(__outer_lambda__) 的 return 类型实际上并不依赖于 __inner_lambda__::operator()(int) 的 return 类型,但 C++ 在推导 return 类型;它只是逐行检查代码。

self(self)是我们推导之前使用的。程序格式错误。

我们可以通过隐藏 self(self) 来修补它,直到稍后:

template<class A, class B>
struct second_type_helper { using result=B; };

template<class A, class B>
using second_type = typename second_type_helper<A,B>::result;

int main(int argc, char* argv[]) {

  int a = 5;

  auto it = [&](auto self) {
      return [self,&a](auto b) {
        std::cout << (a + b) << std::endl;
        return self(second_type<decltype(b), decltype(self)&>(self) );
      };
  };
  it(it)(4)(6)(42)(77)(999);
}

现在代码是正确的并且可以编译。但我认为这有点hack;只需使用 ycombinator。