Forwarding in multi-threaded code

Question

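I'm working on an abstraction for a family of optimization algorithms. These algorithms can run either serially or multi-threaded, using either locking mechanisms or atomic operations.

I have a question about perfect forwarding in the multi-threaded versions of the algorithms. Say I have a functor that I do not want to copy because copying it is expensive. I can make sure that the functors are static, in the sense that calling their operator()(...) does not change the state of the object. Here is one such dummy functor: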

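#include <algorithm>
#include <cstddef>    // std::size_t
#include <functional> // std::ref
#include <iostream>
#include <iterator>
#include <thread>
#include <utility>    // std::move, std::forward
#include <vector>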

template <class value_t> struct WeightedNorm {
  WeightedNorm() = default;
  WeightedNorm(std::vector<value_t> w) : w{std::move(w)} {}

  template <class Container> value_t operator()(Container &&c) const & {
    std::cout << "lvalue version with w: " << w[0] << ',' << w[1] << '\n';
    value_t result{0};
    std::size_t idx{0};
    auto begin = std::begin(c);
    auto end = std::end(c);
    while (begin != end) {
      result += w[idx++] * *begin * *begin;
      *begin++ /* += 1 */; // <-- we can also modify
    }
    return result; /* well, return std::sqrt(result), to be precise */
  }

  template <class Container> value_t operator()(Container &&c) const && {
    std::cout << "rvalue version with w: " << w[0] << ',' << w[1] << '\n';
    value_t result{0};
    std::size_t idx{0};
    auto begin = std::begin(c);
    auto end = std::end(c);
    while (begin != end) {
      result += w[idx++] * *begin * *begin;
      *begin++ /* += 1 */; // <-- we can also modify
    }
    return result; /* well, return std::sqrt(result), to be precise */
  }

private:
  std::vector<value_t> w;
};

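This functor might also have reference qualifiers on some of its member functions, as shown above (although in this example the two overloads do not differ from each other). In addition, the function object is allowed to modify its input c. To properly forward this functor to the worker threads in the algorithm, I came up with the following: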

template <class value_t> struct algorithm {
  algorithm() = default;
  algorithm(const unsigned int nthreads) : nthreads{nthreads} {}

  template <class InputIt> void initialize(InputIt begin, InputIt end) {
    x = std::vector<value_t>(begin, end);
  }

  template <class Func> void solve_ref_1(Func &&f) {
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(&algorithm::kernel<decltype((f)), decltype(x)>, this,
                           std::ref(f), x);
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_ref_2(Func &&f) {
    auto &xlocal = x;
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread([&, xlocal]() mutable { kernel(f, xlocal); });
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_forward_1(Func &&f) {
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(
          &algorithm::kernel<decltype(std::forward<Func>(f)), decltype(x)>,
          this, std::ref(f), x); /* this is compilation error */
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_forward_2(Func &&f) {
    auto &xlocal = x;
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(
          [&, xlocal]() mutable { kernel(std::forward<Func>(f), xlocal); });
    for (auto &worker : workers)
      worker.join();
  }

private:
  template <class Func, class Container> void kernel(Func &&f, Container &&c) {
    std::forward<Func>(f)(std::forward<Container>(c));
  }

  std::vector<value_t> x;
  unsigned int nthreads{std::thread::hardware_concurrency()};
};

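Basically, my thinking when writing the above was that algorithm::solve_ref_1 and algorithm::solve_ref_2 differ from each other only in the use of the lambda function. In the end, both call kernel with an lvalue reference to f and an lvalue reference to x, where x is copied into each thread, either because of how std::thread works or because xlocal is captured by copy in the lambda. Is this correct? Should I be careful when choosing one over the other?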
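
The copying described above can be checked with a small experiment. The following is only a sketch: CopyCounter and touch are hypothetical helpers that do not appear in the original code, and the counts are meant to illustrate how std::thread treats its arguments rather than to measure anything about the algorithm class itself.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical helper type (not from the original code) that counts copies.
struct CopyCounter {
  static std::atomic<int> copies;
  CopyCounter() = default;
  CopyCounter(const CopyCounter &) { ++copies; }
  CopyCounter(CopyCounter &&) noexcept {} // moves are deliberately not counted
};
std::atomic<int> CopyCounter::copies{0};

// Hypothetical worker that only needs read access to its argument.
void touch(const CopyCounter &) {}

// Passing the object directly: std::thread stores its own decayed copy.
int copies_when_passed_directly() {
  CopyCounter::copies = 0;
  CopyCounter c;
  std::thread t(touch, c);
  t.join();
  return CopyCounter::copies;
}

// Wrapping in std::ref: only a reference_wrapper is stored, so c is not copied.
int copies_when_passed_by_ref() {
  CopyCounter::copies = 0;
  CopyCounter c;
  std::thread t(touch, std::ref(c));
  t.join();
  return CopyCounter::copies;
}

// Capturing by copy in a lambda: the closure object holds the copy.
int copies_when_captured_by_value() {
  CopyCounter::copies = 0;
  CopyCounter c;
  std::thread t([c] { touch(c); });
  t.join();
  return CopyCounter::copies;
}
```

Under these assumptions, the direct and capture-by-copy variants each report at least one copy, while the std::ref variant reports none. That is consistent with x being copied per thread in both solve_ref_1 and solve_ref_2, and with f not being copied in either.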

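So far, I have not achieved what I set out to do. I avoid unnecessary copies of f, but I do not respect its reference qualifiers either. Then I thought of forwarding f to kernel. Above, I could not find a way to make algorithm::solve_forward_1 compile, because of the deleted overload of std::ref for rvalues. However, algorithm::solve_forward_2, which uses the lambda approach, seems to be working. By "seems to be working," I mean that the following main program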

int main(int argc, char *argv[]) {
  std::vector<double> x{1, 2};
  algorithm<double> alg(2);
  alg.initialize(std::begin(x), std::end(x));

  alg.solve_ref_1(WeightedNorm<double>{{1, 2}});
  alg.solve_ref_2(WeightedNorm<double>{{1, 2}});
  // alg.solve_forward_1(WeightedNorm<double>{{1, 2}});
  alg.solve_forward_2(WeightedNorm<double>{{1, 2}});

  return 0;
}

compiles and prints the following:

./main.out
lvalue version with w: 1,2
lvalue version with w: 1,2
lvalue version with w: 1,2
lvalue version with w: 1,2
rvalue version with w: 1,2
rvalue version with w: 1,2
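
On the std::ref route used in solve_ref_1 and attempted in solve_forward_1: the sketch below checks at compile time that std::ref accepts lvalues only, and that the wrapper it returns hands out an lvalue reference. The names test_std_ref, can_std_ref and Functor are hypothetical and introduced only for this illustration.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical detection helper (not part of the original code): yields
// std::true_type when std::ref(std::declval<T>()) is a well-formed call.
template <class T>
auto test_std_ref(int)
    -> decltype(std::ref(std::declval<T>()), std::true_type{});
template <class T> std::false_type test_std_ref(...);
template <class T> using can_std_ref = decltype(test_std_ref<T>(0));

struct Functor {}; // stand-in for an expensive functor

// std::ref accepts lvalues ...
static_assert(can_std_ref<Functor &>::value, "lvalues can be wrapped");
// ... but the overload taking rvalues is deleted.
static_assert(!can_std_ref<Functor &&>::value, "rvalues are rejected");

// Unwrapping a reference_wrapper always yields an lvalue reference.
static_assert(
    std::is_same<
        decltype(std::declval<std::reference_wrapper<Functor>>().get()),
        Functor &>::value,
    "get() returns Functor&");
```

Since the wrapper only ever yields an lvalue, a functor passed through std::ref would be expected to reach the &-qualified overload, which matches the "lvalue version" lines printed for solve_ref_1.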

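In short, I have two main questions:

1. Is there any reason why I should prefer the lambda function version over the other (or vice versa)?
2. Is perfect forwarding the functor f more than once allowed/OK in my case?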

I am asking about 2. above because, in the answer to another question, the author says:

You cannot forward something more than once, though, because that makes no sense. Forwarding means that you're potentially moving the argument all the way through to the final caller, and once it's moved it's gone, so you cannot then use it again.

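I assume that, in my case, I am not moving anything, but rather trying to respect the reference qualifiers. In the output of my main program, I can see that w has the correct values in the rvalue version, i.e., 1,2, but that does not mean I am not invoking undefined behavior, such as trying to access the values of an already moved-from vector.

I would greatly appreciate it if you could help me understand this better. I am also open to any other feedback on the way I am trying to solve my problem.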
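
The assumption that nothing is moved can be illustrated with a reduced example. This is a sketch under stated assumptions: Probe, call_forwarded_twice and weights_left_after_two_rvalue_calls are hypothetical names, and Probe mirrors WeightedNorm only in that both overloads are const-qualified and merely read w. std::forward is a cast, so selecting the &&-qualified overload does not by itself move anything; state is lost only if the selected overload actually moves from a member.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical functor: both overloads are const and only read `w`.
struct Probe {
  std::vector<int> w{1, 2};
  std::string operator()() const & { return "lvalue"; }
  std::string operator()() const && { return "rvalue"; }
};

// Forwards the same functor twice and reports which overload ran each time.
template <class Func> std::string call_forwarded_twice(Func &&f) {
  std::string first = std::forward<Func>(f)();
  std::string second = std::forward<Func>(f)();
  return first + "," + second;
}

// Invokes the rvalue overload twice, then checks that `w` is still intact.
std::size_t weights_left_after_two_rvalue_calls() {
  Probe p;
  call_forwarded_twice(std::move(p));
  return p.w.size();
}
```

Here call_forwarded_twice(Probe{}) yields "rvalue,rvalue", the same call with a named Probe yields "lvalue,lvalue", and weights_left_after_two_rvalue_calls() returns 2. The result depends entirely on the overload bodies: an &&-qualified overload that moved from a mutable or non-const member would leave the second call working on a moved-from object.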

Answer 1

1. There is no reason to prefer either one.
2. Forwarding inside a for loop is not OK. You cannot forward the same variable twice:

template <typename T> void func(T &&param) {
  func1(std::forward<T>(param));
  func2(std::forward<T>(param)); // not OK: param may already have been moved from
}

Chained forwarding (std::forward(std::forward(…))), on the other hand, is fine.
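
The difference between the two patterns can be made concrete with a type whose moved-from state is well defined. The sketch below uses std::unique_ptr for that reason; consume, inner, chained and forwarded_twice are hypothetical names chosen for this illustration.

```cpp
#include <memory>
#include <utility>

// Hypothetical sink that takes ownership; returns -1 for an empty pointer.
int consume(std::unique_ptr<int> p) { return p ? *p : -1; }

// Chained forwarding: every layer forwards its parameter exactly once.
template <class T> int inner(T &&p) { return consume(std::forward<T>(p)); }
template <class T> int chained(T &&p) { return inner(std::forward<T>(p)); }

// Forwarding the same parameter twice: the first call takes ownership,
// so the second call receives an empty (moved-from) pointer.
template <class T> std::pair<int, int> forwarded_twice(T &&p) {
  int first = consume(std::forward<T>(p));
  int second = consume(std::forward<T>(p));
  return {first, second};
}
```

With an rvalue argument such as std::unique_ptr<int>(new int(7)), chained returns 7, while forwarded_twice returns the pair (7, -1). For std::unique_ptr the moved-from state is guaranteed to be empty; for other types it is typically valid but unspecified, so how harmful a second forward is depends on what the first callee did with the rvalue.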

c++

multithreading

perfect-forwarding

c++11