How to return large data efficiently in C++11

Question

I am really confused about returning large data in C++11. What is the most efficient way? Here are the relevant functions:

void numericMethod1(vector<double>& solution,
                    const double input);

void numericMethod2(pair<vector<double>,vector<double>>& solution1,
                    vector<double>& solution2,
                    const double input1,
                    const double input2);

This is how I use them:

int main()
{
    // apply numericMethod1
    double input = 0;
    vector<double> solution;
    numericMethod1(solution, input);

    // apply numericMethod2
    double input1 = 1;
    double input2 = 2;
    pair<vector<double>,vector<double>> solution1;
    vector<double> solution2;
    numericMethod2(solution1, solution2, input1, input2);

    return 0;
}

The question is: is std::move() useless in the following implementations?

Implementation:

void numericMethod1(vector<double>& solution,
                    const double input)
{
    vector<double> tmp_solution;

    for (...)
    {
    // some operation about tmp_solution
    // after that, this vector becomes very large
    }

    solution = std::move(tmp_solution);
}

void numericMethod2(pair<vector<double>,vector<double>>& solution1,
                    vector<double>& solution2,
                    const double input1,
                    const double input2)
{
    vector<double> tmp_solution1_1;
    vector<double> tmp_solution1_2;
    vector<double> tmp_solution2;

    for (...)
    {
    // some operation about tmp_solution1_1, tmp_solution1_2 and tmp_solution2
    // after that, the three vectors become very large
    }

    solution1.first = std::move(tmp_solution1_1);
    solution1.second = std::move(tmp_solution1_2);
    solution2 = std::move(tmp_solution2);
}

If they are useless, how can I handle these large return values without copying them multiple times? Feel free to change the API!

Update

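Thanks to Stack Overflow and these answers, I have a better understanding of the problem after digging into related questions. Because of RVO, I changed the API, and for clarity I no longer use std::pair. Here is my new code: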

struct SolutionType
{
    vector<double> X;
    vector<double> Y;
};

SolutionType newNumericMethod(const double input1,
                              const double input2);

int main()
{
    // apply newNumericMethod
    double input1 = 1;
    double input2 = 2;
    SolutionType solution = newNumericMethod(input1, input2);

    return 0;
}

SolutionType newNumericMethod(const double input1,
                              const double input2)
{
    SolutionType tmp_solution; // this will call the default constructor, right?
    // since the names are too long, I make aliases.
    vector<double> &x = tmp_solution.X;
    vector<double> &y = tmp_solution.Y;

    for (...)
    {
    // some operation about x and y
    // after that these two vectors become very large
    }

    return tmp_solution;
}

How do I know whether RVO happened? Or how can I make sure that RVO happens?
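One practical way to find out is to instrument a type whose copy and move constructors count how often they run, and return it by value from a function shaped like newNumericMethod. This is a minimal C++11 sketch; `Tracer`, `copies`, `moves` and `makeTracer` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracer
{
    static int copies;
    static int moves;

    std::vector<double> data;

    Tracer() {}
    Tracer(const Tracer& other) : data(other.data) { ++copies; }
    Tracer(Tracer&& other) noexcept : data(std::move(other.data)) { ++moves; }
};

int Tracer::copies = 0;
int Tracer::moves = 0;

// Returns a named local by value, like newNumericMethod in the update.
Tracer makeTracer(std::size_t n)
{
    Tracer t;
    t.data.assign(n, 1.0);
    return t; // elided (NRVO) if the compiler chooses; otherwise implicitly moved
}
```

Built normally, GCC and Clang typically elide the move here, so `moves` stays 0; compiling with `-fno-elide-constructors` makes the move count rise instead. Either way `copies` stays 0, because a returned local is treated as an rvalue. So for a movable type, what C++11 guarantees is "moved or elided, never copied", not "always elided".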

Answer 1

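You can use the std::vector::swap member function, which exchanges the contents with those of another container. It does not invoke any move, copy, or swap operations on individual elements.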

solution1.first.swap(tmp_solution1_1);
solution1.second.swap(tmp_solution1_2);
solution2.swap(tmp_solution2);
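A small self-contained check of that claim: after `swap`, the output vector owns the buffer the temporary allocated, so the element addresses are unchanged and no element is copied. `fillLarge` and `swapInto` are hypothetical helpers standing in for the real computation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the numeric computation.
std::vector<double> fillLarge(std::size_t n)
{
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = 0.5 * static_cast<double>(i);
    return v;
}

// Hands the temporary's buffer to the output parameter in O(1).
// Returns the first element's address before the swap so the caller
// can check that the buffer was transferred rather than copied.
const double* swapInto(std::vector<double>& out, std::size_t n)
{
    std::vector<double> tmp = fillLarge(n);
    const double* before = tmp.data();
    out.swap(tmp);  // exchanges internal pointers only
    return before;  // tmp now holds out's old contents and is destroyed here
}
```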

Edit:

These statements are not useless:

solution1.first = std::move(tmp_solution1_1);
solution1.second = std::move(tmp_solution1_2);
solution2 = std::move(tmp_solution2);

They call std::vector's move assignment operator, operator=(vector&&), which really does move the vector on the right-hand side.
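The same kind of check for the move-assignment version. With the default `std::allocator`, mainstream standard libraries implement `operator=(vector&&)` by taking over the right-hand vector's buffer; the standard specifies complexity and results rather than pointer values, so treat the address comparison as an illustration of common implementations, not a guarantee. `moveInto` is a hypothetical helper mirroring numericMethod1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper mirroring numericMethod1: builds a local vector,
// then move-assigns it into the output parameter.
// Returns the local's buffer address so the caller can compare.
const double* moveInto(std::vector<double>& out, std::size_t n)
{
    std::vector<double> tmp(n, 2.0);
    const double* before = tmp.data();
    out = std::move(tmp);  // vector move assignment: no per-element copies
    return before;
}
```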

Answer 2

Return by value and rely on RVO (return value optimization).

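// A plain `auto` return type (return type deduction) needs C++14;
// in C++11 the return types must be written out.
vector<huge_thing> make_big_vector()
{
    vector<huge_thing> v1;
    // fill v1

    // explicit move is not necessary here
    return v1;
}

tuple<vector<double>, vector<huge_thing>> make_big_stuff_tuple()
{
    vector<double> v0;
    // fill v0

    vector<huge_thing> v1;
    // fill v1

    // explicit move is necessary for make_tuple's arguments:
    // v0 and v1 are lvalues, and make_tuple uses perfect forwarding,
    // so without std::move they would be copied into the tuple.
    // http://en.cppreference.com/w/cpp/utility/tuple/make_tuple

    return std::make_tuple(std::move(v0), std::move(v1));
}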

auto r0 = make_big_vector();
auto r1 = make_big_stuff_tuple();

I would change your functions' API to simply return by value.
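A compilable C++11 sketch of the tuple approach, with `std::vector<int>` standing in for the placeholder `huge_thing` and with an explicit return type. `makeBigStuffTuple` and its contents are illustrative only; the test shows `std::tie` as one way to unpack the result into existing variables, which move-assigns each element.

```cpp
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Illustrative stand-in for make_big_stuff_tuple(); the data is made up.
std::tuple<std::vector<double>, std::vector<int>> makeBigStuffTuple()
{
    std::vector<double> v0(1000, 1.5);  // fill v0
    std::vector<int> v1(500, 7);        // fill v1

    // v0 and v1 are lvalues here, so without std::move make_tuple
    // would copy them into the tuple.
    return std::make_tuple(std::move(v0), std::move(v1));
}
```

A named struct, as in the question's update, gives the results readable member names; the tuple avoids declaring a type.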

Answer 3

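When you have large data, such as a very large vector<double>, you can still return it by value: C++11's move semantics kick in for std::vector, so returning it from your function costs only a few pointer assignments (because the contents of a vector<double> are heap-allocated, moving the vector just transfers ownership of that buffer).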

So I would do this:

// No worries in returning large vectors by value
std::vector<double> numericMethod1(const double input)
{
    std::vector<double> result;

    // Compute your vector<double>'s content
    ...

    // NOTE: Don't call std::move() here.
    // A simple return statement is just fine.
    return result;
}

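(Note that, depending on the particular C++ compiler, other optimizations already available in C++98/03, such as RVO/NRVO, may also apply and eliminate even the move.)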
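NRVO is permitted but never required, and compilers usually cannot apply it when different branches return different local objects. Even then, C++11 treats the returned local as an rvalue, so the vector is moved rather than deep-copied. This sketch checks that via the buffer address; `chooseResult` and its `bufferOut` parameter are hypothetical and exist only to expose the address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function with two candidate results; which one is returned
// depends on a runtime flag, which typically prevents NRVO.
std::vector<double> chooseResult(bool useFirst, std::size_t n,
                                 const double** bufferOut)
{
    std::vector<double> first(n, 1.0);
    std::vector<double> second(n, 2.0);

    if (useFirst)
    {
        *bufferOut = first.data();
        return first;   // implicitly moved, not copied
    }
    *bufferOut = second.data();
    return second;      // implicitly moved, not copied
}
```

Because the vector move constructor has constant complexity, the caller ends up with the very buffer that was filled inside the function.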

Instead, if you have a method that returns multiple output values, I would use non-const references, just as in C++98/03:

void numericMethod2(pair<vector<double>,vector<double>>& output1,
                    vector<double>& output2,
                    vector<double>& output3,
                    ...
                    const double input1,
                    const double input2);

In the implementation you can still use the proven C++98/03 technique of "swap-timization": simply call std::swap() to swap the local variables with the output parameters:

#include <utility> // for std::swap

void numericMethod2(pair<vector<double>,vector<double>>& solution1,
                    vector<double>& solution2,
                    const double input1,
                    const double input2)

{
    vector<double> tmp_solution1_1;
    vector<double> tmp_solution1_2;
    vector<double> tmp_solution2;

    // Some processing to compute local solution vectors
    ...

    // Return output values to caller via swap-timization
    swap(solution1.first, tmp_solution1_1);
    swap(solution1.second, tmp_solution1_2);
    swap(solution2, tmp_solution2);
}

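Swapping two vectors usually just swaps their internal pointers to the heap-allocated memory each vector owns, so you get only pointer assignments, not deep copies, memory reallocations, or similar expensive operations.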
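A compilable version of the swap-timization function above, written with explicit `std::` qualification. The loop body is a made-up placeholder for the real computation. `std::swap` on two `std::vector`s is overloaded to call the member `swap`, so each call runs in constant time.

```cpp
#include <cassert>
#include <utility>
#include <vector>

void numericMethod2(std::pair<std::vector<double>, std::vector<double>>& solution1,
                    std::vector<double>& solution2,
                    const double input1,
                    const double input2)
{
    std::vector<double> tmp_solution1_1;
    std::vector<double> tmp_solution1_2;
    std::vector<double> tmp_solution2;

    // Placeholder computation (hypothetical): fill the three local vectors.
    for (int i = 0; i < 1000; ++i)
    {
        tmp_solution1_1.push_back(input1 * i);
        tmp_solution1_2.push_back(input2 * i);
        tmp_solution2.push_back(input1 + input2);
    }

    // Hand the results to the caller; each swap exchanges internal pointers only.
    std::swap(solution1.first, tmp_solution1_1);
    std::swap(solution1.second, tmp_solution1_2);
    std::swap(solution2, tmp_solution2);
} // the temporaries, now holding the caller's old contents, are destroyed here
```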

Answer 4

First, why don't you use solution1 directly in numericMethod2? That is more straightforward.

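Unlike std::array or a built-in array obj[], whose elements live inside the object itself (on the stack, for a local variable), a vector stores its elements on the heap (you can look at the standard library code; it uses operator new() heavily). Therefore, if you find that the vector is only temporary and will be returned elsewhere, use std::swap or std::move. In fact, in a return statement a local variable is treated as an rvalue (an xvalue), so it is moved rather than copied.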

The same is true of the other standard containers (std::map, std::set, std::deque, std::list, etc.).
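To make the stack-versus-heap point concrete: moving a `std::array<double, N>` still has to move every element, which for `double` is a plain copy, because the elements live inside the array object. Moving a `std::vector<double>` only transfers its heap buffer and leaves the source empty. `moveArray` and `moveVector` are hypothetical helpers, and the sizes and values are arbitrary.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Moves a std::array: each element is moved individually; for double
// that is a copy, so the source keeps its values.
std::array<double, 4> moveArray(std::array<double, 4>& src)
{
    std::array<double, 4> dst(std::move(src));
    return dst;
}

// Moves a std::vector: the destination takes over the heap buffer
// and the source is left empty.
std::vector<double> moveVector(std::vector<double>& src)
{
    std::vector<double> dst(std::move(src));
    return dst;
}
```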
