在容量内调整矢量大小时的性能影响

Question

我的代码有以下合成示例：

#include <vector>
#include <array>
#include <cstdlib>

#define CAPACITY 10000

int main() {
    std::vector<std::vector<int>> a;
    std::vector<std::array<int, 2>> b;

    a.resize(CAPACITY, std::vector<int> {0, 0})
    b.resize(CAPACITY, std::array<int, 2> {0, 0})

    for (;;) {
        size_t new_rand_size = (std::rand() % CAPACITY);

        a.resize(new_rand_size);
        b.resize(new_rand_size);

        for (size_t i = 0; i < new_rand_size; ++i) {
            a[i][0] = std::rand();
            a[i][1] = std::rand();
            b[i][0] = std::rand();
            b[i][1] = std::rand();
        }

        process(a); // respectively process(b)
    }
}

很明显，数组版本更好，因为它需要更少的分配，因为数组的大小是固定的并且在内存中是连续的（对吗？）。它只是在容量内再次向上调整时重新初始化。

既然我无论如何都要覆盖，我想知道是否有一种方法可以跳过初始化（例如，通过覆盖分配器或类似的方法）来进一步优化代码。

Answer 1

Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization

是：不调整大小。相反，保留容量并推送（或放置）新元素。

Answer 2

so obviously,

单词"obviously"通常用来表示"I really, really want the following to be true, so I'm going to skip the part where I determine if it is true." ;)（无可否认，你做得更好比大多数人都多，因为你确实为你的结论提出了一些理由。）

the array version is better, because it requires less allocation, as the array is fixed in size and continuous in memory (correct?).

这是否属实取决于实施，但这里有一定的道理。我会采用较少的微观管理方法，并说数组版本是 preferable 因为最终大小是固定的。使用针对您的特殊情况（固定大小数组）设计的工具往往比使用针对更一般情况的工具产生更少的开销。不过，并不总是更少。

另一个要考虑的因素是默认初始化元素的成本。当 std::array 被构造时，它的所有元素也被构造。使用 std::vector，您可以推迟构造元素，直到获得构造参数。对于默认构造代价高昂的对象，您可以使用向量而不是数组来衡量性能提升。（如果您无法衡量差异，请不要担心。）

当您进行比较时，请确保通过正确使用向量来获得公平的机会。由于大小是预先知道的，reserve the required space right away. Also, use emplace_back 以避免不必要的复制。

最后说明："contiguous" 比 "continuous" 多 accurate/descriptive。

It just gets reinitialized when up-resizing again within capacity.

这是影响两种方法的一个因素。事实上，这会导致您的代码表现出未定义的行为。例如，假设您的第一次迭代将外部向量的大小调整为 1，而第二次迭代将其调整为 5。将您的代码与以下内容进行比较：

std::vector<std::vector<int>> a;
a.resize(CAPACITY, std::vector<int> {0, 0});
a.resize(1);
a.resize(5);
std::cout << "Size " << a[1].size() <<".\n";

输出表明此时大小为零，但您的代码将为 a[1][0] 赋值。如果您希望 a 的每个元素默认为 2 个元素的向量，则需要在每次调整 a 大小时指定该默认值，而不仅仅是最初。

Since I'm going to overwrite anyway, I was wondering if there's a way to skip initialization (e.g. by overwriting the allocator or similar) to optimize the code even further.

是的，您可以跳过初始化。事实上，这样做是明智的。使用专为手头任务设计的工具。您的初始化用于增加向量的容量。所以使用其唯一目的是增加向量容量的方法：vector::reserve.

另一种选择——取决于具体情况——可能是根本不调整大小。从一个数组数组开始，跟踪外部数组中的最后一个可用元素。这是一种倒退，因为您现在有一个单独的变量来跟踪大小，但是如果您的实际代码有足够的迭代，那么在大小减小时不调用析构函数的节省可能使这种方法值得。（为了更简洁的代码，编写一个 class 来包装数组的数组并跟踪可用大小。）

在容量内调整矢量大小时的性能影响

Performance impact when resizing vector within capacity

c++

arrays

optimization

vector

allocator