尝试诊断 C++ 标准 for 循环与累积副作用之间的差异

Question

我正在用 C++ 实现最大利润算法。

Find the Maximum Profit for a given stock trading day. Solution should be O(N) in Time and O(1) in Space requirements. You're allowed to purchase once in the day, and sell once in the day.

我最初是这样实现的：

int maxprofit_int(vector<int> const& v) {
    int resProfit = 0;
    int minPrice  = v[0];
    for (auto i = 1; i < v.size(); i++) {
        auto price  = v[i];
        minPrice    = min(minPrice, price);
        auto profit = price - minPrice;
        resProfit   = max(resProfit, price);
    }
    return resProfit;
}

没有异常。

我心想 - “STL 通常提供相同或更好的性能，那会是什么样子？”还有“我也可以制作这个通用的吗？”

template <class itr>
auto maxprofit_generic(itr begin, itr end) -> typename std::iterator_traits<itr>::value_type {
    using v_t = typename iterator_traits<itr>::value_type;
    return std::accumulate(next(begin, 1), end, v_t{0},
                      [minPrice = *begin](auto const& maximumProfit, auto const& price) mutable {
                          minPrice          = std::min(minPrice, price);
                          auto const profit = price - minPrice;
                          return std::max(maximumProfit, profit);
                      });
}

accumulate 作为一个fold 操作，我使用容器的第一个值作为minPrice，它可以在每次传递时更新。起始值始终与 return 值的类型相同。从我能收集到的所有信息来看，这两种算法实际上应该是相同的。

然而，将这两个带到快板凳上，我发现情况并非如此。在 GCC 10.1 和 Clang 10.0 中，使用 -O3，我发现整数版本和通用版本之间的运行时间加倍。我很难相信 Clang 和 GCC 有错误，所以我想知道为什么我的通用性能很糟糕。

有哪些提高通用版速度的建议？

我的 Quick-Bench 代码：

#include <vector>
#include <algorithm>
#include <numeric>

// Functions definitions removed for brevity

std::vector<int> uut = {1,2,4,1,3,5,2,5,7,9,3,4,5,5,5,7,9,9,4,2};

static void Maxprofit_Normal(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    auto res = maxprofit(uut);
    benchmark::DoNotOptimize(res);
  }
}
// Register the function as a benchmark
BENCHMARK(Maxprofit_Normal);

static void Maxprofit_Generic(benchmark::State& state) {
  // Code inside this loop is measured repeatedly
  for (auto _ : state) {
    auto res = maxprofit(uut.cbegin(),uut.cend());
    benchmark::DoNotOptimize(res);
  }
}
BENCHMARK(Maxprofit_Generic);

Answer 1

您的代码已进行基准测试，但未经过测试...

resProfit   = max(resProfit, price);

应该阅读

resProfit   = max(resProfit, profit);

此后两个版本运行以相同的速度更改。

尝试诊断 C++ 标准 for 循环与累积副作用之间的差异

Trying to diagnose difference between C++ standard for-loop and accumulate with side effects

c++

algorithm

gcc

stl

clang