std::move and RVO optimizations
I recently read about how std::move can speed up code by moving values instead of copying them, so I wrote a test program to compare the speed using std::vector.
Code:
#include <iostream>
#include <string>
#include <vector>
#include <stdint.h>
#ifdef _WIN32
#include <Windows.h>
#else
#include <sys/time.h>
#include <ctime>
#endif
#undef max
// Returns the number of milliseconds elapsed since the UNIX epoch. Works on both
// Windows and Linux.
uint64_t GetTimeMs64()
{
#ifdef _WIN32
    // Windows
    FILETIME ft;
    LARGE_INTEGER li;
    // Get the number of 100-nanosecond intervals elapsed since January 1, 1601 (UTC)
    // and copy it to a LARGE_INTEGER structure.
    GetSystemTimeAsFileTime(&ft);
    li.LowPart = ft.dwLowDateTime;
    li.HighPart = ft.dwHighDateTime;
    uint64_t ret = li.QuadPart;
    ret -= 116444736000000000LL; // Convert from file time to UNIX epoch time.
    ret /= 10000; // From 100-nanosecond (10^-7) to 1-millisecond (10^-3) intervals
    return ret;
#else
    // Linux
    struct timeval tv;
    gettimeofday(&tv, NULL);
    uint64_t ret = tv.tv_usec;
    // Convert from microseconds (10^-6) to milliseconds (10^-3)
    ret /= 1000;
    // Add the seconds (10^0) after converting them to milliseconds (10^-3);
    // widen first so the multiplication cannot overflow a 32-bit time_t
    ret += static_cast<uint64_t>(tv.tv_sec) * 1000;
    return ret;
#endif
}
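// Variant 1: every return statement wraps its operand in std::move.
static std::vector<std::string> GetVec1()
{
    std::vector<std::string> o(100000, "abcd");
    bool tr = true; // set to false to exercise the second return path
    if (tr)
        return std::move(o);
    return std::move(std::vector<std::string>(100000, "abcd"));
}
// Variant 2: plain return by value, no std::move.
static std::vector<std::string> GetVec2()
{
    std::vector<std::string> o(100000, "abcd");
    bool tr = true; // set to false to exercise the second return path
    if (tr)
        return o;
    return std::vector<std::string>(100000, "abcd");
}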
int main()
{
    uint64_t timer;
    std::vector<std::string> vec;
    // Time 1000 calls of the std::move variant.
    timer = GetTimeMs64();
    for (int i = 0; i < 1000; ++i)
        vec = GetVec1();
    std::cout << GetTimeMs64() - timer << " timer 1(std::move)" << std::endl;
    // Time 1000 calls of the plain-return variant.
    timer = GetTimeMs64();
    for (int i = 0; i < 1000; ++i)
        vec = GetVec2();
    std::cout << GetTimeMs64() - timer << " timer 2(no move)" << std::endl;
    std::cin.get();
    return 0;
}
I got the following results:
Release (x86) /O2. tr = true
4376 timer 1(std::move)
4191 timer 2(no move)
Release (x86) /O2. tr = false
7311 timer 1(std::move)
7301 timer 2(no move)
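The results of the two timers are very close and don't differ much. I assumed this is because of return value optimization (RVO), which would mean that my returns by value were already being moved by the compiler without my knowing it, right?
I then ran new tests without any optimizations to make sure I was right.
Results:
Release (x86) /Od. tr = true
40860 timer 1(std::move)
40863 timer 2(no move)
Release (x86) /Od. tr = false
83567 timer 1(std::move)
82075 timer 2(no move)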
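Now, even though the difference between /O2 and /Od is really significant, the difference between no move and std::move (and even between tr being true or false) is minimal.
Does this mean that the compiler is still allowed to apply RVO even with optimizations disabled, or is std::move not as fast as I thought?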
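The compiler performs RVO even if you specify /Od. The C++ standard permits this (as Kerrek SB pointed out, §12.8/31,32).
If you really want to see the difference, you can declare the variable volatile. This prohibits the compiler from performing RVO on it (§12.8/31, first bullet).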
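You are missing one fundamental piece of information: when a return statement (and a few other, less common contexts) names a function-local variable (such as o in your case), overload resolution to construct the return value from the argument is first performed as if the argument were an rvalue (even though it is not). Only if that fails is overload resolution done again with the argument as an lvalue. This is covered by C++14 12.8/32; similar wording exists in C++11.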
12.8/32 When the criteria for elision of a copy/move operation are met, but not for an exception-declaration, and the object to be copied is designated by an lvalue, or when the expression in a return statement is a (possibly parenthesized) id-expression that names an object with automatic storage duration declared in the body or parameter-declaration-clause of the innermost enclosing function or lambda-expression, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. If the first overload resolution fails or was not performed, or if the type of the first parameter of the selected constructor is not an rvalue reference to the object’s type (possibly cv-qualified), overload resolution is performed again, considering the object as an lvalue. [ Note: This two-stage overload resolution must be performed regardless of whether copy elision will occur. It determines the constructor to be called if elision is not performed, and the selected constructor must be accessible even if the call is elided. —end note ] ...
(Emphasis mine)
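So in effect, there is an unavoidable implicit std::move in every return statement that returns a function-scope automatic variable.
Using std::move in a return statement is, if anything, a pessimization. It prevents NRVO, and it gains you nothing because of the "implicitly try rvalue first" rule.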