Are C++17 Parallel Algorithms implemented already?
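I tried to play around with the new parallel library features proposed in the C++17 standard, but I couldn't get them to work. I tried compiling with up-to-date versions of g++ 8.1.1 and clang++-6.0 with -std=c++17, but neither seemed to support #include <execution>, std::execution::par or anything similar.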
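The cppreference page for parallel algorithms has a long list of algorithms and claims:
Technical specification provides parallelized versions of the following 69 algorithms from algorithm, numeric and memory: ( ... long list ...)
That sounds like the algorithms are ready 'on paper' but not yet ready to use?
In an earlier question from over a year ago, the answers claim that these features had not been implemented yet. By now I would have expected to see some kind of implementation. Is there anything we can use already?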
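GCC does not yet implement the Parallelism TS (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017).
However, libstdc++ (shipped with GCC) has an experimental mode that provides some equivalent parallel algorithms. See https://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html
To get it to work: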
Any use of parallel functionality requires additional compiler and
runtime support, in particular support for OpenMP. Adding this support
is not difficult: just compile your application with the compiler flag
-fopenmp. This will link in libgomp, the GNU Offloading and Multi Processing Runtime Library, whose presence is mandatory.
Example code
#include <vector>
#include <parallel/algorithm>

int main()
{
    std::vector<int> v(100);
    // ...
    // Explicitly force a call to parallel sort.
    __gnu_parallel::sort(v.begin(), v.end());
    return 0;
}
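Intel has released a Parallel STL library that follows the C++17 standard.
You can refer to https://en.cppreference.com/w/cpp/compiler_support to check the implementation status of all C++ features. For your case, just search for "Standardization of Parallelism TS", and you will find that only the MSVC and Intel C++ compilers support this feature now.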
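GCC 9 has them, but you have to install TBB separately
In Ubuntu 19.10, all components have finally aligned:
- GCC 9 is the default compiler, and it is the minimum GCC version with the TBB-backed parallel algorithms
- TBB (Intel Threading Building Blocks) is at 2019~U8-1, so it meets the minimum 2018 requirement
So you can simply do: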
sudo apt install gcc libtbb-dev
g++ -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -o main.out main.cpp -ltbb
./main.out
and use it as:
#include <execution>
#include <algorithm>
std::sort(std::execution::par_unseq, input.begin(), input.end());
See also the full runnable benchmark below.
GCC 9 and TBB 2018 are the first versions to work, as mentioned in the release notes: https://gcc.gnu.org/gcc-9/changes.html
Parallel algorithms and <execution> (requires Thread Building Blocks 2018 or newer).
Related threads:
- How to install TBB from source on Linux and make it work
- trouble linking INTEL tbb library
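Ubuntu 18.04 installation
Ubuntu 18.04 is a bit more involved:
- GCC 9 can be obtained from a trustworthy PPA, so that part is not too bad
- TBB is at version 2017, which does not work, and I couldn't find a trustworthy PPA for it. Compiling from source is easy, but the lack of an install target is annoying...
Here are fully automated, tested commands for Ubuntu 18.04: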
# Install GCC 9
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-9 g++-9
# Compile libtbb from source.
sudo apt-get build-dep libtbb-dev
git clone https://github.com/intel/tbb
cd tbb
git checkout 2019_U9
make -j `nproc`
TBB="$(pwd)"
TBB_RELEASE="${TBB}/build/linux_intel64_gcc_cc7.4.0_libc2.27_kernel4.15.0_release"
# Use them to compile our test program.
g++-9 -ggdb3 -O3 -std=c++17 -Wall -Wextra -pedantic -I "${TBB}/include" -L "${TBB_RELEASE}" -Wl,-rpath,"${TBB_RELEASE}" -o main.out main.cpp -ltbb
./main.out
Test program analysis
I have tested with this program, which compares parallel and serial sorting speed.
main.cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <execution>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main(int argc, char **argv) {
    using clk = std::chrono::high_resolution_clock;
    decltype(clk::now()) start, end;
    std::vector<unsigned long long> input_parallel, input_serial;
    unsigned int seed;
    unsigned long long n;

    // CLI arguments: element count, then an optional PRNG seed.
    if (argc > 1) {
        n = std::strtoull(argv[1], NULL, 0);
    } else {
        n = 10;
    }
    if (argc > 2) {
        seed = static_cast<unsigned int>(std::stoul(argv[2]));
    } else {
        seed = std::random_device()();
    }

    // Fill both vectors with the same random input.
    std::uniform_int_distribution<std::uint64_t> zero_ull_max(0);
    std::mt19937 prng(seed);
    for (unsigned long long i = 0; i < n; ++i) {
        input_parallel.push_back(zero_ull_max(prng));
    }
    input_serial = input_parallel;

    // Sort and time parallel.
    start = clk::now();
    std::sort(std::execution::par_unseq, input_parallel.begin(), input_parallel.end());
    end = clk::now();
    std::cout << "parallel " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;

    // Sort and time serial.
    start = clk::now();
    std::sort(std::execution::seq, input_serial.begin(), input_serial.end());
    end = clk::now();
    std::cout << "serial " << std::chrono::duration<float>(end - start).count() << " s" << std::endl;

    // Both policies must produce the same sorted sequence.
    assert(input_parallel == input_serial);
}
On Ubuntu 19.10, on a Lenovo ThinkPad P51 laptop (CPU: Intel Core i7-7820HQ, 4 cores / 8 threads, 2.90 GHz base, 8 MB cache; RAM: 2x Samsung M471A2K43BB1-CRC, 2x 16GiB, 2400 Mbps), a typical output for an input of 100 million numbers to be sorted:
./main.out 100000000
was:
parallel 2.00886 s
serial 9.37583 s
So the parallel version was about 4.7 times faster (9.37583 s / 2.00886 s)! See also: What do the terms "CPU bound" and "I/O bound" mean?
We can confirm that the process is spawning threads with strace:
strace -f -s999 -v ./main.out 100000000 |& grep -E 'clone'
which shows several lines of the form:
[pid 25774] clone(strace: Process 25788 attached
[pid 25774] <... clone resumed> child_stack=0x7fd8c57f4fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd8c57f59d0, tls=0x7fd8c57f5700, child_tidptr=0x7fd8c57f59d0) = 25788
Also, if I comment out the serial version and run:
time ./main.out 100000000
I get:
real 0m5.135s
user 0m17.824s
sys 0m0.902s
This confirms again that the algorithm was parallelized, since real < user, and gives an idea of how effectively it is parallelized on my system (about 3.5x on 8 hardware threads).
Error messages
Google, please index this.
If you don't have TBB installed, the error is:
In file included from /usr/include/c++/9/pstl/parallel_backend.h:14,
from /usr/include/c++/9/pstl/algorithm_impl.h:25,
from /usr/include/c++/9/pstl/glue_execution_defs.h:52,
from /usr/include/c++/9/execution:32,
from parallel_sort.cpp:4:
/usr/include/c++/9/pstl/parallel_backend_tbb.h:19:10: fatal error: tbb/blocked_range.h: No such file or directory
19 | #include <tbb/blocked_range.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
So we see that <execution> depends on a TBB component that is not installed.
If TBB is too old, e.g. the default Ubuntu 18.04 one, it fails with:
#error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.
GCC now supports the <execution> header, but the standard clang build from https://apt.llvm.org does not.