Thread synchronization problem with C++ std::atomic variables
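The following program shows unexpected behavior: once in a while it prints "bad". The two threads are supposed to synchronize using the two std::atomic variables 's_lock1' and 's_lock2'. In func2, before it sets the 's_var' variable to 1, it must have atomically stored a non-zero value in 's_lock2', and the other thread (func1) must not have updated the 's_lock1' variable yet. However, func1 somehow prints the unexpected "bad" output. The s_lock2.load() != 0 check seems to evaluate to false when it should not. Is there something wrong with this code snippet? Is it related to memory ordering?
I am running this on an 8-core Linux server with CentOS 7 installed. Any help is greatly appreciated.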
#include <iostream>
#include <thread>
#include <atomic>
#include <unistd.h>

std::atomic_uint s_lock1 = 0;
std::atomic_uint s_lock2 = 0;
std::atomic_uint s_var = 0;

static void func1()
{
    while (true) {
        s_lock1.store(1, std::memory_order_release);
        if (s_lock2.load(std::memory_order_acquire) != 0) {
            s_lock1.store(0, std::memory_order_release);
            continue;
        }
        if (s_var.load(std::memory_order_acquire) > 0) {
            printf("bad\n");
        }
        usleep(1000);
        s_lock1.store(0, std::memory_order_release);
    }
}

static void func2()
{
    while (true) {
        s_lock2.store(1, std::memory_order_release);
        if (s_lock1.load(std::memory_order_acquire) != 0) {
            s_lock2.store(0, std::memory_order_release);
            continue;
        }
        s_var.store(1, std::memory_order_release);
        usleep(5000);
        s_var.store(0, std::memory_order_release);
        s_lock2.store(0, std::memory_order_release);
    }
}

int main()
{
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join();
    t2.join();
}
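This locking algorithm can break because of the store buffers in Intel CPUs: a store does not go straight into the level 1 cache but sits in the store buffer for a while, and during that time it is invisible to other CPUs: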
To allow performance optimization of instruction execution, the IA-32 architecture allows departures from strong-ordering model called processor ordering in Pentium 4, Intel Xeon, and P6 family processors. These processor-ordering variations (called here the memory-ordering model) allow performance enhancing operations such as allowing reads to go ahead of buffered writes. The goal of any of these variations is to increase instruction execution speeds, while maintaining memory coherency, even in multiple-processor systems.
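The effect described in the quote, a read going ahead of a buffered write, is usually called the store-buffering pattern, and it is exactly the shape of the store-then-load sequence in func1 and func2. Below is a minimal sketch of that pattern in isolation. The names run_once and count_both_zero are hypothetical and not part of the original program. With release stores and acquire loads the C++ memory model permits both threads to read 0; with std::memory_order_seq_cst on all four operations it does not. How often the both-zero outcome actually shows up, if at all, depends on the hardware and on timing.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the original program): runs the store-buffering
// pattern once. Each thread stores 1 to its own flag, then loads the other flag.
// Returns true if BOTH loads observed 0, i.e. neither thread saw the other's store.
bool run_once(std::memory_order store_order, std::memory_order load_order) {
    std::atomic<unsigned> x{0}, y{0};
    std::atomic<bool> go{false};
    unsigned r1 = 1, r2 = 1;

    // Both threads spin on `go` so that they start the pattern close together.
    auto wait_for_go = [&go] {
        while (!go.load(std::memory_order_acquire)) std::this_thread::yield();
    };
    std::thread a([&] { wait_for_go(); x.store(1, store_order); r1 = y.load(load_order); });
    std::thread b([&] { wait_for_go(); y.store(1, store_order); r2 = x.load(load_order); });
    go.store(true, std::memory_order_release);
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the pattern and counts how many runs ended with both loads reading 0.
int count_both_zero(std::memory_order store_order, std::memory_order load_order,
                    int iterations) {
    assert(iterations >= 0);
    int count = 0;
    for (int i = 0; i < iterations; ++i) {
        if (run_once(store_order, load_order)) ++count;
    }
    return count;
}
```

Calling count_both_zero(std::memory_order_release, std::memory_order_acquire, n) may return a non-zero count, while the same call with std::memory_order_seq_cst for both arguments always returns 0.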
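std::memory_order_seq_cst must be used for the stores to flush the store buffer for this locking to work (it is the default memory order for loads and stores, so you can simply write s_lock1 = 1;, for example). With std::memory_order_seq_cst, the compiler implements the store as an xchg instruction or inserts an mfence instruction after it, both of which make the store visible to other CPUs before any later load in the same thread executes: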
Atomic operations tagged memory_order_seq_cst not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged. Sequential ordering may be necessary for multiple producer-multiple consumer situations where all consumers must observe the actions of all producers occurring in the same order. Total sequential ordering requires a full memory fence CPU instruction on all multi-core systems. This may become a performance bottleneck since it forces the affected memory accesses to propagate to every core.
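The full fence between the store and the following load can also be requested explicitly with std::atomic_thread_fence(std::memory_order_seq_cst), which on x86 typically compiles to mfence or an equivalent locked instruction. The sketch below is an alternative formulation, not the approach used in this answer; try_enter and leave are hypothetical helper names introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not from the original program): raise our own flag,
// then check the other thread's flag. Returns true if we may proceed.
bool try_enter(std::atomic<unsigned>& mine, std::atomic<unsigned>& other) {
    mine.store(1, std::memory_order_relaxed);
    // Full barrier: the store above cannot be reordered after the load below.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (other.load(std::memory_order_acquire) != 0) {
        mine.store(0, std::memory_order_release);  // back off, the other side is in
        return false;
    }
    return true;
}

// Hypothetical helper: lower our flag; the release store publishes every write
// made while the flag was held.
void leave(std::atomic<unsigned>& mine) {
    assert(mine.load(std::memory_order_relaxed) == 1);  // caller must hold the flag
    mine.store(0, std::memory_order_release);
}
```

In func1 this would read as if (!try_enter(s_lock1, s_lock2)) continue; followed by the protected work and leave(s_lock1), with the two flags swapped in func2.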
Working example:
#include <atomic>
#include <cstdio>
#include <thread>
#include <unistd.h>

std::atomic<unsigned> s_lock1{0};
std::atomic<unsigned> s_lock2{0};
std::atomic<unsigned> s_var{0};

void func1() {
    while(true) {
        // seq_cst store followed by seq_cst load: the load cannot pass the store.
        s_lock1.store(1, std::memory_order_seq_cst);
        if(s_lock2.load(std::memory_order_seq_cst) != 0) {
            s_lock1.store(0, std::memory_order_seq_cst);
            continue;
        }
        // s_var is protected by the two locks, so relaxed access is enough.
        if(s_var.load(std::memory_order_relaxed) > 0) {
            printf("bad\n");
        }
        usleep(1000);
        s_lock1.store(0, std::memory_order_seq_cst);
    }
}

void func2() {
    while(true) {
        s_lock2.store(1, std::memory_order_seq_cst);
        if(s_lock1.load(std::memory_order_seq_cst) != 0) {
            s_lock2.store(0, std::memory_order_seq_cst);
            continue;
        }
        s_var.store(1, std::memory_order_relaxed);
        usleep(5000);
        s_var.store(0, std::memory_order_relaxed);
        s_lock2.store(0, std::memory_order_seq_cst);
    }
}

int main() {
    std::thread t1(func1);
    std::thread t2(func2);
    t1.join();
    t2.join();
}
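Because the example above loops forever, it can only be checked by watching for "bad" on the terminal. A bounded variant that counts violations is easier to run repeatedly. The sketch below makes several assumptions: the name run_bounded, the Result struct, and the attempt count are hypothetical, the usleep calls are dropped to keep the run short, and the default memory order (std::memory_order_seq_cst) is used for the lock variables.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type: how often each thread got in, and how often
// thread 1 saw s_var set while it believed it was alone.
struct Result { int entered1; int entered2; int bad; };

// Hypothetical bounded harness: same protocol as func1/func2 above, but each
// thread makes a fixed number of attempts instead of looping forever.
Result run_bounded(int attempts) {
    assert(attempts >= 0);
    std::atomic<unsigned> lock1{0}, lock2{0}, var{0};
    Result r{0, 0, 0};

    std::thread t1([&] {
        for (int i = 0; i < attempts; ++i) {
            lock1.store(1);                           // seq_cst by default
            if (lock2.load() != 0) { lock1.store(0); continue; }
            ++r.entered1;                             // only t1 writes this field
            if (var.load(std::memory_order_relaxed) > 0) ++r.bad;
            lock1.store(0);
        }
    });
    std::thread t2([&] {
        for (int i = 0; i < attempts; ++i) {
            lock2.store(1);
            if (lock1.load() != 0) { lock2.store(0); continue; }
            ++r.entered2;                             // only t2 writes this field
            var.store(1, std::memory_order_relaxed);
            var.store(0, std::memory_order_relaxed);
            lock2.store(0);
        }
    });
    t1.join();
    t2.join();
    return r;
}
```

With sequentially consistent lock operations the bad counter returned by run_bounded stays at 0; the two entered counters show how many attempts actually got past the check, which varies from run to run.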