(Operating System) How can I use __asm mfence in C

Question

I am taking an operating systems class, and my professor gave us this assignment:

"Place __asm mfence in a proper position."

This problem is about using multiple threads and their side effects.

The main thread is incrementing shared_var, but thread_1 is doing the same thing at the same time.

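As a result, although the code performs 200000000 increments in total (10000 × 10000 in each of the two threads), shared_var ends up as 199048359.000.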

The professor said that __asm mfence will solve this problem. However, I don't know where to put it.

I have tried searching Google, GitHub and this site for the problem, but could not find anything.

I don't know whether this is a stupid question, since I'm not a computer science major.

Also, I would like to know why this code prints 199948358.0000 instead of 200000000.00.
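One way to see why the total comes up short: shared_var++ is not a single indivisible step but a load, an add and a store. The sketch below replays, in a single thread, one bad interleaving of two such increments next to the correct sequential order. The function names are hypothetical, and the "registers" are ordinary local variables standing in for CPU registers.

```c
/* Replays one interleaving of two "shared_var++" operations in which both
 * threads load the old value before either stores the new one. */
double simulate_interleaved_increments(void)
{
    double shared_var = 0.0;

    double reg_main = shared_var;   /* main thread loads 0.0 */
    double reg_t1   = shared_var;   /* thread_1 loads 0.0 before main stores */

    reg_main += 1.0;                /* each adds 1 in its own register */
    reg_t1   += 1.0;

    shared_var = reg_main;          /* main stores 1.0 */
    shared_var = reg_t1;            /* thread_1 overwrites it with 1.0 */

    return shared_var;              /* 1.0, not 2.0: one increment is lost */
}

/* The same two increments when one finishes before the other starts. */
double simulate_sequential_increments(void)
{
    double shared_var = 0.0;
    double reg;

    reg = shared_var; reg += 1.0; shared_var = reg;  /* main: 0 -> 1 */
    reg = shared_var; reg += 1.0; shared_var = reg;  /* thread_1: 1 -> 2 */

    return shared_var;              /* 2.0 */
}
```

Each overlap of this kind loses at least one increment, so the printed total ends up below 200000000. EnterRegion() and LeaveRegion() exist to rule this interleaving out.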

Any help would be greatly appreciated.

#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <conio.h>

// Shared state for Peterson's algorithm (used by EnterRegion/LeaveRegion)
int turn;
int interested[2];
void EnterRegion(int process);
void LeaveRegion(int process);

DWORD WINAPI thread_func_1(LPVOID lpParam);
volatile double shared_var = 0.0;          // incremented by both threads
volatile int    job_complete[2] = {0, 0};  // set to 1 when a thread finishes


int main(void)
{
    DWORD dwThreadId_1, dwThrdParam_1 = 1; 
    HANDLE hThread_1; 
    int     i, j;

    // Create Thread 1
    hThread_1 = CreateThread(
        NULL,            // default security attributes
        0,               // use default stack size
        thread_func_1,   // thread function
        &dwThrdParam_1,  // argument to thread function
        0,               // use default creation flags
        &dwThreadId_1);  // returns the thread identifier

    // Check the return value for success.
    if (hThread_1 == NULL)
    {
        printf("Thread 1 creation error\n");
        exit(EXIT_FAILURE);
    }
    else
    {
        // The handle is not needed to wait for the thread, so close it.
        CloseHandle(hThread_1);
    }

    /* I am main thread */
    /* Now Main Thread and Thread 1 runs concurrently */

    for (i = 0; i < 10000; i++) 
    {
        for (j = 0; j < 10000; j++) 
        {
            EnterRegion(0);
            shared_var++;
            LeaveRegion(0);
        }
    }

    printf("Main Thread completed\n");
    job_complete[0] = 1;
    while (job_complete[1] == 0) ;

    printf("%f\n", shared_var);
    _getch();
    ExitProcess(0);
}


DWORD WINAPI thread_func_1(LPVOID lpParam)
{
    int     i, j;

    for (i = 0; i < 10000; i++) {
        for (j = 0; j < 10000; j++) 
        {
            EnterRegion(1);
            shared_var++;
            LeaveRegion(1);
        }
    }

    printf("Thread_1 completed\n");
    job_complete[1] = 1;
    ExitThread(0);
}


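void EnterRegion(int process)
{
    int other = 1 - process;     // index of the other thread

    _asm mfence;
    interested[process] = TRUE;  // announce that this thread wants to enter
    turn = process;              // the thread that writes turn last waits
    while (turn == process && interested[other] == TRUE) {}  // busy-wait
    _asm mfence;
}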

void LeaveRegion(int process)
{
    _asm mfence;
    interested[process] = FALSE;
    _asm mfence;
}

Answer 1

The EnterRegion() and LeaveRegion() functions implement a critical section using something called "Peterson's algorithm".

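Now, the key to Peterson's algorithm is that when a thread reads turn, it must get the latest (most recent) value written by any thread. That is, operations on turn must be sequentially consistent. In addition, the write to interested[] in EnterRegion() must become visible to all threads before (or at the same time as) the write to turn.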

So the place to put the mfence is after turn = process;, so that the thread does not proceed until its write to turn is visible to all other threads.
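A minimal sketch of the same algorithm, assuming a POSIX system with GCC or Clang rather than Win32 and MSVC: it uses C11 <stdatomic.h> with the default sequentially consistent ordering in place of volatile plus _asm mfence, and POSIX threads in place of CreateThread. The names enter_region, leave_region, worker and run_two_threads are illustrative. On x86, compilers typically implement each sequentially consistent store either as a mov followed by mfence or as an xchg (which is also a full barrier), so a barrier ends up right after turn = process, which is the placement described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Peterson state; every access below is memory_order_seq_cst. */
static atomic_bool interested[2];
static atomic_int  turn;

static long counter;                 /* protected by the Peterson lock */
static long iterations_per_thread;

static void enter_region(int process)
{
    int other = 1 - process;
    atomic_store(&interested[process], true);
    /* seq_cst store: on x86 typically mov + mfence, or xchg. */
    atomic_store(&turn, process);
    while (atomic_load(&turn) == process && atomic_load(&interested[other]))
        sched_yield();               /* let the other thread run on few cores */
}

static void leave_region(int process)
{
    atomic_store(&interested[process], false);
}

static void *worker(void *arg)
{
    int process = *(const int *)arg;
    for (long i = 0; i < iterations_per_thread; i++) {
        enter_region(process);
        counter++;                   /* critical section */
        leave_region(process);
    }
    return NULL;
}

/* Runs two threads that each increment counter n times; returns the total. */
long run_two_threads(long n)
{
    static const int ids[2] = {0, 1};
    pthread_t t[2];

    counter = 0;
    iterations_per_thread = n;
    atomic_store(&interested[0], false);
    atomic_store(&interested[1], false);

    for (int k = 0; k < 2; k++)
        pthread_create(&t[k], NULL, worker, (void *)&ids[k]);
    for (int k = 0; k < 2; k++)
        pthread_join(t[k], NULL);
    return counter;
}
```

With every access sequentially consistent, the C11 memory model guarantees mutual exclusion here, so run_two_threads(n) returns exactly 2 * n.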

It is also important to make the compiler read from memory every time it reads turn and interested[], so you should declare them volatile.

If you are writing this for x86 or x86_64, that is sufficient, because they are generally "well behaved", so that:

all writes to turn and interested[process] will occur in program order;
all reads of turn and interested[other] will also occur in program order;

and declaring those variables volatile ensures that the compiler does not fiddle with the order either.
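On x86 the hardware already keeps writes in program order and reads in program order, so the remaining job is to stop the compiler from reordering. A portable way to request exactly that ordering is a C11 release store paired with an acquire load: on x86 both compile to ordinary mov instructions, and on weaker architectures the compiler adds whatever barriers are needed. The sketch below, assuming POSIX threads and C11 atomics, publishes a plain int through a flag; the names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static int payload;          /* plain data, published through ready */
static atomic_int ready;

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;                                            /* 1st write */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* 2nd write */
    return NULL;
}

/* Starts a writer thread, waits until ready is set, returns the payload seen. */
int read_published_value(void)
{
    pthread_t t;
    int seen;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    pthread_create(&t, NULL, writer, NULL);

    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        sched_yield();
    seen = payload;   /* acquire pairs with release, so this reads 42 */

    pthread_join(t, NULL);
    return seen;
}
```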

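The reason for using mfence on x86 and x86_64 in this case is to drain the write queue to memory before going on to read the value of turn. All memory writes go into a queue; at some time in the future each write reaches actual memory and its effect becomes visible to other threads: the write has "completed". Writes "complete" in the same order in which the program executed them, but with a delay. If a thread reads something it has recently written, the processor picks up the (most recent) value from the write queue. This means the thread does not have to wait until the write "completes", which is generally a good thing. However, it also means that the thread is not reading the same value that any other thread would read, at least until the write does "complete". What mfence does is stall the processor until all outstanding writes have "completed", so any subsequent read sees the same value that any other thread would read.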
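The write-queue behaviour described above is usually demonstrated with the store-buffering litmus test: each thread writes one variable and then reads the other. If both reads could be satisfied while both writes were still queued, both threads would read 0. The sketch below, assuming POSIX threads and C11 atomics (all names hypothetical), uses sequentially consistent operations, for which the C11 memory model forbids the both-zero outcome; on x86 the compiler implements this by putting a full barrier (mfence or an xchg) between each store and the following load.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;
static int r1, r2;

static void *thread_a(void *arg)
{
    (void)arg;
    atomic_store(&x, 1);     /* seq_cst store: on x86 a barrier follows it */
    r1 = atomic_load(&y);    /* cannot be reordered before the store to x */
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    atomic_store(&y, 1);
    r2 = atomic_load(&x);
    return NULL;
}

/* Runs the test `trials` times and counts runs where both threads read 0. */
int count_both_zero(int trials)
{
    int both_zero = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    return both_zero;
}
```

Changing the stores and loads to memory_order_relaxed removes that barrier and permits the both-zero result on x86, although starting fresh threads for every trial makes it rare to observe in practice.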

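The write to interested[] in LeaveRegion() does not need an mfence (on x86/x86_64), which is good, because mfence is an expensive operation. Each thread only ever writes its own interested[] flag and only ever reads the other thread's. The only constraint on this write is that it must not "complete" after the write in EnterRegion() (!). Happily, x86/x86_64 performs all writes in order. [Of course, after the write in LeaveRegion(), the write in EnterRegion() may "complete" before the other thread reads the flag.]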
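In C11 terms, the observation that LeaveRegion() needs no mfence corresponds to making the unlocking write a release store, which on x86 is a plain mov. The sketch below, assuming POSIX threads and C11 atomics (names hypothetical), keeps every access in the entry path sequentially consistent and weakens only the final write of interested[]. The release ordering is still needed: it ensures that the other thread, once it sees the flag cleared, also sees the writes made inside the critical section.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool interested[2];
static atomic_int  turn;
static long counter;                 /* protected by the lock */
static long iterations;

static void enter_region(int process)
{
    int other = 1 - process;
    /* Entry stays fully sequentially consistent (the mfence case on x86). */
    atomic_store(&interested[process], true);
    atomic_store(&turn, process);
    while (atomic_load(&turn) == process && atomic_load(&interested[other]))
        sched_yield();
}

static void leave_region(int process)
{
    /* Release store: a plain mov on x86, no mfence. It still keeps the
     * critical section's writes from moving after this store. */
    atomic_store_explicit(&interested[process], false, memory_order_release);
}

static void *worker(void *arg)
{
    int process = *(const int *)arg;
    for (long i = 0; i < iterations; i++) {
        enter_region(process);
        counter++;                   /* critical section */
        leave_region(process);
    }
    return NULL;
}

/* Runs two threads that each increment counter n times; returns the total. */
long run_two_threads_release(long n)
{
    static const int ids[2] = {0, 1};
    pthread_t t[2];

    counter = 0;
    iterations = n;
    for (int k = 0; k < 2; k++)
        pthread_create(&t[k], NULL, worker, (void *)&ids[k]);
    for (int k = 0; k < 2; k++)
        pthread_join(t[k], NULL);
    return counter;
}
```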

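For other architectures you may need other fences to enforce the ordering of the reads and writes of turn and interested[]. But I will not pretend to know enough to advise on ARM, POWERPC or anything else.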


Tags: c, memory, winapi, operating-system, barrier