How does MPI_Barrier() work?

I have this code:

#include <cstdint>
#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        MPI_Barrier(MPI_COMM_WORLD);
    cout << "Some output\n";
    if (rank == 1)
        MPI_Barrier(MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
    cout << "end\n";
    MPI_Finalize();
    return 0;
}

When I run it as

mpiexec -n 2 MPI.exe

the program works; the output is:

Some output
end
Some output
end

However, when I run it as

mpiexec -n 3 MPI.exe

the program does not work correctly. I expected output like this:

rank 3 - Some output
rank 2 - Some output
rank 3 - End
rank 0 - Some output

At this point, I expected the program to stop.

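You need to make sure that every process makes the same number of MPI_Barrier calls. In your particular case, with n=3, ranks 0 and 1 each make two barrier calls, but rank 2 makes only one. The program blocks until rank 2 also reaches a barrier, which never happens.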
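As a rough illustration of that rule, the sketch below counts how many barrier calls each rank makes in the question's program and checks whether the counts agree. It is plain C++ with no MPI involved; `barrier_calls` and `barrier_counts_match` are hypothetical helper names introduced here, and the counts are hard-coded from the `if (rank == 0)` and `if (rank == 1)` branches in the question.

```cpp
#include <cassert>

// Number of MPI_Barrier calls that `rank` makes in the question's program.
// (Hypothetical helper: the counts are read off the source by hand.)
int barrier_calls(int rank) {
    int calls = 1;           // the unconditional MPI_Barrier every rank reaches
    if (rank == 0) ++calls;  // extra barrier guarded by `rank == 0`
    if (rank == 1) ++calls;  // extra barrier guarded by `rank == 1`
    return calls;
}

// True when all ranks in a communicator of size n make the same number of
// barrier calls, which is what the program needs in order to terminate.
bool barrier_counts_match(int n) {
    assert(n > 0);
    for (int rank = 1; rank < n; ++rank) {
        if (barrier_calls(rank) != barrier_calls(0)) return false;
    }
    return true;
}
```

Calling `barrier_counts_match(2)` gives true and `barrier_counts_match(3)` gives false, which matches the observed behavior. In the real program, the usual fix is to call MPI_Barrier unconditionally so that every rank takes part in every barrier.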

This is what should happen for n=3:

together:
    rank 0 will reach barrier 1 then block
    rank 1 will print "some output", reach barrier 2 then block
    rank 2 will print "some output", reach barrier 3 then block
together:
    rank 0 will print "some output", reach barrier 3 then block
    rank 1 will reach barrier 3 then block
    rank 2 will print "end" then hit finalize

Having one process in MPI_Finalize while the others are still blocked in a barrier is undefined behavior.
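The same walk-through can be reproduced mechanically. The following is a minimal sketch of a round-based simulator, assuming a single communicator in which a barrier releases only once all n ranks are inside it. `Step`, `Result`, `program_for`, and `simulate` are hypothetical names for this illustration, and the order of prints within a round is just rank order, not what a real run guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One step of a rank's program: either print a line or enter a barrier.
struct Step {
    bool is_barrier;
    std::string text;  // only meaningful when is_barrier is false
};

struct Result {
    bool hung;                     // true if some ranks wait in a barrier forever
    std::vector<std::string> log;  // "<rank>: <text>" in simulated order
};

// The question's program, written out as the steps a given rank executes.
std::vector<Step> program_for(int rank) {
    const Step barrier{true, ""};
    std::vector<Step> steps;
    if (rank == 0) steps.push_back(barrier);
    steps.push_back({false, "some output"});
    if (rank == 1) steps.push_back(barrier);
    steps.push_back(barrier);
    steps.push_back({false, "end"});
    return steps;
}

Result simulate(int n) {
    assert(n > 0);
    std::vector<std::vector<Step>> prog;
    for (int r = 0; r < n; ++r) prog.push_back(program_for(r));
    std::vector<std::size_t> pc(n);  // per-rank program counter, starts at 0
    Result result{false, {}};
    while (true) {
        int at_barrier = 0;
        for (int r = 0; r < n; ++r) {
            // Run this rank until it blocks in a barrier or finishes.
            while (pc[r] < prog[r].size() && !prog[r][pc[r]].is_barrier) {
                result.log.push_back(std::to_string(r) + ": " + prog[r][pc[r]].text);
                ++pc[r];
            }
            if (pc[r] < prog[r].size()) ++at_barrier;
        }
        if (at_barrier == 0) return result;  // every rank ran to completion
        if (at_barrier < n) {                // a rank finished; the rest wait forever
            result.hung = true;
            return result;
        }
        for (int r = 0; r < n; ++r) ++pc[r];  // all n ranks arrived: release them
    }
}
```

With this model, `simulate(2)` runs to completion with four printed lines, while `simulate(3)` reports a hang right after rank 2 prints "end", in line with the trace above.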


Doing the same analysis for n=2:

together:
    rank 0 will reach barrier 1 then block
    rank 1 will print "some output", reach barrier 2 then block
together:
    rank 0 will print "some output", reach barrier 3 then block
    rank 1 will reach barrier 3 then block
together:
    rank 0 will print "end" then hit finalize
    rank 1 will print "end" then hit finalize

This suggests that the output should be:

some output
some output
end 
end

However, what you actually get is:

some output
end 
some output
end

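This has to do with how the MPI infrastructure buffers and forwards stdout from the individual ranks. If we introduce a delay, so that MPI decides it is time to collect the output, we can see the behavior more clearly: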

#include <cstdint>
#include <unistd.h>
#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        cout << rank << " Barrier 1\n" << flush;
        MPI_Barrier(MPI_COMM_WORLD);
    }
    cout << rank << " Some output \n" << flush;
    usleep(1000000);
    if (rank == 1) {
        cout << rank << " Barrier 2\n" << flush;
        MPI_Barrier(MPI_COMM_WORLD);
    }
    cout << rank << " Barrier 3\n" << flush;
    MPI_Barrier(MPI_COMM_WORLD);
    cout << rank << " end\n" << flush;
    usleep(1000000);
    MPI_Finalize();
    return 0;
}

This produces:

$ mpiexec -n 2 ./a.out 
0 Barrier 1
1 Some output 
0 Some output 
1 Barrier 2
1 Barrier 3
0 Barrier 3
0 end
1 end

$ mpiexec -n 3 ./a.out 
2 Some output 
0 Barrier 1
1 Some output 
0 Some output 
1 Barrier 2
1 Barrier 3
2 Barrier 3
2 end
0 Barrier 3
^Cmpiexec: killing job...

Alternatively, look at the timestamps in the following C++11 code:

#include <cstdint>
#include <chrono>
#include <mpi.h>
#include <iostream>
using namespace std;

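// Current time as a tick count since the clock's epoch. The tick unit is
// implementation-defined (nanoseconds on common implementations).
inline unsigned long int time(void) {
    return std::chrono::high_resolution_clock::now().time_since_epoch().count();
}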

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Barrier(MPI_COMM_WORLD);
    }
    cout << rank << " " << time() << " Some output\n";
    if (rank == 1) {
        MPI_Barrier(MPI_COMM_WORLD);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    cout << rank << " " << time() << " end\n";
    MPI_Finalize();
    return 0;
}

Output:

$ mpiexec -n 2 ./a.out 
0 1464100768220965374 Some output
0 1464100768221002105 end
1 1464100768220902046 Some output
1 1464100768221000693 end

Sorted by timestamp:

$ mpiexec -n 2 ./a.out 
1 1464100768220902046 Some output
0 1464100768220965374 Some output
1 1464100768221000693 end
0 1464100768221002105 end
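The reordering can also be done in code rather than by hand. This is a small sketch that assumes each captured line has the form `<rank> <timestamp> <message>`, as in the output above; `timestamp_of` and `sort_by_timestamp` are hypothetical helper names. Timestamps are only comparable when all ranks read the same clock, which holds here because both ranks ran on one machine.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Extracts the timestamp from a line of the form "<rank> <timestamp> <message>".
unsigned long long timestamp_of(const std::string& line) {
    std::istringstream in(line);
    int rank = 0;
    unsigned long long ts = 0;
    in >> rank >> ts;
    assert(!in.fail());  // the line must start with a rank and a timestamp
    return ts;
}

// Returns the lines ordered by timestamp; lines with equal timestamps keep
// their original relative order.
std::vector<std::string> sort_by_timestamp(std::vector<std::string> lines) {
    std::stable_sort(lines.begin(), lines.end(),
                     [](const std::string& a, const std::string& b) {
                         return timestamp_of(a) < timestamp_of(b);
                     });
    return lines;
}
```

Feeding the four unsorted output lines to `sort_by_timestamp` yields the order shown above: both "Some output" lines come first, then both "end" lines.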

The conclusion is that MPI_Barrier behaves as expected; the order of the printed lines simply does not reliably show it.

Edit 2016-05-24: added a detailed analysis of the program's behavior.