Debugging distributed OR operation

I am trying to perform a bitwise OR operation on an integer array whose parts are stored on different nodes of a distributed system. After computing the result, I want to distribute it to every node. To do this I am using the MPI_Allreduce operation, but the following code gives me a runtime error.

#include <bits/stdc++.h>
#include <mpi.h>
#include <unistd.h>
using namespace std;

int main (int argc, char* argv[]) {
    int numtasks, taskid, n=200, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskid);

    // Each rank marks the array positions it owns with 1; the rest stay 0.
    int *arr = new int[n];
    for(i=0; i<n; i++) arr[i] = 0;
    for(i=taskid; i<n; i+=numtasks) arr[i] = 1;

    // OR-reduce the arrays across all ranks; the result should reach every rank.
    MPI_Allreduce(arr, arr, n, MPI_INT, MPI_BOR, MPI_COMM_WORLD);

    if(taskid == 0){
        for(i=0; i<n; i++) printf("%d ", arr[i]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}

The program runs fine for n = 1, but for n > 1 it fails at runtime with the following error.

[user:17026] An error occurred in MPI_Allreduce
[user:17026] on communicator MPI_COMM_WORLD
[user:17026] MPI_ERR_BUFFER: invalid buffer pointer
[user:17026] MPI_ERRORS_ARE_FATAL: your MPI job will now abort


mpirun has exited due to process rank 2 with PID 17028 on node user exiting improperly. There are two reasons this could occur:

  1. this process did not call "init" before exiting, but others in the job did. This can cause a job to hang indefinitely while it waits for all processes to call "init". By rule, if one process calls "init", then ALL processes must call "init" prior to termination.

  2. this process called "init", but exited without calling "finalize". By rule, all processes that call "init" MUST call "finalize" prior to exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be terminated by signals sent by mpirun (as reported here).


[user:17025] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[user:17025] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I would like to know whether MPI_Allreduce works for n > 1 at all, since most examples available on the internet only use n equal to 1. If my approach is completely wrong, please suggest a better solution to my problem.

I don't know the MPI library, but from the MPI_Allreduce documentation:

int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

I guess that "const void *sendbuf" and "void *recvbuf" HAVE to be different arrays, all the more so, I would think, in highly parallelized computations.

I would not be surprised if the library checked whether the two pointers point to the same address or to overlapping memory, hence the message:

[user:17026] MPI_ERR_BUFFER: invalid buffer pointer

So the code should look something like this:

#include <bits/stdc++.h>
#include <mpi.h>
#include <unistd.h>
using namespace std;

int main (int argc, char* argv[]) {
    int numtasks, taskid, n=200, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskid);

    // Separate buffers for sending and receiving, as the documentation suggests.
    int *arr_send = new int[n];
    int *arr_recv = new int[n];
    for(i=0; i<n; i++) arr_send[i] = 0;
    for(i=taskid; i<n; i+=numtasks) arr_send[i] = 1;

    MPI_Allreduce(arr_send, arr_recv, n, MPI_INT, MPI_BOR, MPI_COMM_WORLD);

    if(taskid == 0){
        for(i=0; i<n; i++) printf("%d ", arr_recv[i]);
        printf("\n");
    }

    delete[] arr_send;
    delete[] arr_recv;
    MPI_Finalize();
    return 0;
}

Not very different from the original. Let me know if this works for you.
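
For reference, assuming an Open MPI installation (wrapper and launcher names vary between MPI distributions, and the file name here is only an example), you would compile and run it with something like:

mpic++ allreduce_or.cpp -o allreduce_or
mpirun -np 4 ./allreduce_or

With any number of ranks, rank 0 should print 200 ones, since the index sets i = taskid, taskid + numtasks, ... of all ranks together cover every position of the array.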

If you want to use the same buffer for sending and receiving, you can specify MPI_IN_PLACE as the send buffer.

MPI_Allreduce(MPI_IN_PLACE, arr, n, MPI_INT, MPI_BOR, MPI_COMM_WORLD);

Note: this only works for intracommunicators. MPI_COMM_WORLD is an intracommunicator. If you don't know what an intercommunicator is, then your communicator is most likely an intracommunicator.
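
For completeness, here is a minimal sketch of the original program rewritten with the in-place form; it keeps a single buffer and should print the same all-ones result. Passing MPI_IN_PLACE as the send buffer of MPI_Allreduce is standard MPI; the rest of the names just follow the original code:

#include <cstdio>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int numtasks, taskid, n = 200, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskid);

    // Each rank marks the indices it owns with 1; the rest stay 0.
    int *arr = new int[n];
    for(i = 0; i < n; i++) arr[i] = 0;
    for(i = taskid; i < n; i += numtasks) arr[i] = 1;

    // MPI_IN_PLACE: arr is used as both the send and the receive buffer,
    // so no separate send array is needed.
    MPI_Allreduce(MPI_IN_PLACE, arr, n, MPI_INT, MPI_BOR, MPI_COMM_WORLD);

    if(taskid == 0){
        for(i = 0; i < n; i++) printf("%d ", arr[i]);
        printf("\n");
    }

    delete[] arr;
    MPI_Finalize();
    return 0;
}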