How does MPI_Reduce with MPI_MIN work?

Question

If I have this code:

int main(void) {
    int result=0;
    int num[6] = {1, 2, 4, 3, 7, 1};
    if (my_rank != 0) {
        MPI_Reduce(num, &result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    } else {
        MPI_Reduce(num, &result, 6, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
        printf("result = %d\n", result);
    }
}

The printed result is 1.

But if num[0] = 9, the result is 9.

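I read that to solve this problem I must define the variable result as an array. I can't understand how the function MPI_Reduce works with MPI_MIN. Why, if num[0] is not the smallest number, must I define result as an array?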

Answer 1

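MPI_Reduce performs a reduction over the members of the communicator, not over the members of the local array. sendbuf and recvbuf must be the same size.

I think the standard says it best: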

Thus, all processes provide input buffers and output buffers of the same length, with elements of the same type. Each process can provide one element, or a sequence of elements, in which case the combine operation is executed element-wise on each entry of the sequence.

MPI_Reduce does not take the minimum over all the elements within an array; you have to do that part manually.

Answer 2

You can use MPI_MIN to obtain the minimum of the values passed to the reduction. Let's look at the function declaration:

int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype
                datatype, MPI_Op op, int root, MPI_Comm comm)

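Each process sends its value (or array of values) in the buffer sendbuf. The process identified by root combines the contributions and stores the outcome in the buffer recvbuf. The number of elements contributed by each process is given by count, so recvbuf must be allocated with size sizeof(datatype)*count. If each process has a single integer to send (count = 1), then recvbuf is also a single integer; if each process has two integers, then recvbuf is an integer array of size 2. See this nice post for further explanation and pictures.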
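The role of count can be sketched in plain C, without any MPI calls. The function below only mimics what the root's recvbuf would hold after a reduction with MPI_MIN; simulate_reduce_min is a hypothetical name, and laying out all ranks' send buffers back to back in one array is an assumption made purely for illustration.

```c
/* Hypothetical illustration (no MPI involved): mimics the contents of
   the root's recvbuf after a reduction with MPI_MIN.
   sendbufs holds the send buffers of all ranks back to back, so rank r
   contributes sendbufs[r*count] .. sendbufs[r*count + count - 1]. */
void simulate_reduce_min(int nprocs, int count,
                         const int *sendbufs, int *recvbuf)
{
    for (int j = 0; j < count; ++j) {
        /* Start from rank 0's element j ... */
        recvbuf[j] = sendbufs[j];
        /* ... and compare it with element j of every other rank. */
        for (int r = 1; r < nprocs; ++r) {
            int v = sendbufs[r * count + j];
            if (v < recvbuf[j])
                recvbuf[j] = v;
        }
    }
}
```

For example, with three ranks sending {5, 8}, {3, 9} and {7, 1} and count = 2, recvbuf ends up as {3, 1}: element 0 is only ever compared with other ranks' element 0, never with element 1 of the same buffer.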

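It should now be clear that your code is wrong: sendbuf and recvbuf must be the same size, and no condition on the rank is needed around the call. Simply put, recvbuf is only significant on the root process, while sendbuf is read on every process, the root included. In your example you could assign one or more elements of the array to different processes and then compute the minimum (if there are as many processes as values in the array) or an array of minima (if there are more values than processes).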

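Here is a working example (slightly modified from this) that illustrates the use of MPI_MIN, MPI_MAX and MPI_SUM on simple values (not arrays). Each process does some work depending on its rank and sends the time it took to the root process. The root process collects the times and prints their minimum, maximum and average.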

#include <stdio.h>
#include <mpi.h>

int myrank, numprocs;

/* just a function to waste some time */
float work()
{
    float x, y = 0.0f;  /* y is accumulated below, so it must start at zero */
    if (myrank%2) {
        for (int i = 0; i < 100000000; ++i) {
            x = i/0.001;
            y += x;
        }
    } else {
        for (int i = 0; i < 100000; ++i) {
            x = i/0.001;
            y += x;
        }
    }    
    return y;
}

int main(int argc, char **argv)
{
    int node;

    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &node);

    printf("Hello World from Node %d\n",node);

    /* variables used for gathering timing statistics */
    double mytime,   
           maxtime,
           mintime,
           avgtime;

    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Barrier(MPI_COMM_WORLD);  /*synchronize all processes*/

    mytime = MPI_Wtime();  /*get time just before work section */
    work();
    mytime = MPI_Wtime() - mytime;  /*get time just after work section*/

    /*compute max, min, and average timing statistics*/
    MPI_Reduce(&mytime, &maxtime, 1, MPI_DOUBLE,MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&mytime, &mintime, 1, MPI_DOUBLE, MPI_MIN, 0,MPI_COMM_WORLD);
    MPI_Reduce(&mytime, &avgtime, 1, MPI_DOUBLE, MPI_SUM, 0,MPI_COMM_WORLD);

    /* print the output on the root only */
    if (myrank == 0) {
        avgtime /= numprocs;
        printf("Min: %lf  Max: %lf  Avg:  %lf\n", mintime, maxtime,avgtime);
    }

    MPI_Finalize();

    return 0;
}

If I run it on my OS X laptop, this is what I get:

urcaurca$ mpirun -n 4 ./a.out
Hello World from Node 3
Hello World from Node 0
Hello World from Node 2
Hello World from Node 1
Min: 0.000974  Max: 0.985291  Avg:  0.493081
