Wrong values when reading file with MPI IO

Here is a simple C program that reads a file in parallel with MPI IO:

#include <stdio.h>
#include <stdlib.h>

#include "mpi.h"

#define N 10

int main( int argc, char **argv )
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    int i0 = N *  rank / size;
    int i1 = N * (rank+1) / size;
    printf("rank: %d, i0: %d, i1: %d\n", rank, i0, i1);

    int i;
    double* data = malloc( (i1-i0)*sizeof(double) );
    for (i = 0 ; i < i1-i0 ; i++)
        data[i] = 123.;

    MPI_File f;
    MPI_File_open(MPI_COMM_WORLD, "data.bin", MPI_MODE_RDONLY, 
                  MPI_INFO_NULL, &f);

    MPI_File_set_view(f, i0, MPI_DOUBLE, MPI_DOUBLE, "native",
                      MPI_INFO_NULL);

    MPI_Status status;
    MPI_File_read(f, data, i1-i0, MPI_DOUBLE, &status);

    int count;
    MPI_Get_count(&status, MPI_DOUBLE, &count);
    printf("rank %d, %d value read\n", rank, count);

    for (i = 0 ; i < i1-i0 ; i++) {
        printf("rank: %d index: %d value: %.2f\n", rank, i, data[i]);
    }

    MPI_File_close(&f);

    MPI_Finalize();

    free(data);

    return 0;
}

With one process:

./read_mpi_io

the values read are correct:

rank: 0, i0: 0, i1: 10
rank 0, 10 value read
rank: 0 index: 0 value: 0.00
rank: 0 index: 1 value: 1.00
rank: 0 index: 2 value: 2.00
rank: 0 index: 3 value: 3.00
rank: 0 index: 4 value: 4.00
rank: 0 index: 5 value: 5.00
rank: 0 index: 6 value: 6.00
rank: 0 index: 7 value: 7.00
rank: 0 index: 8 value: 8.00
rank: 0 index: 9 value: 9.00

But with two processes:

mpirun -n 2 ./read_mpi_io

I get wrong values (zeros):

rank: 0, i0: 0, i1: 5
rank: 1, i0: 5, i1: 10
rank 0, 5 value read
rank: 0 index: 0 value: 0.00
rank 1, 5 value read
rank: 1 index: 0 value: 0.00
rank: 0 index: 1 value: 1.00
rank: 0 index: 2 value: 2.00
rank: 1 index: 1 value: 0.00
rank: 1 index: 2 value: 0.00
rank: 1 index: 3 value: 0.00
rank: 1 index: 4 value: 0.00
rank: 0 index: 3 value: 3.00
rank: 0 index: 4 value: 4.00

What is wrong with my C code?

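The problem is in your call to MPI_File_set_view(): its second argument, the displacement at which the view starts, is expected in bytes, not in number of elements. So here you need to multiply i0 by the size of the elements you are reading, namely sizeof(double). With two processes, rank 1 has i0 = 5, so its view started 5 bytes into the file instead of 40 bytes, and every double it read straddled two stored values.

Replace the MPI_File_set_view call with: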

MPI_File_set_view( f, i0 * sizeof( double ), MPI_DOUBLE, MPI_DOUBLE,
                   "native", MPI_INFO_NULL );

This fixes the problem and makes the code work as expected.