ISend & Recv in MPI: Different value received?
In my matrix addition code, I transmit the lower bound to the other processes with ISend and Tag 1, but when I compile the code, all other slave processes claim to have the same lower bound. I don't understand why?
Output:
I am process 1 and I received 1120 as lower bound
I am process 1 and my lower bound is 1120 and my upper bound is 1682
I am process 2 and I received 1120 as lower bound
I am process 2 and my lower bound is 1120 and my upper bound is 1682
Process 0 here: I am sending lower bound 0 to process 1
Process 0 here: I am sending lower bound 560 to process 2
Process 0 here: I am sending lower bound 1120 to process 3
Timings : 13.300698 Sec
I am process 3 and I received 1120 as lower bound
I am process 3 and my lower bound is 1120 and my upper bound is 1682
Code:
#include <stdio.h>
#include <mpi.h>

#define N_ROWS 1682
#define N_COLS 823
#define MASTER_TO_SLAVE_TAG 1 //tag for messages sent from master to slaves
#define SLAVE_TO_MASTER_TAG 4 //tag for messages sent from slaves to master

void readMatrix();

int rank, nproc, proc;
double matrix_A[N_ROWS][N_COLS];
double matrix_B[N_ROWS][N_COLS];
double matrix_C[N_ROWS][N_COLS];
int low_bound;   //low bound of the number of rows of [A] allocated to a slave
int upper_bound; //upper bound of the number of rows of [A] allocated to a slave
int portion;     //portion of the number of rows of [A] allocated to a slave
MPI_Status status;   //store status of a MPI_Recv
MPI_Request request; //capture request of a MPI_Isend

int main (int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double StartTime = MPI_Wtime();

    // -------------------> Process 0 initializes matrices and sends work portions to other processes
    if (rank == 0) {
        readMatrix();
        for (proc = 1; proc < nproc; proc++) { //for each slave other than the master
            portion = (N_ROWS / (nproc - 1)); //calculate portion without master
            low_bound = (proc - 1) * portion;
            if (((proc + 1) == nproc) && ((N_ROWS % (nproc - 1)) != 0)) { //if rows of [A] cannot be equally divided among slaves
                upper_bound = N_ROWS; //last slave gets all the remaining rows
            } else {
                upper_bound = low_bound + portion; //rows of [A] are equally divisible among slaves
            }
            //send the low bound first without blocking, to the intended slave
            printf("Process 0 here: I am sending lower bound %i to process %i \n", low_bound, proc);
            MPI_Isend(&low_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &request);
            //next send the upper bound without blocking, to the intended slave
            MPI_Isend(&upper_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &request);
            //finally send the allocated row portion of [A] without blocking, to the intended slave
            MPI_Isend(&matrix_A[low_bound][0], (upper_bound - low_bound) * N_COLS, MPI_DOUBLE, proc, MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &request);
        }
    }
    //broadcast [B] to all the slaves
    MPI_Bcast(&matrix_B, N_ROWS * N_COLS, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // -------------------> Other processes do their work
    if (rank != 0) {
        //receive low bound from the master
        MPI_Recv(&low_bound, 1, MPI_INT, 0, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &status);
        printf("I am process %i and I received %i as lower bound \n", rank, low_bound);
        //next receive upper bound from the master
        MPI_Recv(&upper_bound, 1, MPI_INT, 0, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &status);
        //finally receive row portion of [A] to be processed from the master
        MPI_Recv(&matrix_A[low_bound][0], (upper_bound - low_bound) * N_COLS, MPI_DOUBLE, 0, MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &status);
        printf("I am process %i and my lower bound is %i and my upper bound is %i \n", rank, low_bound, upper_bound);
        //do your work
        for (int i = low_bound; i < upper_bound; i++) {
            for (int j = 0; j < N_COLS; j++) {
                matrix_C[i][j] = (matrix_A[i][j] + matrix_B[i][j]);
            }
        }
        //send back the low bound first without blocking, to the master
        MPI_Isend(&low_bound, 1, MPI_INT, 0, SLAVE_TO_MASTER_TAG, MPI_COMM_WORLD, &request);
        //send the upper bound next without blocking, to the master
        MPI_Isend(&upper_bound, 1, MPI_INT, 0, SLAVE_TO_MASTER_TAG + 1, MPI_COMM_WORLD, &request);
        //finally send the processed portion of data without blocking, to the master
        MPI_Isend(&matrix_C[low_bound][0], (upper_bound - low_bound) * N_COLS, MPI_DOUBLE, 0, SLAVE_TO_MASTER_TAG + 2, MPI_COMM_WORLD, &request);
    }
    // -------------------> Process 0 gathers the work
    ...
MPI_Isend() starts a non-blocking send. Consequently, modifying the send buffer without checking whether the message has actually been sent results in wrong values being sent.
That is what happens in the snippet you provided, inside the loop over processes, for (proc = 1; proc < nproc; proc++):
proc=1: low_bound is computed.
proc=1: low_bound is sent (non-blocking) to process 1.
proc=2: low_bound is modified. The message is corrupted.
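As a minimal illustration of the safe pattern (not the only possible fix, and MPI_Wait() is not used in your code): complete each send before its buffer is reused, using the request and status variables already declared in the question's code.

MPI_Isend(&low_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &request);
//block until the send buffer may safely be reused; only after this
//may low_bound be overwritten in the next loop iteration
MPI_Wait(&request, &status);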
There are different solutions:
Use the blocking send MPI_Send().
Check for completion of the messages: create an array of 3 requests, MPI_Request requests[3]; MPI_Status statuses[3];, keep the non-blocking sends, and check for completion of the requests with MPI_Waitall():
MPI_Isend(&low_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &requests[0]);
MPI_Isend(..., &requests[1]);
MPI_Isend(..., &requests[2]);
MPI_Waitall(3, requests, statuses);
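A complete sketch of this second approach in the master loop, reusing the buffers and tags from the question (only the requests array, the statuses array and the MPI_Waitall() call are new), might look like:

MPI_Request requests[3];
MPI_Status statuses[3];
for (proc = 1; proc < nproc; proc++) {
    //... compute low_bound and upper_bound exactly as before ...
    MPI_Isend(&low_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &requests[0]);
    MPI_Isend(&upper_bound, 1, MPI_INT, proc, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &requests[1]);
    MPI_Isend(&matrix_A[low_bound][0], (upper_bound - low_bound) * N_COLS, MPI_DOUBLE, proc, MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &requests[2]);
    //wait for all three sends to complete before low_bound and
    //upper_bound are overwritten in the next iteration
    MPI_Waitall(3, requests, statuses);
}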
Take a look at MPI_Scatter() and MPI_Scatterv()!
The "usual" way to do it is to MPI_Bcast() the size of the matrix. Each process then computes the size of its part of the matrix, and process 0 computes the sendcounts and displs arrays needed by MPI_Scatterv().
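A minimal sketch of that scatter-based distribution for the rows of matrix_A, assuming <stdlib.h> is included; the names sendcounts, displs, local_A, rows_per_proc and remainder are illustrative and not part of the original code:

int *sendcounts = malloc(nproc * sizeof(int));
int *displs = malloc(nproc * sizeof(int));
int rows_per_proc = N_ROWS / nproc;   //here every rank, including 0, gets a share
int remainder = N_ROWS % nproc;
int offset = 0;
for (int p = 0; p < nproc; p++) {
    int rows = rows_per_proc + (p < remainder ? 1 : 0);
    sendcounts[p] = rows * N_COLS;    //counts are in elements, not rows
    displs[p] = offset;
    offset += rows * N_COLS;
}
//each rank receives its contiguous block of rows into local_A
double *local_A = malloc(sendcounts[rank] * sizeof(double));
MPI_Scatterv(&matrix_A[0][0], sendcounts, displs, MPI_DOUBLE,
             local_A, sendcounts[rank], MPI_DOUBLE,
             0, MPI_COMM_WORLD);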