Segmentation fault (core dumped) when changing the number of processors with pthreads

I get a segmentation fault when I run my code on 8 processors, but it works fine on 1 and 4 processors.

I am using the pthread library (linked with -lpthread), and this is the function each thread executes. I can add more code if needed.

void *compute_gauss(void *threadid){

  int local_row, local_norm, col;
  float multiplier;
  long tid;
  tid = (long)threadid;

  fprintf(stdout, "Thread %ld has started\n", tid);

  while (global_norm < N){

    while (global_row < N) {
      pthread_mutex_lock(&global_row_lock);
      local_row = global_row;
      global_row++;
      pthread_mutex_unlock(&global_row_lock);

      print_inputs();
      multiplier = A[local_row][global_norm] / A[global_norm][global_norm];

      for (col = global_norm; col < N; col++) {
        A[local_row][col] -= A[global_norm][col] * multiplier;
      }

      B[local_row] -= B[global_norm] * multiplier;

    }

    pthread_barrier_wait(&barrier);
    if (tid == 0){
      global_norm++;
      global_row=global_norm+1;
    }
    pthread_barrier_wait(&barrier); // wait until all threads arrive
  }
}

Here is the calling function, where I initialize the barrier:

void gauss() {
    int norm, row, col;  /* Normalization row, and zeroing
                          * element row and col */
    int i = 0;
    float multiplier;
    pthread_t threads[procs]; //declared array of threads equal in size to # processors
    global_norm = 0;
    global_row = global_norm+1;

    printf("Computing Parallelized Algorithm.\n");

    pthread_barrier_init(&barrier, NULL, procs);

    /* Gaussian elimination */
    for (i = 0; i < procs; i++){
      pthread_create(&threads[i], NULL, &compute_gauss, (void *)i);
    }

    printf("finished creating threads\n");

    for (i = 0; i < procs; i++){
      pthread_join( threads[i], NULL);
    }

    printf("finished joining threads\n");
    /* (Diagonal elements are not normalized to 1.  This is treated in back
   *    * substitution.)
   *       */

     fprintf(stdout, "pre back substition");
    /* Back substitution */
    for (row = N - 1; row >= 0; row--) {
      X[row] = B[row];
      for (col = N-1; col > row; col--) {
        X[row] -= A[row][col] * X[col];
      }
      X[row] /= A[row][row];
    }
    fprintf(stdout, "post back substitution");
  }

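You haven't included enough code, so I can't test your program. However, I'm fairly sure the problem is that you have no mutex protecting global_norm, global_row, and print_inputs(). Either protect them with a mutex or use atomic increment operations. You don't see the crash under a debugger because the debugger changes your timing.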
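As a rough illustration of the mutex approach, the sketch below moves the bounds check inside the critical section, so the test and the increment of global_row happen as one step. It is not the asker's program: `claim_row`, `worker`, `run_demo`, `times_claimed`, and the values of `N` and `MAX_THREADS` are hypothetical names and sizes chosen for the example, and the elimination arithmetic is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define N 64            /* hypothetical matrix dimension */
#define MAX_THREADS 16  /* hypothetical cap on worker threads */

static int global_row;  /* next unclaimed row; guarded by global_row_lock */
static pthread_mutex_t global_row_lock = PTHREAD_MUTEX_INITIALIZER;
static int times_claimed[N];  /* bookkeeping for this sketch only */

/* Claim the next row, or return -1 when none are left.
 * The bounds check and the increment share one critical section,
 * so two threads can never both pass the check for the last row. */
static int claim_row(void) {
    int row = -1;
    pthread_mutex_lock(&global_row_lock);
    if (global_row < N) {
        row = global_row++;
    }
    pthread_mutex_unlock(&global_row_lock);
    return row;
}

static void *worker(void *arg) {
    int row;
    (void)arg;
    while ((row = claim_row()) != -1) {
        /* The elimination step for `row` would go here. */
        times_claimed[row]++;  /* each row is handed to exactly one thread */
    }
    return NULL;
}

/* Hypothetical driver: start n_threads workers at first_row and
 * return how many rows were claimed exactly once. */
static int run_demo(int n_threads, int first_row) {
    pthread_t threads[MAX_THREADS];
    int i, once = 0;

    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    assert(first_row >= 0 && first_row <= N);
    global_row = first_row;
    memset(times_claimed, 0, sizeof times_claimed);

    for (i = 0; i < n_threads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < n_threads; i++) {
        pthread_join(threads[i], NULL);
    }
    for (i = first_row; i < N; i++) {
        if (times_claimed[i] == 1) {
            once++;
        }
    }
    return once;
}
```

Calling `run_demo(8, 1)` should report that every row from 1 to N - 1 was claimed exactly once, however the threads interleave. An atomic fetch-and-add on global_row would also hand out unique indices, but each thread would still have to compare the value it received against N before using it.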

Shouldn't you check the return value of pthread_barrier_wait and test for PTHREAD_BARRIER_SERIAL_THREAD?

It's also unclear why you call pthread_barrier_wait twice.

Here is an example of how the code can index past the end of the array; please point out any mistakes:

// suppose global_row = N - 1;
while (global_row < N) {
    pthread_mutex_lock(&global_row_lock);   // thread 2 waits here, global_row is N - 1;
    local_row = global_row;                 // thread 1 is here, global_row is N - 1;
    global_row++;
    pthread_mutex_unlock(&global_row_lock);

    // when thread 2 goes here, local_row is going to be N, out of array boundary.
    multiplier = A[local_row][global_norm] / A[global_norm][global_norm];