Segmentation Fault in Pthread Loop

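I've recently been doing a lot of numerical analysis for work, mostly on small amounts of data for relatively simple concepts. In preparation for an upcoming project, I've started working on more complex systems, where the amount of computation grows exponentially. My run times have gone from tens of seconds to tens of minutes. To speed them up, I decided to learn how to write code with pthreads.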

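So I've been working on a program that fills a matrix both serially and with pthreads. I wrote it to perform each method n times and report the average time per run. When I run the program with only one pthread_t, it works as expected. When I add a second thread, I get a "Segmentation fault" error.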

My code is below:

fill.h

#ifndef FILL_H_
#define FILL_H_

#include <pthread.h>//Allows access to pthreads
#include <sys/time.h>//Allows the ability to pull the system time
#include <time.h>//Declares time(), used to seed rand
#include <stdio.h>//Allows input and output
#include <stdlib.h>//Allows for several fundamental calls

#define NUM_THREADS 2                                                            
#define MAT_DIM 50                                                             
#define RUNS 1      

pthread_t threads[NUM_THREADS];                                              
pthread_mutex_t mutexmat;  

typedef struct{                                                                  
   int id;                                                                      
   int column;                                                                     
   int* matrix[NUM_THREADS];                                                                  
}WORKER;  
#endif

fill.c

/*This routine will fill an array both in serial and parallel with random
 *numbers. It will also display the real time it took to accomplish each task*/

/* C includes */
#include "fill.h"
/* Fills a matrix */
void fill(int start, int stop, int** matrix)
{
    int i, j;
    for(i = start; i < stop; i++)
    {
        for(j = 0; j < MAT_DIM; j++)
            matrix[i][j] = rand() % 10;
    }
}

void* work(void* threadarg)
{
    /* Creates a pointer to a worker type variable*/
    WORKER *this_worker;
    /* Points this_worker at what thread arg is pointing to*/
    this_worker = (WORKER*) threadarg;
    /* Calculates my stopping point for this thread*/
    int stop = this_worker-> column + (MAT_DIM / NUM_THREADS);
    /* Used to drive matrix */
    int i,j;
    /* Fills portion of Matrix */
    for( i = this_worker-> column; i < stop; i++)
    {
        /* Prints the column that matrix is working on */
        printf("Worker %d working on column %d\n", this_worker->id, i);

        for( j = 0; j < MAT_DIM; j++)
        {
            this_worker-> matrix[i][j] = rand() % 10;
        }
    }
    /* Signals thread is done */
    printf("Thread %d done.\n", this_worker-> id);
    /* Terminates thread */
    pthread_exit(NULL);
}

int main()
{
/* Seeding rand */
    srand (time(NULL)); 
/* These will be used for loops */
    int i, j, r, t; 
/* Creating my matrices */
    int* matrix_serial[MAT_DIM];
    int* matrix_thread[MAT_DIM];
/* creating timeval variables */
    struct timeval t0, t1;

/* Beginning serial solution */
    /* Creating timer for serial solution */
    gettimeofday(&t0, 0);
    /* Creating serial matrix */
    for(i = 0; i < MAT_DIM; i++)
        matrix_serial[i] = (int*)malloc( MAT_DIM * sizeof(int));

    /* Filling the matrix */    
    for(r = 0; r < RUNS; r++)
        fill(0, MAT_DIM, matrix_serial);
    /* Calculating how long it took to run */
    gettimeofday(&t1, 0);
    unsigned long long int delta_t = (t1.tv_sec * 1000000 + t1.tv_usec)
                                   - (t0.tv_sec * 1000000 + t0.tv_usec);
    double t_dbl = (double)delta_t/1000000.0;
    double dt_avg = t_dbl / (double)r;
    printf("\nSerial Run Time for %d runs: %f\t Average:%f\n",r, t_dbl, dt_avg);

/* Begin multithread solution */
    /* Creating the offset where each matrix will start */
    int offset = MAT_DIM / NUM_THREADS;
    /* Creating a variable to store a return code */
    int rc;
    /* Creates a WORKER type variable named mat_work_t */
    WORKER mat_work_t[NUM_THREADS];

    /* Allocating a chunk of memory for my matrix */
    for( i = 0; i < MAT_DIM; i++)
        matrix_thread[i] = (int*)malloc( MAT_DIM * sizeof(int));

    /* Begin main loop */
    for(r = 0; r < RUNS; r++)
    {
    /* Begin multithread population of matrix */    
        for(t = 0; t < NUM_THREADS; t++)
        {
    /* Sets the values for mat_work_t[t] */
            mat_work_t[t].id = t;
            mat_work_t[t].column = t * offset;
            /* Points the mat_work_t[t].matrix at the matrix_thread */
            for(i = 0; i < MAT_DIM; i++)
                mat_work_t[t].matrix[i] = &matrix_thread[i][0];

    /* Creates thread placing its return value into rc */
            rc = pthread_create(&threads[t],
                                NULL,
                                work,
                                (void*) &mat_work_t[t]);
    /* Prints that a thread was successfully created */ 
            printf("Thread %d created.\n", mat_work_t[t].id);
    /* Checks to see if a return code was sent. If it was it will print it. */
            if (rc) 
                {
                printf("ERROR: return code from pthread_create() is %d\n", rc);
                return(-1);
                }
        }
    /* Makes sure all threads are done doing work before progressing */
        printf("Waiting for workers to finish.\n");
        for(i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        printf("Work complete!\n");

    }

    /* Prints out the last matrix that was created by the loop */
    for(i = 0; i < MAT_DIM; i++)
        {
            for(j = 0; j < MAT_DIM; j++)
                printf("%d ",matrix_thread[i][j]);
            printf("\n");
        }
    /* Terminates thread */
    pthread_exit(NULL);
}

When I run it under gdb, I get:

[New Thread 0x7ffff7fd3700 (LWP 27907)]
Thread 0 created.
Worker 0 working on column 0
Worker 0 working on column 1
Worker 0 working on column 2

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fd3700 (LWP 27907)]
0x0000000000400924 in work (threadarg=0x7fffffffd9c0) at fill.c:35
35              this_worker-> matrix[i][j] = rand() % 10;

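My understanding of segmentation faults is purely textbook: a segmentation fault occurs when you try to access memory that isn't "yours". From that, I gather the code is having trouble accessing the memory where this matrix is stored.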

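My questions:

  1. Is my reasoning about the nature of the problem correct?
  2. Why does adding a thread make this program crash?
  3. How should I go about troubleshooting this kind of problem in the future (any tips would be greatly appreciated)?
  4. Finally, how do I fix it (hints or a solution would be greatly appreciated)?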

Are you sure the matrix member of struct WORKER should only have NUM_THREADS elements?

You access that array beyond the size you declared for it in two places.

The first is in main, where NUM_THREADS (2) is far too small compared with MAT_DIM (50):
for(i = 0; i < MAT_DIM; i++)
                mat_work_t[t].matrix[i] = &matrix_thread[i][0];

The second is in the work function:

for( i = this_worker-> column; i < stop; i++)
{
    /* Prints the column that matrix is working on */
    printf("Worker %d working on column %d\n", this_worker->id, i);

    for( j = 0; j < MAT_DIM; j++)
    {
        this_worker-> matrix[i][j] = rand() % 10;
    }
}

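The loop works until it reaches matrix[1][j]. When it tries to access matrix[2][j], you get a segmentation fault, because you declared the array with size 2 and you are trying to access a third element (i.e. matrix[2][j]).

Note that the loop in main has already written matrix[2] through matrix[49] past the end of that array. On a typical 64-bit layout, mat_work_t[0].matrix[2] occupies the same bytes as mat_work_t[1].id and mat_work_t[1].column, so once main fills in the second worker, worker 0 reads a garbage pointer. That is consistent with the crash at column 2 in your gdb output. With a single thread, the out-of-bounds writes happen not to corrupt anything the program later relies on, so it only appears to work. The fix is to make the member large enough for all MAT_DIM row pointers (int* matrix[MAT_DIM]), or to replace it with a single int **matrix that points at matrix_thread.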
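Putting both fixes together, a minimal sketch of the corrected threaded fill might look like the following. It assumes WORKER holds one `int **matrix` pointer plus an extra `stop` field (both our additions), and wraps thread setup and joining in a hypothetical `fill_parallel()` helper; the serial pass, timing, and RUNS loop from the original main are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 2
#define MAT_DIM 50

typedef struct {
    int id;
    int column;   /* first row to fill */
    int stop;     /* one past the last row to fill (assumed extra field) */
    int **matrix; /* shared row table with MAT_DIM rows */
} WORKER;

/* Each worker touches only rows [column, stop), so no two threads
 * write the same element and no mutex is needed for the matrix. */
static void *work(void *threadarg)
{
    WORKER *w = threadarg;
    for (int i = w->column; i < w->stop; i++)
        for (int j = 0; j < MAT_DIM; j++)
            w->matrix[i][j] = rand() % 10;
    return NULL;
}

/* Hypothetical helper: fills rows[0..MAT_DIM-1] in parallel. Returns 0
 * on success or the first nonzero pthread_create() error code. */
static int fill_parallel(int **rows)
{
    pthread_t threads[NUM_THREADS];
    WORKER workers[NUM_THREADS];
    int offset = MAT_DIM / NUM_THREADS;
    int created = 0, rc = 0;

    assert(rows != NULL);
    for (int t = 0; t < NUM_THREADS; t++) {
        workers[t].id = t;
        workers[t].column = t * offset;
        /* The last worker also takes any remainder rows. */
        workers[t].stop = (t == NUM_THREADS - 1) ? MAT_DIM : (t + 1) * offset;
        workers[t].matrix = rows;
        rc = pthread_create(&threads[t], NULL, work, &workers[t]);
        if (rc != 0) {
            fprintf(stderr, "pthread_create() failed: %d\n", rc);
            break;
        }
        created++;
    }
    /* Join every thread that was started, even after a failure, so the
     * workers array on this stack frame outlives all of them. */
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return rc;
}
```

rand() is kept from the original code for a like-for-like comparison. POSIX does not require rand() to be thread-safe, and glibc serializes it with an internal lock, so a per-thread generator would be a better choice once the crash is fixed and you start measuring speedup.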