PThread 的分布式计算不工作

Question

我有一个用 C 编写的程序，用于 class 我正在参加。 objective 是获取一个数组并使用 10 个不同的线程计算某个字符串的所有实例。在这种情况下，我们试图计算文件中有多少“是”。应该是 55.

我目前的逻辑是将数组拆分为每个单词，然后单独处理每个单词，并包含一个测试以查看它应该在当前线程还是另一个线程上处理。

如有任何帮助，我们将不胜感激。我的讲师提供了除以下之外的所有代码： num_substring、allowedOnThread 和计数器。我做了这些。

我对 C 还很陌生

我目前有：

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX 10240
#define NUM_THREADS  10

int n1,n2;
char *s1,*s2;
FILE *fp;
int countArray[NUM_THREADS]={0};

int total = 0;

//read input file and generate string s1/s2 and length n1/n2
int readf(FILE *fp)
{
    if((fp=fopen("strings.txt", "r"))==NULL){
        printf("ERROR: can't open string.txt!\n");
        return 0;
    }
    s1=(char *)malloc(sizeof(char)*MAX);
    if(s1==NULL){
        printf("ERROR: Out of memory!\n");
        return -1;
    }
    s2=(char *)malloc(sizeof(char)*MAX);
    if(s1==NULL){
        printf("ERROR: Out of memory\n");
        return -1;
    }
    /*read s1 s2 from the file*/
    s1=fgets(s1, MAX, fp);
    s2=fgets(s2, MAX, fp);
    n1=strlen(s1);  /*length of s1*/
    n2=strlen(s2)-1; /*length of s2*/

    if(s1==NULL || s2==NULL || n1<n2)  /*when error exit*/
        return -1;
    return 0;
}

int num_substring(int t) {
//add your logic here
//1, how to distribute different parts of string s1 into different threads
//2, how to sum up the total number of substring from all threads
    
    char *str = s1; // This is what we will need to process for the string
    
    char *token = strtok(str, " ");
    
    int count = 0;
    
    int index = 0;
    while(token != NULL) {
        // Determine if it should be on the thread
        
        if(allowedOnThread(t, index) == 1){
            count = count + counter(token);
        }else{
            return count;
        }
        index++;
        token = strtok(NULL, " ");
    }

    return count;
}

int allowedOnThread(int thread, int index) {
    int threadMultiplier = n1 / NUM_THREADS;
    
    // Check range
    int min = thread * threadMultiplier;
    int max = (thread * threadMultiplier) + threadMultiplier;
    
    if(thread >= 1){
        min = (thread * threadMultiplier) + 1;
    }
    
    if(thread + 1 == NUM_THREADS){
        max = n1 + 1;
    }
    
    if(min <= index && index <= max) {
        return 1;
    }
    
    
    return 0;
}

int counter(char *str){
    int i,j,k;
    int count;

    int complete = 0;
    
    for (i = 0; i <= (n1-n2); i++){   
        count=0;
        for(j = i,k = 0; k < n2; j++,k++){  /*search for the next string of size of n2*/  
            if (*(str+j)!=*(s2+k)){
                break;
            }else{
                count++;
            }

            if(count==n2){  
                complete++;
            }                       
        }
    }
    
    total = total + complete;
    
    return complete;
}


void *calSubStringThread(void *threadid){
    long tid = (long)threadid;
    printf("This is thread %ld, ", tid);
    int num = num_substring(tid);
    printf("find num of is: %d\n", num);
    pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
    pthread_t threads[NUM_THREADS];
    int t, rc;

    readf(fp);
    
    for(t=0; t<NUM_THREADS; t++){
        rc = pthread_create(&threads[t], NULL, calSubStringThread, (void *) (size_t)t);
        if (rc){
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    for(t=0; t<NUM_THREADS; t++){
        pthread_join(threads[t], NULL);
    }

    printf("The number of substrings is: %d\n", total);
    return 1;
}

输入文件：

Thss is an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. This is an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss ss. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That is a kiwi fruit. This is an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. This is a banana. This is a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ssss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. This is a banana. This is a berry. That is cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. This is a banana. This is a berry. This is cherry. That ss a haw. Thss ss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ssss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. This is a banana. This is a berry. This is cherry. That is a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss ss an avocado. There ss a peach on the tree. This is a banana. This is a berry. This is cherry. This is a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree This is a banana. This is a berry. This is cherry. This is a haw. Thss is a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. This is a banana. This is a berry. This is cherry. This is a haw. This is a lemon. There ss a hickory on the tree. Thss ss an apple. That ss a pear. That ss an orange. That ss a kiwi fruit. Thss ss an avocado. There ss a peach on the tree. Thss ss a banana. That ss a berry. That ss cherry. That ss a haw. Thss ss a lemon. There ss a hickory on the
is

在此先感谢您。我已经为此工作了好几个小时。

Answer 1

这是我的热门评论。

我不得不重构一下你的代码。

我还必须修改一些提供的函数以允许调试 printf

根据您想要的 55 结果，您 can/should 使用 strstr 而不是 strtok。这是我计算正确的唯一方法。

我添加了一个互斥锁，这样对 total 的更新就不会被线程冲突破坏。

关键是使用我提到的 struct 重写 allowedOnThread。为每个线程计算每个段的开始和结束偏移量，调整前后的空格，使单词不在中间被切碎。

听说是重构后的代码。是有注释的。它允许每个线程单独计算其范围。

该代码给出了正确的答案，但停在每个片段的 end 上似乎是正确的，但我可能会仔细检查一下。

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include <stdatomic.h>

#define MAX 10240
#define NUM_THREADS  10

int n1, n2;
char *s1, *s2;
FILE *fp;
int countArray[NUM_THREADS] = { 0 };

pthread_mutex_t mutex;

int total = 0;

const char *delims = ",. \t\n";

struct range {
    size_t off;                         // starting offset
    size_t end;                         // ending offset (one past last char)
};

__thread int curtid;
__thread FILE *logxf;
#ifdef DEBUG
#define dbgprt(_fmt...) \
    _dbgprt(_fmt)
#else
#define dbgprt(_fmt...) \
    do { \
    } while (0)
#endif

#define prt(_lvl) \
    __attribute__((__format__(__printf__,_lvl,_lvl + 1)))

void prt(1)
_dbgprt(const char *fmt,...)
{
    char *bp;
    char buf[1000];
    va_list ap;

    if (logxf == NULL) {
        char logf[100];
        sprintf(logf,"log%2.2d",curtid);
        logxf = fopen(logf,"w");
        setlinebuf(logxf);
    }

    fprintf(logxf,"[%d] ",curtid);

    va_start(ap,fmt);
    bp += vfprintf(logxf,fmt,ap);
    va_end(ap);
}

//read input file and generate string s1/s2 and length n1/n2
int
readf(FILE * fp)
{
    if ((fp = fopen("strings.txt", "r")) == NULL) {
        printf("ERROR: can't open string.txt!\n");
        return 0;
    }
    s1 = (char *) malloc(sizeof(char) * MAX);
    if (s1 == NULL) {
        printf("ERROR: Out of memory!\n");
        return -1;
    }
    s2 = (char *) malloc(sizeof(char) * MAX);
    if (s1 == NULL) {
        printf("ERROR: Out of memory\n");
        return -1;
    }
    // read s1 s2 from the file

    s1 = fgets(s1, MAX, fp);
    s2 = fgets(s2, MAX, fp);
    // length of s1
    n1 = strlen(s1);
    // length of s2
    n2 = strlen(s2) - 1;

    // when error exit
    if (s1 == NULL || s2 == NULL || n1 < n2)
        return -1;
    return 0;
}

size_t
skip_to_delim(size_t off,const char *tag)
{
    char *str;

    dbgprt("skip_to_delim: ENTER off=%zu tag=%s\n",off,tag);

    str = &s1[off];
    off += strcspn(str,delims);

    dbgprt("skip_to_delim: EXIT off=%zu\n",off);

    return off;
}

int
allowedOnThread(int thread, struct range *seg)
{
    int threadMultiplier = n1 / NUM_THREADS;

    dbgprt("allowedOnThread: ENTER thread=%d\n",thread);

    // get starting offset
    do {
        seg->off = threadMultiplier * thread;

        // first thread always starts at offset 0
        if (thread == 0)
            break;

        // skip past a word and stop on a delimiter
        seg->off = skip_to_delim(seg->off,"off");
    } while (0);

    // get ending offset/length
    do {
        if (thread == (NUM_THREADS - 1)) {
            seg->end = n1;
            break;
        }

        // scan at least the amount we're allocated
        seg->end = seg->off + threadMultiplier;

        // skip past a word and stop on a delimiter
        seg->end = skip_to_delim(seg->end,"end");
    } while (0);

    dbgprt("allowedOnThread: EXIT thread=%d off=%zu end=%zu\n",
        thread,seg->off,seg->end);

    return 0;
}

int
num_substring(int t)
{
    //add your logic here
    //1, how to distribute different parts of string s1 into different threads
    //2, how to sum up the total number of substring from all threads

    dbgprt("num_substring: ENTER\n");

    struct range seg;
    allowedOnThread(t,&seg);

    char *str = &s1[seg.off];
    char *end = &s1[seg.end];

    char *token = str;
    size_t count = 0;

    // NOTE/FIXME -- this should be double checked to ensure that we're not
    // double counting by going beyond our range
    while (1) {
        // look for a substring match of s2 in s1
        token = strstr(token,s2);
        if (token == NULL)
            break;

        // don't intrude on next thread's segment
        if (token >= end)
            break;

        // advance the count
        count += 1;

        // point to start of next possible match point for s2
        token += n2;

        // stop when we go beyond the end of our thread's area
        if (token >= end)
            break;
    }

    // add to global count (under thread lock)
    pthread_mutex_lock(&mutex);
    total += count;
    pthread_mutex_unlock(&mutex);

    dbgprt("num_substring: EXIT count=%zu\n",count);

    return count;
}

void *
calSubStringThread(void *threadid)
{
    long tid = (long) threadid;

    curtid = tid + 1;

    dbgprt("calSubstringThread: ENTER\n");

    int num = num_substring(tid);

    dbgprt("calSubstringThread: EXIT num=%d\n",num);

    pthread_exit(NULL);
}

// docheck -- check with non-threaded algorithm
void
docheck(void)
{
    size_t count = 0;

    char *token = s1;
    while (1) {
        token = strstr(token,s2);
        if (token == NULL)
            break;

        count += 1;

        token += n2;
    }

    printf("docheck: count=%zu\n",count);
}

int
main(int argc, char *argv[])
{
    pthread_t threads[NUM_THREADS];
    int t, rc;

    pthread_mutex_init(&mutex,NULL);

    readf(fp);

    // get rid of newline
    s2[n2] = 0;
    dbgprt("main: s2='%s'\n",s2);

    docheck();

    for (t = 0; t < NUM_THREADS; t++) {
        rc = pthread_create(&threads[t], NULL, calSubStringThread,
            (void *) (size_t) t);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    for (t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
    }

    printf("The number of substrings is: %d\n", total);

    return 0;
}

这是调试日志输出。（请注意，我根据 ENTER/EXIT 消息手动缩进了日志）。

==> log00 <==
[0] main: s2='is'

==> log01 <==
[1] calSubstringThread: ENTER
  [1] num_substring: ENTER
    [1] allowedOnThread: ENTER thread=0
      [1] skip_to_delim: ENTER off=479 tag=end
      [1] skip_to_delim: EXIT off=479
    [1] allowedOnThread: EXIT thread=0 off=0 end=479
  [1] num_substring: EXIT count=1
[1] calSubstringThread: EXIT num=1

==> log02 <==
[2] calSubstringThread: ENTER
  [2] num_substring: ENTER
    [2] allowedOnThread: ENTER thread=1
      [2] skip_to_delim: ENTER off=479 tag=off
      [2] skip_to_delim: EXIT off=479
      [2] skip_to_delim: ENTER off=958 tag=end
      [2] skip_to_delim: EXIT off=960
    [2] allowedOnThread: EXIT thread=1 off=479 end=960
  [2] num_substring: EXIT count=2
[2] calSubstringThread: EXIT num=2

==> log03 <==
[3] calSubstringThread: ENTER
  [3] num_substring: ENTER
    [3] allowedOnThread: ENTER thread=2
      [3] skip_to_delim: ENTER off=958 tag=off
      [3] skip_to_delim: EXIT off=960
      [3] skip_to_delim: ENTER off=1439 tag=end
      [3] skip_to_delim: EXIT off=1440
    [3] allowedOnThread: EXIT thread=2 off=960 end=1440
  [3] num_substring: EXIT count=3
[3] calSubstringThread: EXIT num=3

==> log04 <==
[4] calSubstringThread: ENTER
  [4] num_substring: ENTER
    [4] allowedOnThread: ENTER thread=3
      [4] skip_to_delim: ENTER off=1437 tag=off
      [4] skip_to_delim: EXIT off=1440
      [4] skip_to_delim: ENTER off=1919 tag=end
      [4] skip_to_delim: EXIT off=1920
    [4] allowedOnThread: EXIT thread=3 off=1440 end=1920
  [4] num_substring: EXIT count=4
[4] calSubstringThread: EXIT num=4

==> log05 <==
[5] calSubstringThread: ENTER
  [5] num_substring: ENTER
    [5] allowedOnThread: ENTER thread=4
      [5] skip_to_delim: ENTER off=1916 tag=off
      [5] skip_to_delim: EXIT off=1920
      [5] skip_to_delim: ENTER off=2399 tag=end
      [5] skip_to_delim: EXIT off=2402
    [5] allowedOnThread: EXIT thread=4 off=1920 end=2402
  [5] num_substring: EXIT count=5
[5] calSubstringThread: EXIT num=5

==> log06 <==
[6] calSubstringThread: ENTER
  [6] num_substring: ENTER
    [6] allowedOnThread: ENTER thread=5
      [6] skip_to_delim: ENTER off=2395 tag=off
      [6] skip_to_delim: EXIT off=2396
      [6] skip_to_delim: ENTER off=2875 tag=end
      [6] skip_to_delim: EXIT off=2876
    [6] allowedOnThread: EXIT thread=5 off=2396 end=2876
  [6] num_substring: EXIT count=6
[6] calSubstringThread: EXIT num=6

==> log07 <==
[7] calSubstringThread: ENTER
  [7] num_substring: ENTER
    [7] allowedOnThread: ENTER thread=6
      [7] skip_to_delim: ENTER off=2874 tag=off
      [7] skip_to_delim: EXIT off=2876
      [7] skip_to_delim: ENTER off=3355 tag=end
      [7] skip_to_delim: EXIT off=3356
    [7] allowedOnThread: EXIT thread=6 off=2876 end=3356
  [7] num_substring: EXIT count=7
[7] calSubstringThread: EXIT num=7

==> log08 <==
[8] calSubstringThread: ENTER
  [8] num_substring: ENTER
    [8] allowedOnThread: ENTER thread=7
      [8] skip_to_delim: ENTER off=3353 tag=off
      [8] skip_to_delim: EXIT off=3356
      [8] skip_to_delim: ENTER off=3835 tag=end
      [8] skip_to_delim: EXIT off=3835
    [8] allowedOnThread: EXIT thread=7 off=3356 end=3835
  [8] num_substring: EXIT count=8
[8] calSubstringThread: EXIT num=8

==> log09 <==
[9] calSubstringThread: ENTER
  [9] num_substring: ENTER
    [9] allowedOnThread: ENTER thread=8
      [9] skip_to_delim: ENTER off=3832 tag=off
      [9] skip_to_delim: EXIT off=3832
      [9] skip_to_delim: ENTER off=4311 tag=end
      [9] skip_to_delim: EXIT off=4311
    [9] allowedOnThread: EXIT thread=8 off=3832 end=4311
  [9] num_substring: EXIT count=9
[9] calSubstringThread: EXIT num=9

==> log10 <==
[10] calSubstringThread: ENTER
  [10] num_substring: ENTER
    [10] allowedOnThread: ENTER thread=9
      [10] skip_to_delim: ENTER off=4311 tag=off
      [10] skip_to_delim: EXIT off=4311
    [10] allowedOnThread: EXIT thread=9 off=4311 end=4799
  [10] num_substring: EXIT count=10
[10] calSubstringThread: EXIT num=10

Thank you very much for explaining and helping me with this. I was wondering if you could explain to me how mutex works and why it's good practice to use it? – Bass Approved

这，从 C 语法来看，似乎像一个 atomic 操作：

total += count;

但是，不是。实际上是三个操作：

temp = total;
temp += count;
total = temp;

不同的线程会依次执行这些。通常（例如 99.44% 的时间），这三个操作将由一个线程执行，而不会受到另一个线程的干扰。如果我们有两个线程（例如 A 和 B），“好的”顺序是线程操作“很好地”排序：

thread A / cpu 0        thread B / cpu 1
--------------------    ------------------------
tempA = total;
tempA += countA;
total = tempA;
                        tempB = total;
                        tempB += countB;
                        total = tempB;

total 的最终值将是：total + countA + countB，这就是我们想要的。

但是，如果两个线程运行同时在不同的CPU上，它们可能会穿插这些操作。我们可以有一个序列，例如：

thread A / cpu 0        thread B / cpu 1
--------------------    ------------------------
tempA = total;
                        tempB = total;
tempA += countA;
                        tempB += countB;
total = tempA;
                        total = tempB;

在这种情况下，在序列的末尾，total 的最终值将是：total + countB [这不是我们想要的 ]. （即）total 增加 countA [由线程 A] 执行将是 lost/trashed!

在这种情况下，线程 正在比赛 并且线程 B “赢得”了比赛。

使用互斥锁[或其他锁定机制或使用原子操作]将防止这种情况。

A pthread_mutex_lock [松散地] 是两个操作：“请求”和“授予”。如果互斥不持有，这些操作会同时发生。如果互斥量被 [由另一个线程] 持有，则“授予”会及时延迟。它在另一个线程完成“释放”（例如 pthread_mutex_unlock）后被授予。这是时间表：

thread A / cpu 0        thread B / cpu 1
--------------------    ------------------------
mutex requested
mutex granted
                        mutex requested
tempA = total;
tempA += countA;
total = tempA;
mutex released
                        mutex granted
                        tempB = total;
                        tempB += countB;
                        total = tempB;
                        mutex released

更详细的解释见我的回答：

另一种保证原子更新的方法是使用stdatomic.h原语。看我的回答：

另一种解决方案是“票证锁定”。看我的回答：

PThread 的分布式计算不工作

Distributed Computing with PThread Not Working

c

linux

arrays

pthreads

distributed-computing