Array of tokenized string literals in C

Question

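I'm writing a C program that tokenizes an input text file and tracks the frequency of word lengths, while also tracking and storing the corresponding words themselves. My word counting works fine, but I can't get my word_tracker array to store the strings correctly: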

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#define MAX_LENGTH 34
#define MAX_WORDS 750

int main(int argc, char *argv[]){ 

    FILE *fp; //input file
    const char *cur; //stores current word as string literal
    char words[MAX_LENGTH*MAX_WORDS]; //stores all words from text file
    char file_name[100]; //stores file name
    int word_count[MAX_LENGTH] = {0}; //array to store frequency of words based on length
    const char *word_tracker[MAX_LENGTH][MAX_WORDS]; //array to store string literals of each word, indexed by char count and 
    int char_count; //current word's char count

    printf("Enter a file name: ");
    scanf("%s", file_name);
    fp = fopen(file_name, "r"); 

    if((fp==NULL)){
        printf("Failure: missing or unopenable file");
        return -1; 
    }else{
        while(fgets(words, sizeof(words), fp)){
            cur= strtok(words, " -.,\b\t\n"); //first word of line
            char_count = strlen(cur);
            word_count[char_count-1] = word_count[char_count-1]+1; //increment frequency of specific word length
            word_tracker[char_count-1][word_count[char_count-1]-1] = cur; //store string into corresponding array index location

            /*test printing*/
            printf("%d:", char_count-1); 
            printf("%s ", word_tracker[char_count-1][(word_count[char_count-1])-1]); 

            while(cur){
                    cur = strtok(NULL, " -.,\b\t\n"); //next word
                    if(cur){
                        char_count = strlen(cur);
                        word_count[char_count-1] = word_count[char_count-1]+1; //increment frequency of specific word length
                        word_tracker[char_count-1][word_count[char_count-1]-1] = cur; //store string into corresponding array index location

                        /*test printing*/
                        printf("%d:", char_count-1); //test print
                        printf("%s ", word_tracker[char_count-1][(word_count[char_count-1])-1]); //test print

                    }
                }
            }
        }
//Testing word_tracker: (This doesn't work)
    printf("\n\n%s \n", word_tracker[0][0]);
    printf("\n%s \n", word_tracker[1][0]);
    printf("%s \n", word_tracker[2][0]);
    printf("%s \n", word_tracker[3][0]);
    printf("%s \n", word_tracker[4][0]);
    printf("%s \n", word_tracker[5][0]);

    return 0;
}

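The "interior" tests (inside the tokenizing loop) work fine and print the correct strings and lengths. However, the print tests at the end of main print seemingly random strings compared with what the input text file says they should be. I have three theories about what I'm doing wrong:

1) My indexing is wrong

2) My understanding of how to populate and use an array of char* is incorrect

3) My tokenizing loop is incorrect (cur is not equal to "the isolated string"?)

I noticed that the tests at the end of main display variations of whatever is written on the last line of the input file, so I think my tokenizing loop may be at fault. Any guidance is much appreciated, thanks!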

Answer 1

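Your result array is currently const char *word_tracker[MAX_LENGTH][MAX_WORDS], which is a two-dimensional array of pointers. strtok does not copy anything: every pointer it returns points into words, and the next fgets overwrites that buffer, so the stored pointers end up referring to text from later lines. You need to store copies of the words instead. You can either (a) use a one-dimensional array of pointers and allocate memory for each word you find, or (b) use a two-dimensional array of characters and strcpy each word into its slot.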
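To see why the pointers stored in the question go stale, here is a minimal sketch. The helper name first_token_after_reuse and the sample strings are assumptions made up for illustration; the second strcpy stands in for the next fgets call refilling the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes line1 inside buf, keeps the pointer that
   strtok returned, then overwrites buf with line2 (as the next fgets would).
   The returned pointer still has the same address, but the text it refers
   to has changed. */
static const char *first_token_after_reuse(char *buf, size_t size,
                                           const char *line1,
                                           const char *line2)
{
    assert(strlen(line1) < size && strlen(line2) < size);

    strcpy(buf, line1);
    const char *saved = strtok(buf, " \n"); /* points into buf, no copy made */

    strcpy(buf, line2);                     /* buffer reused for the next line */
    return saved;                           /* now reads line2's text */
}
```

Called with "alpha beta" and then "gamma delta", the saved pointer no longer reads "alpha": it reads whatever the buffer holds now, which matches the symptom of seeing variations of the last line of the file.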

So (a) would look like...

const char *word_tracker[MAX_WORDS];
...
word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS] = strdup(cur);
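A slightly fuller sketch of option (a), under some assumptions: the helper names copy_word, store_words and free_words are hypothetical, copy_word is a plain malloc-based stand-in for strdup (so the sketch does not depend on POSIX), and words are simply appended in order rather than indexed by length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 750

/* Stand-in for strdup: returns a heap copy of s, or NULL if malloc fails. */
static char *copy_word(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Tokenizes line in place and appends a heap copy of each word to tracker,
   starting at index *count. Stops when tracker is full or malloc fails.
   Returns the number of words added by this call. */
static size_t store_words(char *line, char *tracker[], size_t *count)
{
    static const char delims[] = " -.,\b\t\n";
    size_t added = 0;

    assert(line != NULL && tracker != NULL && count != NULL);
    for (char *cur = strtok(line, delims);
         cur != NULL && *count < MAX_WORDS;
         cur = strtok(NULL, delims)) {
        char *copy = copy_word(cur);
        if (copy == NULL)
            break;
        tracker[(*count)++] = copy; /* tracker owns this copy from now on */
        added++;
    }
    return added;
}

/* Releases every copy made by store_words. */
static void free_words(char *tracker[], size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(tracker[i]);
}
```

Because each entry is its own allocation, reusing the line buffer for the next fgets no longer affects words stored earlier. The cost is that every copy has to be freed once you are done with it.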

and (b) would look like

char word_tracker[MAX_WORDS][MAX_LENGTH];
...
strncpy(word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS], cur, MAX_LENGTH);
word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS][MAX_LENGTH-1] = '\0';

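Note that in (b), MAX_LENGTH is the maximum length of a string (that is, of a single word), and is therefore the second index. strncpy ensures that the copy never exceeds the space reserved for the word. The explicit terminator on the last byte is needed because strncpy does not add one when the source is MAX_LENGTH characters or longer.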
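Option (b) could be fleshed out roughly as follows. This is a sketch under assumptions, not a drop-in replacement: the function name copy_words is hypothetical, words are stored in order of appearance, and words longer than MAX_LENGTH-1 characters are silently truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LENGTH 34
#define MAX_WORDS 750

/* Tokenizes line in place and copies each word into its own row of tracker,
   starting at row count. Returns the updated number of stored words. */
static size_t copy_words(char *line, char tracker[][MAX_LENGTH], size_t count)
{
    static const char delims[] = " -.,\b\t\n";

    assert(line != NULL && tracker != NULL);
    for (char *cur = strtok(line, delims);
         cur != NULL && count < MAX_WORDS;
         cur = strtok(NULL, delims)) {
        strncpy(tracker[count], cur, MAX_LENGTH); /* copies at most MAX_LENGTH bytes */
        tracker[count][MAX_LENGTH - 1] = '\0';    /* terminate even when truncated */
        count++;
    }
    return count;
}
```

Here the storage is a fixed block of MAX_WORDS rows, so nothing has to be freed, at the price of reserving MAX_LENGTH bytes per word whether or not they are used.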
