String not properly being emptied and assigned when dealing with strcpy(string, "")

Question

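Edit: I did try changing the line arr_of_strings[arr_index_count] = first_word; to strcpy(arr_of_strings[arr_index_count], first_word); but that gives a segmentation fault after printing Word is: This

Edit 2: I am trying to do this without strtok, since I figured it would be a good way to learn about C strings.

I'm trying to teach myself C. I decided to write a function that takes a string and puts each word of the string into its own element of an array. Here is my code:

Assume #define MAX_LENGTH 80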

// char *string_one[unknown_size];

// first_word will represent each word in the sentence
char first_word[MAX_LENGTH + 1] = "";

// this is the array I will store each word in
char *arr_of_strings[MAX_LENGTH];

int index_count = 0;
int arr_index_count = 0;

char sentence[] = "This is a sentence.";

for (int i = 0; i<MAX_LENGTH; i++) {
    printf("Dealing with char: %c\n", sentence[i]); 

    if (sentence[i] == '\0') {
        // end of sentence
        break;
    } else if (sentence[i] ==  ' ') {
        // this signifies the end of a word
        printf("Word is: %s\n", first_word);
        arr_of_strings[arr_index_count] = first_word;
        // after putting the word in the string, make the word empty again
        strcpy(first_word, "");
        // verify that it is empty
        printf("First word is now: %s\n", first_word);

        index_count = 0;
        arr_index_count++;
    } else {
        // not the start of a new string... so keep appending the letter to first_word
        printf("Letter to put in first_word is: %c\n", sentence[i]);
        first_word[index_count] = sentence[i];
        index_count++;
    }
}

printf("-----------------\n");
for (int j = 0; j<=arr_index_count; j++) {
    printf("%s\n", arr_of_strings[j]);
}

This prints:

Dealing with char: T
Letter to put in first_word is: T
Dealing with char: h
Letter to put in first_word is: h
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: This
First word is now: 
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: isis
First word is now: 
Dealing with char: a
Letter to put in first_word is: a
Dealing with char:  
Word is: asis
First word is now: 
Dealing with char: s
Letter to put in first_word is: s
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: n
Letter to put in first_word is: n
Dealing with char: t
Letter to put in first_word is: t
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: n
Letter to put in first_word is: n
Dealing with char: c
Letter to put in first_word is: c
Dealing with char: e
Letter to put in first_word is: e
Dealing with char: .
Letter to put in first_word is: .
Dealing with char: 
-----------------
sentence.
sentence.
sentence.

If we look here:

First word is now: 
Dealing with char: i
Letter to put in first_word is: i
Dealing with char: s
Letter to put in first_word is: s
Dealing with char:  
Word is: isis

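Why is it that when the word is empty and we put i and s into it, the word is now isis? (The same goes for asis.)
Why is the word sentence printed 3 times? My algorithm is clearly flawed, but if anything, shouldn't the word sentence be printed 4 times (once for each word in the sentence: This is a sentence)?

Also, I am just learning C, so if there are any other ways to improve the algorithm, please let me know.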

Answer 1

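arr_of_strings is just an array of char pointers, and you point every word at the same array, first_word. In addition, you don't write the null terminator that C strings require.

Here is an approach that may help you; it uses strtok: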

#include <string.h>
#include <stdio.h>

#define N 100
#define LEN 20 // max length of a word

int fill(char matrix[N][LEN], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    char * pch;
    // Splits 'data' to tokens, separated by a whitespace
    pch = strtok (data," ");
    while (pch != NULL)
    {
        // Copy a word to the correct row of 'matrix'
        strcpy(matrix[counter++], pch);
        //printf ("%s\n",pch);
        pch = strtok (NULL, " ");
    }
    return counter;
}

void print(char matrix[N][LEN], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

int main(void)
{
    char data[] = "New to the C programming language";
    // We will store each word of 'data' to a matrix, of 'N' rows and 'LEN' columns
    char matrix[N][LEN] = {0};
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    return 0;
}

Output:

New
to
the
C
programming
language

Answer 2

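1) This happens because you don't add a '\0' to the end of the word before printing it. After your program hits the first space, first_word looks like {'T', 'h', 'i', 's', '\0', '\0', ...} and prints just fine. Calling strcpy(first_word, "") changes it to {'\0', 'h', 'i', 's', '\0', ...}, and reading the next word, "is", then overwrites the first two characters, giving {'i', 's', 'i', 's', '\0', ...}. So first_word is now the string "isis", as the output shows. This can be fixed by simply adding first_word[index_count] = '\0' before printing the string.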

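2.1) The array contains the same string at every index because arr_of_strings is an array of string pointers that all end up pointing to the same string, first_word, which holds the last word of the sentence when the loop ends. This can be solved in a couple of ways. One is to make arr_of_strings a two-dimensional array, such as char arr_of_strings[MAX_STRINGS][MAX_LENGTH], and then add strings to it with strcpy(arr_of_strings[arr_index_count], first_word).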

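2.2) Finally, "sentence." is printed only 3 times because you only check for a space to mark the end of a word. "sentence." ends with the null terminator '\0', so it is never added to the array of words, and the output has no line "Word is: sentence."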

Answer 3

Trying to do this without strtok since I figured it would be a good way to learn about C strings.

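Yes, that's the spirit!

I have already explained some of the problems with your code in my previous answer, so now I am going to post a strtok-free solution, which should help you understand what is going on with strings. It uses basic pointer arithmetic.

Pro tip: take a piece of paper, draw the arrays (data and matrix), keep track of the values of their counters, and run the program on that paper.

Code: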

#include <string.h>
#include <stdio.h>

#define N 100
#define LEN 20 // max length of a word

int fill(char matrix[N][LEN], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    // Array to store current word
    char word[LEN];
    // Counter 'i' for 'word'
    int i;
    // While there is still something to read from 'data'
    while(*data != '\0')
    {
        // We seek a new word
        i = 0;
        // While the current character of 'data' is not a whitespace or a null-terminator
        while(*data != ' ' && *data != '\0')
            // copy that character to word, and increment 'i'. Move to the next character of 'data'.
            word[i++] = *data++;
        // Null-terminate 'word'. 'i' is already at the value we desire, from the line above.
        word[i] = '\0';
        // If the current character of 'data' is not a null-terminator (thus it's a whitespace)
        if(*data != '\0')
            // Increment the pointer, so that we skip the whitespace (and be ready to read the next word)
            data++;
        // Copy the word to the counter-th row of the matrix, and increment the counter
        strcpy(matrix[counter++], word);
    }

    return counter;
}

void print(char matrix[N][LEN], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

int main(void)
{
    char data[] = "Alexander the Great";
    // We will store each word of 'data' to a matrix, of 'N' rows and 'LEN' columns
    char matrix[N][LEN] = {0};
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    return 0;
}

Output:

Alexander
the
Great

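The gist of the code lies in the function fill(), which takes data and:

Finds a word.
Stores that word, character by character, in an array called word.
Copies that word to matrix.

The tricky part is finding the word. You need to iterate over the string and stop when you meet a space, which tells us that every character read in that iteration is a letter of the word.

However, you need to be careful when looking for the last word of the string, because you will not meet a space when you get there. So you must also watch for the end of the string; in other words, the null terminator.

When you reach it, copy the last word into the matrix and you are done, but make sure to update the pointers correctly (this is where the paper idea I gave you helps a lot).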

Answer 4

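Based on my strtok-free answer, I wrote some code that uses an array of N char pointers instead of a hard-coded 2D matrix.

char matrix[N][LEN] is a 2D array that can store up to N strings, where each string can occupy at most LEN bytes (null terminator included). char *ptr_arr[N] is an array of N char pointers. So it can also store up to N strings, but the length of each string is not fixed.

The current approach saves us some space by allocating exactly the memory each string needs. With a hard-coded 2D array, you use the same amount of memory for every string. So if you assume a string can be 20 characters long, you allocate a block of size 20 regardless of the string you store, which may be much shorter than 20 or, even worse, much longer. In the latter case you need to truncate the string or, if the code is not written carefully, you invoke Undefined Behavior by going past the bounds of the array that stores the string.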

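With the pointer approach we don't have to worry about that, and we can allocate exactly as much space as each string needs. As always, though, there is a trade-off. We save some space, but we need to allocate the memory dynamically (and deallocate it when we are done; C has no garbage collector, unlike Java, for example). Dynamic allocation is a powerful tool, but it costs us more development time.

So, in my example, we will follow the same logic as before (regarding how we find the words in the string, and so on), but we will be careful about how we store the words in the matrix.

Once a word is found and stored in the temporary array word, we can use strlen() to find its exact length. We then dynamically allocate exactly as much space as the word needs, plus 1 for the null terminator that every C string must have (because the functions in <string.h> depend on it to find the end of a string).

So, to store the first word, "Alexander", we need to do: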

ptr_arr[0] = malloc(sizeof(char) * (9 + 1));

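where 9 is the result of strlen("Alexander"). Notice that we ask for a memory block whose size equals the size of a char times 10. The size of a char is 1, so in this case it makes no difference, but in general you should use it (since you may want other data types, or even structs, and so on).

We make the first pointer of the array point to the memory block we just allocated dynamically. That block now belongs to us, so we can store data in it (in our case, the word). We do that with strcpy().

Then we go on to print the words.

Now we are done. In Python, for example, you would be finished writing your program at this point. But here, since we allocated memory dynamically, we need to free() it! This is a common mistake: forgetting to release the memory you asked for!

We do that by freeing every pointer that points to memory returned by malloc(). So if we called malloc() 10 times, then free() should be called 10 times too; otherwise there is a memory leak!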

Here is the full code:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define N 100

int fill(char* ptr_arr[N], char* data)
{
    // How many words in 'data'?
    int counter = 0;
    // Array to store current word, assuming max length will be 50
    char word[50];
    // Counter 'i' for 'word'
    int i;
    // While there is still something to read from 'data'
    while(*data != '\0')
    {
        // We seek a new word
        i = 0;
        // While the current character of 'data' is not a whitespace or a null-terminator
        while(*data != ' ' && *data != '\0')
            // copy that character to word, and increment 'i'. Move to the next character of 'data'.
            word[i++] = *data++;
        // Null-terminate 'word'. 'i' is already at the value we desire, from the line above.
        word[i] = '\0';
        // If the current character of 'data' is not a null-terminator (thus it's a whitespace)
        if(*data != '\0')
            // Increment the pointer, so that we skip the whitespace (and be ready to read the next word)
            data++;
        // Dynamically allocate space for a word of length `strlen(word)`
        // plus 1 for the null terminator. Assign that memory chunk to the
        // pointer positioned at `ptr_arr[counter]`.
        ptr_arr[counter] = malloc(sizeof(char) * (strlen(word) + 1));
        // Now, `ptr_arr[counter]` points to a memory block, that will
        // store the current word.

        // Copy the word to the counter-th row of the ptr_arr, and increment the counter
        strcpy(ptr_arr[counter++], word);
    }

    return counter;
}

void print(char* matrix[N], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       printf("%s\n", matrix[i]);
}

void free_matrix(char* matrix[N], int words_no)
{
   for(int i = 0; i < words_no; ++i)
       free(matrix[i]);
}

int main(void)
{
    char data[] = "Alexander the Great";
    // We will store each word of 'data' in an array of 'N' char pointers
    char *matrix[N];
    int words_no;
    // 'fill()' populates 'matrix' with 'data' and returns the number of words contained in 'data'.
    words_no = fill(matrix, data);
    print(matrix, words_no);
    free_matrix(matrix, words_no);
    return 0;
}

Output:

Alexander
the
Great
