Read in one word from a file and compare it with a regular expression

Question

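I'm building a program that is supposed to find the words in a file that contain 2 vowels in a row and end with ly or ing. I'm currently having trouble with how to read the words from the file. My current code looks something like this: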

fgets(string, BUFF_SIZE, file);
char *ptr = strtok(string, delim);
reti = regcomp(&regex, "[aoueiyAOUEIY]+[aoueiyAOUEIY].{0,}(ly|ing|LY|ING)$", REG_EXTENDED);
if (reti){
   fprintf(stderr, "Could not compile regex\n");
   exit(1);
}
/* Execute regular expression */

reti = regexec(&regex, ptr , 0, NULL, 0);
if (!reti) {
   puts("Match");
   printf(" %s\n", string);
}
else if (reti == REG_NOMATCH) {
   puts("No match");
   printf(" %s\n", string);
}
else {
   regerror(reti, &regex, msgbuf, sizeof(msgbuf));
   fprintf(stderr, "Regex match failed: %s\n", msgbuf);
   exit(1);
}

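I know I need some kind of loop so that I can check more than one word. I wanted to try out how strtok works, but realized I would still face the same problem. For example, if my line is "fairly stable. jump? hope!", there are so many characters a word can end with. How do I make my delim understand that it is at the end of a word? I was thinking of using a second regex that matches letters only and comparing until I get a REG_NOMATCH, but the problem with that is that the buffer would fill up very quickly.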

Answer 1

For a task like this, it is important to define what counts as a word.

For example, consider "bad!idea this!is". Is that the 4 words "bad", "idea", "this", "is"? Or the 4 words "bad!", "idea", "this!", "is"? Or just the 2 words "bad!idea" and "this!is"?

And what if the input is "bad3idea this9is"?

Sometimes the standard functions (e.g. strtok, fscanf) will do what you need, and in that case you should use them.
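As a rough illustration of the strtok route, the sketch below splits a line on an explicit delimiter set. The helper name split_words and the delimiter string used with it are assumptions made for this example and do not come from the question. Any character left out of the set (a digit, say) stays part of a word, which is exactly the "what is a word" decision discussed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on the characters in
   `delims` and stores up to `max_words` token pointers in `words`.
   Returns the number of tokens stored. */
static size_t split_words(char *line, const char *delims,
                          char *words[], size_t max_words)
{
    size_t count = 0;

    /* The first call passes the buffer; later calls pass NULL so that
       strtok continues where it left off. */
    for (char *tok = strtok(line, delims);
         tok != NULL && count < max_words;
         tok = strtok(NULL, delims)) {
        words[count++] = tok;
    }
    return count;
}
```

Called on a writable buffer holding "fairly stable. jump? hope!" with the delimiters " \t\r\n.,;:!?", this yields the four words fairly, stable, jump and hope. Keep in mind that strtok overwrites delimiters in its input and keeps hidden state between calls, so it cannot be used on string literals or on two lines at once.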

If the standard functions do not fit, you can use fgetc to implement something that does exactly what you need.

The example below treats anything that is not a letter (i.e. not a-z or A-Z) as a word delimiter.

int end_of_file = 0;
while(!end_of_file)
{
    int index = 0;
    int c = fgetc(file);
    if (c == EOF) break;  // Done with the file
    while (isalpha(c))
    {
        string[index] = c;
        ++index;
        if (index == BUFF_SIZE)
        {
            // oh dear, the buffer is too small
            //
            // Just end the program... 
            exit(1);
        }
        c = fgetc(file);
        if (c == EOF)
        {
            end_of_file = 1;
            break;
        }
    }
    string[index] = '\0';  // Terminate the word
    if (index >= 4)       // We need at least 4 chars for a match
    {
        // do the regex stuff
    }
}
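The "do the regex stuff" placeholder could be filled in roughly as follows. This is only a sketch: the names WORD_PATTERN, compile_word_regex and word_matches are made up for illustration, and the pattern is a simplified variant of the one in the question that relies on REG_ICASE instead of listing upper-case letters. Like the question's pattern, it treats y as a vowel and requires the vowel pair to come before the ly/ing ending without overlapping it.

```c
#include <regex.h>
#include <stddef.h>

/* Assumed pattern: two consecutive vowels, then anything, then "ly" or
   "ing" at the end of the word. */
#define WORD_PATTERN "[aeiouy]{2}.*(ly|ing)$"

/* Compiles WORD_PATTERN into *re. Returns 0 on success, otherwise the
   regcomp error code. Do this once, before the read loop. */
static int compile_word_regex(regex_t *re)
{
    return regcomp(re, WORD_PATTERN,
                   REG_EXTENDED | REG_ICASE | REG_NOSUB);
}

/* Returns 1 if `word` matches, 0 if it does not, -1 on a regexec error. */
static int word_matches(const regex_t *re, const char *word)
{
    int rc = regexec(re, word, 0, NULL, 0);
    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;
    return -1;
}
```

Compiling once outside the loop and calling word_matches(&regex, string) for each word avoids recompiling the pattern per word. Call regfree on the regex_t when the file has been processed.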
