Using two buffers in a C program that replicates the wc command

Question

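I have the following code that simulates the wc command from Linux. I need to use a buffer of size 4096, but for some reason, when I run this code I get the following result: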

0 0 0 wcfile

Even though the file is not empty, I get 0 lines, 0 words and 0 bytes. This is the code I am using:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#define LUNG_BUF 4096

int main(int argc, char** argv)
{
    int bytes = 0;
    int words = 0;
    int newLine = 0;

    char buffer[LUNG_BUF];
    enum states { WHITESPACE, WORD };
    int state = WHITESPACE;
    if (argc != 2)
    {
        /* Romanian: "You did not enter the file name" */
        printf("Nu ati introdu snumele  fisierului\n%s", argv[0]);
    }
    else {
        FILE *file = fopen(argv[1], "r");

        if (file == 0) {
            printf("can not find :%s\n", argv[1]);
        }
        else {
            char *thefile = argv[1];

            while (read(fileno(file), buffer, LUNG_BUF) == 1)
            {
                bytes++;
                if (buffer[0] == ' ' || buffer[0] == '\t')
                {
                    state = WHITESPACE;
                }
                else if (buffer[0] == '\n')
                {
                    newLine++;
                    state = WHITESPACE;
                }
                else
                {
                    if (state == WHITESPACE)
                    {
                        words++;
                    }
                    state = WORD;
                }
            }
            printf("%d %d %d %s\n", newLine, words, bytes, thefile);
        }
    }
}

Answer 1

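read tries to read up to LUNG_BUF bytes into the buffer and returns the number of bytes it actually read (0 at end of file, or -1 on error).

This means the == 1 check fails whenever read returns more than one byte. For any file larger than one byte that already happens on the very first call, so the loop body never runs and all three counters stay at 0. (Even when the check does pass, the loop body only ever looks at buffer[0].)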
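To confirm this, here is a tiny sketch that performs one read() with the question's call pattern on an already opened file and returns whatever read() reports. The helper name first_read_size is hypothetical. For a regular file containing the 12 bytes "hello world\n", the first call returns 12 and the next returns 0, so a loop condition of == 1 is never true.

```c
#define _POSIX_C_SOURCE 200809L /* expose fileno() and ssize_t under strict -std=c11 */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: a single read() of up to `bufsize` bytes on an
   already opened FILE, exactly like the question's loop condition.
   Returns read()'s result: bytes read, 0 at end of file, -1 on error. */
ssize_t first_read_size(FILE *file, char *buf, size_t bufsize)
{
    return read(fileno(file), buf, bufsize);
}
```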

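If you want to interpret the data as a string, it also makes sense to read one byte less than the buffer size, so that there is room for a terminating NUL byte after the data.

You can then evaluate the data, for example with a pointer that is reset to the start of the buffer on each pass through the loop.

Your code would then look like this: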

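ssize_t n;  /* read() returns ssize_t; with size_t, an error (-1) would
               wrap to a huge value and still pass the > 0 test */
while ((n = read(fileno(file), buffer, LUNG_BUF - 1)) > 0) {
    buffer[n] = '\0';    /* room for this byte was reserved by reading LUNG_BUF - 1 */
    char *ptr = buffer;  /* start at the beginning of the buffer on every pass */
    while (*ptr) {       /* note: stops early if the file itself contains a NUL byte */
        bytes++;
        if (*ptr == ' ' || *ptr == '\t') {
            state = WHITESPACE;
        } else if (*ptr == '\n') {
            newLine++;
            state = WHITESPACE;
        } else {
            if (state == WHITESPACE) {
                words++;
            }
            state = WORD;
        }
        ptr++;
    }
}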
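A possible variation, sketched here as an assumption rather than as part of the answer above: instead of NUL-terminating the chunk, walk exactly the n bytes that read() returned. Then no spare byte is needed and NUL bytes inside the file are still counted. The names struct wc_counts and count_chunk are hypothetical; the classification rules are copied from the question. Because state lives in the struct, a word split across two read() calls is counted once.

```c
#include <stddef.h>

enum states { WHITESPACE, WORD };

/* Hypothetical running totals; `state` persists between chunks. */
struct wc_counts {
    long lines, words, bytes;
    int state;
};

/* Classify exactly `n` bytes of `buf`. Iterating by length instead of
   stopping at '\0' means NUL bytes in the input are counted as well. */
void count_chunk(struct wc_counts *c, const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        char ch = buf[i];
        c->bytes++;
        if (ch == ' ' || ch == '\t') {
            c->state = WHITESPACE;
        } else if (ch == '\n') {
            c->lines++;
            c->state = WHITESPACE;
        } else {
            if (c->state == WHITESPACE)
                c->words++;
            c->state = WORD;
        }
    }
}
```

In the question's program the inner loop would then shrink to a single call: read with n = read(fileno(file), buffer, LUNG_BUF) and, while n > 0, call count_chunk(&c, buffer, (size_t)n).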

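Another option is to use fgets, which returns one line at a time, or at most 4095 bytes of it if the line is longer than that (because fgets always appends a terminating NUL byte).

Your loop then needs only a small change and looks like this: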

while (fgets(buffer, sizeof(buffer), file)) {
    char *ptr = buffer;
    while (*ptr) {
        /* ... same character classification as in the read() loop above ... */
        ptr++;
    }
}
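Filled in, the fgets variant might look like the following sketch. The function name count_with_fgets and its out-parameters are hypothetical; the classification logic is the question's. Because state is declared outside the loop, a line longer than the buffer, which fgets returns in several pieces, is still counted as one word per run of non-blank characters.

```c
#include <stdio.h>

#define LUNG_BUF 4096

enum states { WHITESPACE, WORD };

/* Sketch: count lines, words and bytes of an open stream with fgets().
   Assumes the input contains no NUL bytes: each chunk is walked as a
   C string, so anything after an embedded '\0' would be skipped. */
void count_with_fgets(FILE *file, long *lines, long *words, long *bytes)
{
    char buffer[LUNG_BUF];
    int state = WHITESPACE;

    *lines = *words = *bytes = 0;
    while (fgets(buffer, sizeof(buffer), file)) {
        for (const char *ptr = buffer; *ptr; ptr++) {
            (*bytes)++;
            if (*ptr == ' ' || *ptr == '\t') {
                state = WHITESPACE;
            } else if (*ptr == '\n') {
                (*lines)++;
                state = WHITESPACE;
            } else {
                if (state == WHITESPACE)
                    (*words)++;
                state = WORD;
            }
        }
    }
}
```

For example, a stream containing "one two\n  three\n" gives 2 lines, 3 words and 16 bytes.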
