StreamReader breaking on special character

Question

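I'm trying to read a file with fs in a Meteor app on the server.

My goal:
I want to process a very large file, so I need to read it line by line to keep memory usage flat.

My approach:
I'm creating a streamReader and processing the file one character at a time, appending each character to a new string until I get a \n, and then passing the string to a processLine(line) function.

My test file: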

F1;F2
12;abäde

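My code:

I've commented out everything that is beyond the scope of the problem. I'm posting it anyway in case someone has a completely different approach for me.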

const fs = require('fs');

// ...

let streamReader = fs.createReadStream(path, { highWaterMark: 1});

let line = "";
streamReader.on('data', function(chunk) {
    console.log(chunk)
    // line += chunk;
    // if (chunk == "\n") {
    //     processLine(line);
    //     line = "";
    // }
});

streamReader.on('end', function() {
    processLine(line);
});

processLine = (line) => {
    console.log(line);
}

Output of the above code:

F
1
;
F
2



1
2
;
a
b
�
�
d
e

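Although the documentation says the default encoding is utf8, the character ä is printed as �.

The output when specifying the encoding is as follows:

fs.createReadStream(path, { highWaterMark: 1, encoding: "utf8" });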

F
1
;
F
2



1
2
;
a
b

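It stops when it reaches the ä. I think this happens because it takes 2 chunks to represent that character.

I just don't know how to get around it. In general, I only need to process the file line by line. Maybe I'm on the wrong track?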

Answer 1

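A tiny highWaterMark value doesn't save a meaningful amount of RAM; the default for fs.createReadStream is only 64 KiB anyway. And trying to use highWaterMark to force old-timey getchar() behavior is an abuse of it.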
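A minimal sketch of why one-byte chunks break on ä, using only Node core modules. In UTF-8, ä is two bytes, so decoding each byte on its own yields a replacement character twice, whereas a StringDecoder holds an incomplete sequence back until the rest arrives. The function names are made up for illustration.

```javascript
const { StringDecoder } = require('string_decoder');

// 'ä' is U+00E4, which UTF-8 encodes as the two bytes 0xC3 0xA4,
// so the 5-character string below occupies 6 bytes.
const bytes = Buffer.from('abäde', 'utf8');

// Decode every byte in isolation. Each half of 'ä' is invalid on its own,
// so each one turns into U+FFFD (the replacement character).
function decodeByteByByte(buf) {
  let out = '';
  for (let i = 0; i < buf.length; i++) {
    out += buf.subarray(i, i + 1).toString('utf8');
  }
  return out;
}

// Feed the same one-byte slices through a StringDecoder, which buffers
// an incomplete multi-byte sequence until it can be decoded as a whole.
function decodeWithStringDecoder(buf) {
  const decoder = new StringDecoder('utf8');
  let out = '';
  for (let i = 0; i < buf.length; i++) {
    out += decoder.write(buf.subarray(i, i + 1));
  }
  return out + decoder.end();
}

console.log(bytes.length);                   // 6
console.log(decodeByteByByte(bytes));        // ab��de
console.log(decodeWithStringDecoder(bytes)); // abäde
```

This is only meant to illustrate the failure mode; for line-by-line processing there is no need to decode by hand.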

There's a readline module in core node.js. It accepts output from a stream and splits it into lines. The documentation offers some samples. This is adapted from one of them, not debugged.

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream(path),
  crlfDelay: Infinity
});

rl.on('line', function (line) {
  console.log(`A line: ${line}`);
});

rl.on('close', function () {
  /* file completely processed */
});

readline is also handy for interactive command-line input/output, but you don't care about that here.
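The sample above leaves path undefined, so here is a self-contained variant that can be run as-is. It is a sketch, assuming a Node version recent enough to support async iteration over a readline interface and fs.rmSync. The names processFileByLine and demo, and the temporary file it writes, are invented for the example; the file contents are the test file from the question.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');

// Reads filePath line by line and calls onLine(line) for each line.
// The returned promise resolves once the whole file has been consumed.
async function processFileByLine(filePath, onLine) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath), // default chunk size, no highWaterMark tuning
    crlfDelay: Infinity,                  // treat \r\n as a single line break
  });
  for await (const line of rl) {
    onLine(line);
  }
}

// Hypothetical demo: writes the question's test file into a fresh temp
// directory, reads it back line by line, and returns the lines seen.
async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'lines-'));
  const file = path.join(dir, 'test.csv');
  fs.writeFileSync(file, 'F1;F2\n12;abäde\n', 'utf8');
  const lines = [];
  try {
    await processFileByLine(file, (line) => lines.push(line));
  } finally {
    fs.rmSync(dir, { recursive: true, force: true }); // clean up the temp directory
  }
  return lines;
}

demo().then((lines) => console.log(lines));
```

In a real app, the callback passed to processFileByLine would be the processLine function from the question, and memory use stays bounded by the stream's chunk size plus the longest line.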
