Why does File::Slurp get UTF8 characters wrong when I use open ':std', ':encoding(UTF-8)';?

Question

I have a Perl 5.30.0 program on Ubuntu in which the combination of File::Slurp and open ':std', ':encoding(UTF-8)' causes UTF8 text to be read incorrectly:

use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';
use File::Slurp;

my $text = File::Slurp::slurp('input.txt');
print "$text\n";

"input.txt" is a UTF8-encoded text file (no BOM) with this content:

ö

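When I run this, ö is displayed as Ã¶. Only when I remove the use open... line does it work as expected, with ö printed as ö.

When I read the file manually like below, everything works as expected and I do get ö: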

$text = '';
open my $F, '<', "input.txt" or die "Cannot open file: $!";
while (<$F>) {
    $text .= $_;
}
close $F;
print "$text\n";

Why is that, and what is the best way to go here? Is the open pragma outdated, or am I missing something?

Answer 1

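Like many pragmas,[1] the effect of use open is lexically scoped.[2] This means it only affects the remainder of the block or file in which it is found. Such a pragma doesn't affect code in functions outside of its scope, even if they are called from within its scope. Here, the open call that actually reads the file lives inside File::Slurp, outside the pragma's scope, so the bytes are never decoded; :std then encodes each of those bytes again on output, which produces Ã¶.

You need to communicate the desire to decode the stream to File::Slurp. This can't be done using slurp, but it can be done using read_file via its binmode parameter.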
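To illustrate the scoping rule, here is a minimal sketch. The helper read_outside and the temporary file are made up for this illustration; read_outside merely stands in for code compiled outside the pragma's scope, such as the internals of File::Slurp.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Write the two UTF-8 bytes of "ö" (0xC3 0xB6) to a temporary file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xC3\xB6";
close($tmp_fh)
   or die("Can't close \"$tmp_name\": $!\n");

# Hypothetical helper, compiled OUTSIDE the scope of "use open" below,
# just like the open call inside File::Slurp.
sub read_outside {
   my ($qfn) = @_;
   open(my $fh, '<', $qfn)
      or die("Can't open file \"$qfn\": $!\n");
   local $/;
   return scalar(<$fh>);
}

my ($in_scope, $out_of_scope);
{
   use open ':encoding(UTF-8)';  # Lexically scoped to this block.

   # This open is compiled inside the scope, so the handle decodes UTF-8.
   open(my $fh, '<', $tmp_name)
      or die("Can't open file \"$tmp_name\": $!\n");
   local $/;
   $in_scope = <$fh>;

   # Called from inside the scope, but compiled outside it: no decoding.
   $out_of_scope = read_outside($tmp_name);
}

printf("in scope: %d character(s), out of scope: %d character(s)\n",
   length($in_scope), length($out_of_scope));
```

The read inside the block should yield a single decoded character, while the helper should return the two raw bytes, even though it was called from within the block.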

use open ':std', ':encoding(UTF-8)';  # Still want for effect on STDOUT.
use File::Slurp qw( read_file );

my $text = read_file('input.txt', { binmode => ':encoding(UTF-8)' });

A better module is File::Slurper.

use open ':std', ':encoding(UTF-8)';  # Still want for effect on STDOUT.
use File::Slurper qw( read_text );

my $text = read_text('input.txt');

File::Slurper's read_text decodes using UTF-8 by default.

Without modules, you could use

use open ':std', ':encoding(UTF-8)';

my $text = do {
   my $qfn = "input.txt";
   open(my $fh, '<', $qfn)
      or die("Can't open file \"$qfn\": $!\n");
   local $/;
   <$fh>
};

Of course, that's not as clear as the earlier solutions.

[1] Other notable examples include use VERSION, use strict, use warnings, use feature and use utf8.
[2] The effect of :std on STDIN, STDOUT and STDERR is global.
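As a rough sketch of that last point, the following applies the pragma inside a bare block and then inspects STDOUT from outside the block. It uses PerlIO::get_layers to list the layers; the exact layer names printed may vary between Perl builds, so none are assumed here.

```perl
use strict;
use warnings;

{
   # The lexical part of the pragma ends with this block, but :std
   # changes the layers of the standard handles, which are global.
   use open ':std', ':encoding(UTF-8)';
}

# Outside the block, STDOUT still carries an encoding layer.
my @layers = PerlIO::get_layers(\*STDOUT);
print("STDOUT layers: @layers\n");
```

This is why the use open line is still wanted in the solutions above: it keeps STDOUT encoding output as UTF-8 even though it has no effect on the file handle opened inside the module.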

Answer 2

Not really an answer to your question, but my favourite module for file I/O lately is Path::Tiny.

use Path::Tiny;
my $text = path('input.txt')->slurp_utf8;
