How do I open STDIN/STDOUT so that UTF-8 is handled correctly?
I have UTF-8 characters in my source code, so I do this:
use utf8;
my $line = 'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
print $line; # Wide character in print at ...
Then I thought my STDOUT should be in UTF-8 as well:
use utf8;
use open IO => ':utf8 :std';
my $line = 'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
print $line; # Wide character in print at ...
Why do I get this warning when I have told Perl to use utf8 and my source code contains UTF-8 characters?
At the same time, there is no warning here:
my $line = 'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
print $line;
No warning here either:
use open IO => ':utf8 :std';
my $line = 'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
print $line;
How should I open filehandles so that UTF-8 is handled correctly?
UPD
Actually, this is the code I have. It does not match:
use open IO => ':utf8 :std';
my $line = 'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
my @match = $line =~ m/(вiд|от|від)/i;
print "$line -> @match\n";
Unfortunately, the regex does not match. The output is:
ЗГ. РАХ. №382 ВIД 03.02.2020Р ->
Then I add the utf8 pragma:
use utf8;
use open IO => ':utf8 :std';
my $line = 'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
my @match = $line =~ m/(вiд|от|від)/i;
print "$line -> @match\n";
Now the regex matches, but a warning is emitted:
Wide character in print at t2.pl line 17.
ЗГ. РАХ. №382 ВIД 03.02.2020Р -> ВIД
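The difference between the two runs seems to come down to whether the literals hold bytes or characters. The sketch below reproduces both cases in one script; it uses \x{...} escapes and Encode so that the result does not depend on how the file is saved. The variable names are made up for illustration and are not part of my original script.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character strings written with escapes: Cyrillic Ve, Latin I, Cyrillic De,
# and the lowercase form of the same three letters.
my $chars   = "\x{412}I\x{414}";
my $pattern = "\x{432}i\x{434}";

# The same text as UTF-8 bytes, which is what the literals hold when
# "use utf8" is absent.
my $bytes         = encode('UTF-8', $chars);
my $pattern_bytes = encode('UTF-8', $pattern);

# On characters, /i folds the Cyrillic letters as well as the Latin one.
my $chars_match = $chars =~ /$pattern/i ? 1 : 0;

# On bytes, /i can fold the ASCII "I", but the bytes that make up the
# upper- and lowercase Cyrillic letters differ, so the match fails.
my $bytes_match = $bytes =~ /$pattern_bytes/i ? 1 : 0;

printf "characters: length %d, match %d\n", length $chars, $chars_match;
printf "bytes:      length %d, match %d\n", length $bytes, $bytes_match;
```

The character string has length 3 and matches; the byte string has length 5 and does not, which mirrors the two outputs shown above.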
Thanks to @Grinnz on IRC, the following code works:
use utf8;
use open ':encoding(UTF-8)', ':std';
my $line = 'ЗГ. РАХ. №382 ВIД 03.02.2020Р';
my @match = $line =~ m/(вiд|от|від)/i;
print "$line -> @match\n";
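To check that the pragma really reached the standard handles, the layers on STDOUT can be inspected. This is a small sketch using PerlIO::get_layers, which is described in perldoc PerlIO; the variable names are illustrative only.

```perl
use strict;
use warnings;
use open ':encoding(UTF-8)', ':std';   # :std is a separate list element

# List the layers currently on the output side of STDOUT.
my @stdout_layers = PerlIO::get_layers(\*STDOUT, output => 1);

# With :std applied, an encoding(...) layer should be in the list.
my $has_encoding = grep { /^encoding\(/ } @stdout_layers;

print "STDOUT layers: @stdout_layers\n";
print $has_encoding ? "encoding layer is active\n" : "no encoding layer\n";
```

With the ':utf8 :std' form from my earlier attempts I would expect no such layer in the list, since :std is then never seen as its own argument.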
Notes:
@Grinnz suggests using https://metacpan.org/pod/open::layers. My original use open line failed because :std is not a layer; it must be its own argument in the list.
I also should not use :utf8, because:
CAUTION: Do not use this layer to translate from UTF-8 bytes, as invalid UTF-8 or binary data will result in malformed Perl strings. It is unlikely to produce invalid UTF-8 when used for output, though it will instead produce UTF-EBCDIC on EBCDIC systems. The :encoding(UTF-8) layer (hyphen is significant) is preferred as it will ensure translation between valid UTF-8 bytes and valid Unicode characters.
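The remark that the hyphen is significant can be checked with the Encode module, which backs the :encoding(...) layer. The sketch below only looks at Encode itself, not at the :utf8 layer, and the byte string is an arbitrary invalid sample chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# "UTF-8" (with hyphen) resolves to the strict encoding, "utf8" to the lax one.
my $strict_name = find_encoding('UTF-8')->name;
my $lax_name    = find_encoding('utf8')->name;

# The byte 0xFF never occurs in well-formed UTF-8, so strict decoding with
# FB_CROAK dies instead of handing back a malformed string.
my $invalid  = "\xFF";
my $decoded  = eval { decode('UTF-8', $invalid, Encode::FB_CROAK) };
my $rejected = defined $decoded ? 0 : 1;

print "UTF-8 -> $strict_name\n";
print "utf8  -> $lax_name\n";
print $rejected ? "invalid input rejected\n" : "invalid input accepted\n";
```

The two names differ (utf-8-strict versus utf8), which is why :encoding(UTF-8) and :encoding(utf8) are not interchangeable.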