Negating bracketed character classes in Perl regular expressions and grep

Question

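I am trying to solve a very simple problem: finding the strings in an array that contain only certain letters. However, I have run up against behaviour of regular expressions and/or grep that I don't understand.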

#!/usr/bin/perl

use warnings;
use strict;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Words wanted include these letters only. Hardcoded for demonstration purposes
my @wanted_letters = qw/a c d i n o t/;

# Subtract those letters from the alphabet to find the letters to eliminate.
# Interpolate array into a negated bracketed character class, positive grep
# against a list of the lowercase alphabet: fine, gets befghjklmpqrsuvwxyz.
my @unwanted_letters = grep(/[^@wanted_letters]/, ('a' .. 'z'));

# The desired result can be simulated by hardcoding the unwanted letters into a
# bracketed character class then doing a negative grep: matches ant, cat, and dodo.
my @works = grep(!/[befghjklmpqrsuvwxyz]/, @test_data);

# Doing something similar but moving the negation into the bracketed character
# class fails: it matches every word except bee.
my @fails1 = grep(/[^befghjklmpqrsuvwxyz]/, @test_data);

# Doing the same thing that produced the array of unwanted letters also fails.
my @fails2 = grep(/[^@unwanted_letters]/, @test_data);

print join ' ', @works; print "\n";
print join ' ', @fails1; print "\n";
print join ' ', @fails2; print "\n";

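Questions:

Why does @works get the correct result and not @fails1? The grep docs suggest the former, and the negation section of perlrecharclass suggests the latter, although it uses =~ in its examples. Is this specifically to do with using grep?
Why does @fails2 not work? Is it something to do with array versus list context? It otherwise looks the same as the subtraction step.
Aside from that, is there a pure regex way to achieve this that avoids the subtraction step?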

Answer 1

Both failing cases are fixed by adding the anchors ^ and $ and the quantifier +.

These both work:

my @fails1 = grep(/^[^befghjklmpqrsuvwxyz]+$/, @test_data);
my @fails2 = grep(/^[^@unwanted_letters]+$/, @test_data);

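Remember that /[^befghjklmpqrsuvwxyz]/ or /[^@unwanted_letters]/ matches only ONE character. Adding + means as many as possible. Adding ^ and $ means all the characters from the start of the string to the end.

With /[@wanted_letters]/ you get a match if the string has even one wanted character (even if it also has unwanted characters), which is the logical equivalent of any. Compare that to /^[@wanted_letters]+$/, where every letter needs to be in the set of @wanted_letters, which is the equivalent of all.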

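Demo1: only one character is matched, so the grep fails.

Demo2: the quantifier means more than one character, but there are no anchors, so the grep fails.

Demo3: anchors and quantifier, which gives the expected result.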
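
The demos themselves are not reproduced here, so the following is only a sketch of what the three cases presumably look like, reusing the question's test data and hardcoded unwanted letters. The variable names @class_only, @quantified and @anchored are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Case 1: class only. True as soon as ANY single character in the word
# is outside the unwanted set.
my @class_only = grep { /[^befghjklmpqrsuvwxyz]/ } @test_data;

# Case 2: quantifier but no anchors. A run of one or more such characters
# anywhere in the word is still enough, so the result is the same as case 1.
my @quantified = grep { /[^befghjklmpqrsuvwxyz]+/ } @test_data;

# Case 3: anchors plus quantifier. EVERY character from start to end
# must be outside the unwanted set.
my @anchored = grep { /^[^befghjklmpqrsuvwxyz]+$/ } @test_data;

print "class only: @class_only\n";   # ant cat dodo elephant frog giraffe horse
print "quantified: @quantified\n";   # ant cat dodo elephant frog giraffe horse
print "anchored:   @anchored\n";     # ant cat dodo
```

Note that bee is rejected in all three cases, because every one of its letters is in the unwanted set.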

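Once you understand that character classes match only ONE character, that anchors tie the match to the WHOLE string, and that quantifiers extend the match to everything between the anchors, you can grep directly with the wanted letters: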

my @wanted = grep(/^[@wanted_letters]+$/, @test_data);
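
As a rough illustration of the any versus all distinction, here is a small sketch using the question's data. One detail it works around: an array interpolated into a regex is joined with $" (a space by default), so a class written as [@wanted_letters] also contains a space. That is harmless for this word list, but the sketch builds the class body with join instead. The names $class, @any and @all are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @test_data      = qw(ant bee cat dodo elephant frog giraffe horse);
my @wanted_letters = qw/a c d i n o t/;

# Build the class body without separators; quotemeta guards against
# characters such as ] or - that are special inside a bracketed class.
my $class = join '', map { quotemeta($_) } @wanted_letters;

# "any": at least one wanted letter somewhere in the word.
my @any = grep { /[$class]/ } @test_data;

# "all": every character of the word is a wanted letter.
my @all = grep { /^[$class]+$/ } @test_data;

print "any: @any\n";   # ant cat dodo elephant frog giraffe horse
print "all: @all\n";   # ant cat dodo
```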

Answer 2

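You are matching anything in the string that is outside the character set. But the string can still contain characters from the set elsewhere. For example, if the test word is elephant, the negated character class matches the a character.

If you want to test the whole string, you need to quantify the class and anchor it to both ends.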

grep(/^[^befghjklmpqrsuvwxyz]*$/, @test_data);

In plain English, it is the difference between "word contains no characters in the set" and "word contains a character not in the set".
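
The two statements can be checked side by side with a few words. This is only a sketch: the names $unwanted and %result are invented for the illustration, and the empty string is included to show where the * quantifier differs from +.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $unwanted = 'befghjklmpqrsuvwxyz';
my %result;

for my $word ('elephant', 'cat', 'bee', '') {
    # "word contains a character not in the set"
    my $has_outsider = ($word =~ /[^$unwanted]/)    ? 1 : 0;

    # "word contains no characters in the set"
    my $no_unwanted  = ($word =~ /^[^$unwanted]*$/) ? 1 : 0;

    $result{$word} = [$has_outsider, $no_unwanted];
    printf "%-10s has_outsider=%d no_unwanted=%d\n",
        "'$word'", $has_outsider, $no_unwanted;
}
```

For elephant the first test is true (a is outside the set) while the second is false (e is in the set). The empty string passes the second test because * allows zero characters; with + in place of * it would be rejected.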
