Negating bracketed character classes in Perl regular expressions and grep
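I'm trying to solve a very simple problem: finding the strings in an array that contain only certain letters. However, I've run up against some behavior of regular expressions and/or grep that I don't understand.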
#!/usr/bin/perl
use warnings;
use strict;
my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);
# Words wanted include these letters only. Hardcoded for demonstration purposes
my @wanted_letters = qw/a c d i n o t/;
# Subtract those letters from the alphabet to find the letters to eliminate.
# Interpolate array into a negated bracketed character class, positive grep
# against a list of the lowercase alphabet: fine, gets befghjklmpqrsuvwxyz.
my @unwanted_letters = grep(/[^@wanted_letters]/, ('a' .. 'z'));
# The desired result can be simulated by hardcoding the unwanted letters into a
# bracketed character class then doing a negative grep: matches ant, cat, and dodo.
my @works = grep(!/[befghjklmpqrsuvwxyz]/, @test_data);
# Doing something similar but moving the negation into the bracketed character
# class fails: it matches every word except bee.
my @fails1 = grep(/[^befghjklmpqrsuvwxyz]/, @test_data);
# Doing the same thing that produced the array of unwanted letters also fails.
my @fails2 = grep(/[^@unwanted_letters]/, @test_data);
print join ' ', @works; print "\n";
print join ' ', @fails1; print "\n";
print join ' ', @fails2; print "\n";
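Questions:
- Why does @works get the correct result but not @fails1? The grep docs suggest the former, and the negation section of perlrecharclass suggests the latter, although it uses =~ in its examples. Is it something to do with using grep?
- Why does @fails2 not work? Is it something to do with array vs. list context? It otherwise looks the same as the subtraction step.
- Besides that, is there a pure regex way of doing this that avoids the subtraction step?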
Both fails are fixed by adding the anchors ^ and $ and the quantifier +.
These both work:
my @fails1 = grep(/^[^befghjklmpqrsuvwxyz]+$/, @test_data);
my @fails2 = grep(/^[^@unwanted_letters]+$/, @test_data);
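Keep in mind that /[^befghjklmpqrsuvwxyz]/ or /[^@unwanted_letters]/ matches only ONE character. Adding + means as many as possible. Adding ^ and $ means the match has to cover every character from the start of the string to the end.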
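With /[@wanted_letters]/ you get a match if there is even a single wanted character, even when the string also contains unwanted characters. That is the logical equivalent of any. Compare that to /^[@wanted_letters]+$/, where every letter needs to be in the set of @wanted_letters. That is the logical equivalent of all.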
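The any/all distinction can be seen side by side in a minimal sketch that reuses the question's data. The variable names @any and @all are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @test_data      = qw(ant bee cat dodo elephant frog giraffe horse);
my @wanted_letters = qw/a c d i n o t/;

# "any": one wanted letter anywhere in the word is enough to match.
my @any = grep { /[@wanted_letters]/ } @test_data;

# "all": every character from start to end must be a wanted letter.
my @all = grep { /^[@wanted_letters]+$/ } @test_data;

print "any: @any\n";   # every word except bee
print "all: @all\n";   # ant cat dodo
```

Only bee has no wanted letter at all, so it is the only word the unanchored pattern rejects.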
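Demo1: only one character, so the grep fails.
Demo2: the quantifier means more than one character, but there are no anchors, so the grep fails.
Demo3: anchors and quantifier, expected result.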
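The linked demos are not reproduced here, so the following is only a sketch of what the three cases presumably look like when run against the question's data. The names @demo1 to @demo3 are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Case 1: bare negated class. A single character outside the set is enough.
my @demo1 = grep { /[^befghjklmpqrsuvwxyz]/ } @test_data;

# Case 2: quantifier but no anchors. The match can still be a partial one.
my @demo2 = grep { /[^befghjklmpqrsuvwxyz]+/ } @test_data;

# Case 3: anchors plus quantifier. The whole string must be outside the set.
my @demo3 = grep { /^[^befghjklmpqrsuvwxyz]+$/ } @test_data;

print "demo1: @demo1\n";
print "demo2: @demo2\n";
print "demo3: @demo3\n";
```

The first two cases return the same list, because the quantifier alone does not force the match to cover the whole word. Only the third narrows the result to ant, cat, and dodo.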
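Once you understand that character classes only match ONE character, that anchors tie the match to the WHOLE string, and that quantifiers extend the match to everything between the anchors, you can grep directly with the wanted letters: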
my @wanted = grep(/^[@wanted_letters]+$/, @test_data);
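One detail worth knowing about the interpolated form: inside a pattern, an array is joined with $" (a single space by default), so the class above is really [a c d i n o t] and a space counts as a wanted character. That makes no difference for the question's data, but the sketch below shows one possible way to build the class from the letters alone. The names $class, $only_wanted, @with_space, and @letters_only, and the extra test string 'a cat', are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @wanted_letters = qw/a c d i n o t/;
my @test_data = (qw(ant bee cat dodo elephant frog giraffe horse), 'a cat');

# Interpolated directly, the array is joined with $" (a space by default),
# so the space in 'a cat' is accepted as if it were a wanted letter.
my @with_space = grep { /^[@wanted_letters]+$/ } @test_data;

# Joining with the empty string keeps the class to the letters only.
# quotemeta is a precaution in case a "letter" is ever ], ^, - or \.
my $class       = join '', map { quotemeta } @wanted_letters;
my $only_wanted = qr/^[$class]+$/;
my @letters_only = grep { $_ =~ $only_wanted } @test_data;

print join(',', @with_space),   "\n";   # ant,cat,dodo,a cat
print join(',', @letters_only), "\n";   # ant,cat,dodo
```

Compiling the pattern once with qr// also avoids rebuilding the class inside the grep block.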
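You're matching any character outside the set, anywhere in the string. The string can still contain characters from the set elsewhere. For example, if the test word is elephant, the negated character class matches the a character.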
If you want to test the whole string, you need to quantify the class and anchor it to both ends.
grep(/^[^befghjklmpqrsuvwxyz]*$/, @test_data);
Translated into English, it's the difference between "word contains no characters in the set" and "word contains a character not in the set".
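The two phrasings can be checked word by word. This is a small sketch; the names $set, %result, $no_chars_in_set, and $has_char_outside are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $set = 'befghjklmpqrsuvwxyz';
my %result;

for my $word (qw(cat elephant bee)) {
    # "word contains no characters in the set": anchored, quantified class.
    my $no_chars_in_set = ($word =~ /^[^$set]*$/) ? 1 : 0;

    # "word contains a character not in the set": bare negated class.
    my $has_char_outside = ($word =~ /[^$set]/) ? 1 : 0;

    $result{$word} = [$no_chars_in_set, $has_char_outside];
    printf "%-8s no chars in set: %d, has char outside set: %d\n",
        $word, $no_chars_in_set, $has_char_outside;
}
```

cat passes both tests, elephant passes only the second, and bee passes neither. Note that the * form also accepts an empty string, just as !/[befghjklmpqrsuvwxyz]/ does, while the + form requires at least one character.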