Negating bracketed character classes in Perl regular expressions and grep

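I'm trying to solve what should be a very simple problem: finding the strings in an array that consist only of certain letters. However, I'm running up against behavior of regular expressions and/or grep that I don't understand.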

#!/usr/bin/perl

use warnings;
use strict;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Words wanted include these letters only. Hardcoded for demonstration purposes
my @wanted_letters = qw/a c d i n o t/;

# Subtract those letters from the alphabet to find the letters to eliminate.
# Interpolate array into a negated bracketed character class, positive grep
# against a list of the lowercase alphabet: fine, gets befghjklmpqrsuvwxyz.
my @unwanted_letters = grep(/[^@wanted_letters]/, ('a' .. 'z'));

# The desired result can be simulated by hardcoding the unwanted letters into a
# bracketed character class then doing a negative grep: matches ant, cat, and dodo.
my @works = grep(!/[befghjklmpqrsuvwxyz]/, @test_data);

# Doing something similar but moving the negation into the bracketed character
# class fails: it matches every word except bee.
my @fails1 = grep(/[^befghjklmpqrsuvwxyz]/, @test_data);

# Doing the same thing that produced the array of unwanted letters also fails.
my @fails2 = grep(/[^@unwanted_letters]/, @test_data);

print join ' ', @works; print "\n";
print join ' ', @fails1; print "\n";
print join ' ', @fails2; print "\n";

The problem:

Both @fails1 and @fails2 are fixed by adding the anchors ^ and $ together with the quantifier +.

Both of these work:

my @fails1 = grep(/^[^befghjklmpqrsuvwxyz]+$/, @test_data);
my @fails2 = grep(/^[^@unwanted_letters]+$/, @test_data);

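Keep in mind that /[^befghjklmpqrsuvwxyz]/ and /[^@unwanted_letters]/ each match only a single character. Adding + means one or more, as many as possible. Adding ^ and $ means every character of the string, from start to end.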

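With /[@wanted_letters]/, you get a match as soon as the string contains a single wanted character (even if it also contains unwanted characters), which is the logical equivalent of "any". Compare that to /^[@wanted_letters]+$/, where every letter has to be in the set @wanted_letters, which is the equivalent of "all".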
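A minimal sketch of the "any" versus "all" difference, reusing @test_data and @wanted_letters from the question. The output comments come from tracing each pattern against the eight test words by hand:

```perl
use strict;
use warnings;

my @test_data      = qw(ant bee cat dodo elephant frog giraffe horse);
my @wanted_letters = qw/a c d i n o t/;

# "any": succeeds as soon as ONE character of the word is in the set,
# even if the other characters are not.
my @any = grep { /[@wanted_letters]/ } @test_data;

# "all": the anchors plus + force EVERY character, from start to end,
# to come from the set.
my @all = grep { /^[@wanted_letters]+$/ } @test_data;

print "any: @any\n";   # any: ant cat dodo elephant frog giraffe horse
print "all: @all\n";   # all: ant cat dodo
```

Only bee drops out of the "any" list, because none of b, e, e is a wanted letter.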

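Demo1: only one character, so the grep fails.

Demo2: a quantifier (one or more) but no anchors, so the grep still fails.

Demo3: anchors and a quantifier, giving the expected result.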
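The Demo links themselves are not reproduced here. As a rough Perl stand-in, assuming each demo used the hardcoded unwanted-letter class from the question, the three cases look like this:

```perl
use strict;
use warnings;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Demo1: one character, no anchors. True if ANY single character
# of the word is outside the unwanted set.
my @demo1 = grep { /[^befghjklmpqrsuvwxyz]/ } @test_data;

# Demo2: quantifier but no anchors. The + only lengthens a match that can
# still start and stop anywhere in the word, so the result equals Demo1.
my @demo2 = grep { /[^befghjklmpqrsuvwxyz]+/ } @test_data;

# Demo3: anchors and quantifier. Every character from ^ to $ must be
# outside the unwanted set.
my @demo3 = grep { /^[^befghjklmpqrsuvwxyz]+$/ } @test_data;

print "Demo1: @demo1\n";   # Demo1: ant cat dodo elephant frog giraffe horse
print "Demo2: @demo2\n";   # Demo2: ant cat dodo elephant frog giraffe horse
print "Demo3: @demo3\n";   # Demo3: ant cat dodo
```

The point of Demo2 is that a quantifier alone changes nothing about *whether* a word matches; only the anchors do.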

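Once you understand that a character class matches only ONE character, that anchors tie the match to the WHOLE string, and that quantifiers extend the match to everything between the anchors, you can grep with the wanted letters directly: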

my @wanted = grep(/^[@wanted_letters]+$/, @test_data);
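One optional refinement, offered as a sketch rather than something this answer depends on: interpolating "@wanted_letters" joins the elements with $" (a single space by default), so the class quietly accepts spaces as well. Building the class string yourself avoids that. The names $class and $only_wanted below are illustrative:

```perl
use strict;
use warnings;

my @test_data      = qw(ant bee cat dodo elephant frog giraffe horse);
my @wanted_letters = qw/a c d i n o t/;

# Join with '' instead of relying on "@array" interpolation (which inserts
# $", a space by default). quotemeta escapes characters such as ] ^ - \
# that would otherwise change the meaning of the character class.
my $class = join '', map { quotemeta } @wanted_letters;

# Compile once with qr// and reuse the pattern for every word.
my $only_wanted = qr/^[$class]+$/;

my @wanted = grep { $_ =~ $only_wanted } @test_data;
print "@wanted\n";   # ant cat dodo

# Contrast on a string containing a space: the interpolated class
# accepts it, the joined class does not.
my $interpolated_ok = ('a c' =~ /^[@wanted_letters]+$/) ? 1 : 0;   # 1
my $joined_ok       = ('a c' =~ $only_wanted)            ? 1 : 0;   # 0
```

For the single-word test data in the question the two approaches give the same answer; the difference only shows up with input that can contain spaces.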

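You are matching any character in the string that is outside the set. But the string can still contain characters that are in the set elsewhere. For example, if the test word is elephant, the negated character class matches the a character.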
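To see exactly which single character the unanchored negated class latches onto, a small sketch can capture it for each word (%first_hit is an illustrative name). The engine reports the leftmost character outside the set:

```perl
use strict;
use warnings;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);

# Capture the one character that /[^...]/ matched in each word.
my %first_hit;
for my $word (@test_data) {
    if ($word =~ /([^befghjklmpqrsuvwxyz])/) {
        $first_hit{$word} = $1;
        print "$word: matched '$1'\n";
    }
    else {
        print "$word: no match\n";
    }
}
# elephant: matched 'a'   (e, l, e, p, h are all in the set)
# frog:     matched 'o'
# bee:      no match      (b and e are both in the set)
```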

If you want to test the entire string, you need to quantify the class and anchor it at both ends.

grep(/^[^befghjklmpqrsuvwxyz]*$/, @test_data);
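This version uses * where the first answer used +. For the test words the results are identical, but the two differ on the empty string, as this small check shows ($unwanted is an illustrative name):

```perl
use strict;
use warnings;

my $unwanted = 'befghjklmpqrsuvwxyz';

# With *, zero characters satisfy the pattern, so the empty string passes.
my $star_empty = ('' =~ /^[^$unwanted]*$/) ? 1 : 0;   # 1

# With +, at least one character is required, so the empty string fails.
my $plus_empty = ('' =~ /^[^$unwanted]+$/) ? 1 : 0;   # 0

print "star: $star_empty, plus: $plus_empty\n";   # star: 1, plus: 0
```

Which one you want depends on whether an empty string should count as "containing only wanted letters".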

In plain English, it is the difference between "word contains no characters in the set" and "word contains a character not in the set".
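A quick sketch that runs both readings over the test data. It also checks, word by word, that the question's negative-grep form !/[set]/ agrees with the anchored /^[^set]*$/ form; $set is just a name chosen for the example:

```perl
use strict;
use warnings;

my @test_data = qw(ant bee cat dodo elephant frog giraffe horse);
my $set       = 'befghjklmpqrsuvwxyz';

my (@none_in_set, @some_outside);
for my $word (@test_data) {
    # "word contains no characters in the set", two equivalent spellings
    my $none     = ($word !~ /[$set]/)     ? 1 : 0;
    my $anchored = ($word =~ /^[^$set]*$/) ? 1 : 0;
    die "forms disagree on $word\n" if $none != $anchored;

    push @none_in_set, $word if $none;

    # "word contains a character not in the set", a different question
    push @some_outside, $word if $word =~ /[^$set]/;
}

print "no characters in the set:     @none_in_set\n";   # ant cat dodo
print "a character not in the set:   @some_outside\n";  # every word except bee
```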