Improve Performance of Last Occurrence Match in Perl Regex

Question

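I need to find the last occurrence of a match, given an array of acceptable values. My Perl source code is below. The answer is Q because, among the acceptable values A, Q, I, and J, it is the one that occurs last in the input.

The challenge is how to change my code to make the regex faster. It is currently a bottleneck because I have to run it millions of times.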

my $input = "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z";
my $regex = qr/(A|Q|I|J)/;

my @matches = $input =~ m/\b$regex\b/g;

print $matches[$#matches];

I would like to see new code that speeds up the lookup but still finds the Q match.

Answer 1

Use \K to discard everything matched before it, so that only the last occurrence is printed.

my $input = "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z";
my $regex = qr/.*\K\b[AQIJ]\b/;
if ($input =~ m/$regex/) {
    print $&."\n";
}
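Since the question starts from an array of acceptable values, the pattern can also be built from that array once and reused. The following is only a sketch: `build_last_match_regex` and `last_match` are hypothetical helper names, the `/s` flag is my assumption (so that `.` also crosses newlines), and `\K` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: compile one pattern from a list of acceptable values.
# quotemeta escapes any regex metacharacters in the values.
sub build_last_match_regex {
    my @accepted = @_;
    my $alternation = join '|', map { quotemeta } @accepted;
    # Greedy .* runs to the end of the string and backtracks, so the first
    # alternative that matches is the last one in the input. \K drops the
    # prefix from the reported match.
    return qr/.*\K\b(?:$alternation)\b/s;
}

# Hypothetical helper: return the last accepted value, or undef if none.
sub last_match {
    my ($input, $regex) = @_;
    return $input =~ $regex ? $& : undef;
}

my $regex = build_last_match_regex(qw(A Q I J));    # compile once
my $input = "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z";
print last_match($input, $regex), "\n";             # prints "Q"
```

Compiling the pattern once outside the hot loop avoids rebuilding it on each of the millions of calls.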

Using a capture group:

my $input = "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z";
my $regex = qr/.*\b([AQIJ])\b/;
if ($input =~ m/$regex/) {
    print $1."\n";
}

Update:

my $input = "Apple Orange Mango Apple";
my $regex = qr/.*\K\b(?:Apple|Range|Mango)\b/;
if ($input =~ m/$regex/) {
    print $&."\n";
}
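If the input is always a whitespace-separated list of tokens, as in the examples here, a non-regex alternative may also be worth measuring: split the string and scan the tokens from the end with a hash lookup. This is a different technique from the patterns above, not a variant of them. It is a sketch that assumes whitespace delimiters, and `last_accepted_token` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the tokens from right to left and return the
# first one found in the set of accepted values (undef if there is none).
# Assumes tokens are separated by whitespace.
sub last_accepted_token {
    my ($input, $accepted) = @_;
    for my $token (reverse split ' ', $input) {
        return $token if $accepted->{$token};
    }
    return undef;
}

# Build the lookup set once and reuse it for every input string.
my %accepted = map { $_ => 1 } qw(Apple Range Mango);
print last_accepted_token("Apple Orange Mango Apple", \%accepted), "\n";  # prints "Apple"
```

Whether this beats the regex depends on input length and on how close to the end the last match sits, so it should be benchmarked on real data.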

Answer 2

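You can find the last match simply by adding .* in front of the pattern. Because .* is greedy, it consumes as much of the string as possible and then backtracks, so the first match it finds is the last one in the string.

Like this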

my $input = "APPLE B C D E F G H INDIGO JACKAL K L M N O P QUIVER R S T U V W X Y Z";
my $regex = qr/APPLE|QUIVER|INDIGO|JACKAL/;
my ($last) = $input =~ /.*\b($regex)\b/;
print $last, "\n";

Output

QUIVER
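
To check whether this actually helps on your data, the two approaches can be compared with the core Benchmark module. The sketch below uses the sample input from the question. The sub names are placeholders of mine, and no timings are shown because they depend on the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z";

# Approach from the question: collect every match, keep the last one.
my $all_re = qr/\b(A|Q|I|J)\b/;
sub last_via_global {
    my @matches = $input =~ /$all_re/g;
    return $matches[-1];
}

# Approach from this answer: a greedy .* prefix finds the last match directly.
my $last_re = qr/.*\b(A|Q|I|J)\b/;
sub last_via_greedy {
    my ($last) = $input =~ /$last_re/;
    return $last;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    global_match  => \&last_via_global,
    greedy_prefix => \&last_via_greedy,
});
```

cmpthese prints a table of rates and relative percentages. Run it with input strings that are representative of the real workload before choosing an approach.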

regex

perl

pattern-matching