PHP regex with word boundary after escaped character
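I recently stumbled on this problem, but I don't understand why it happens.
Consider the following example: I have some arbitrary text and an array of programming languages. In a loop, I match each language as a whole word using a regex with a word boundary \b before and after it, and then print a URL.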
$string = 'I don\'t know C e C++ so well, but I can code in PHP.';
$languages = [
    'PHP' => '/php/',
    'C++' => '/cpp/',
    'C'   => '/c/',
];
foreach ($languages as $name => $uri) {
    $regex = '/\b' . preg_quote($name, '/') . '\b/';
    if (preg_match($regex, $string)) {
        echo "For {$name} information refer to http://foo.bar{$uri}" . PHP_EOL;
    }
}
I expected the following output:
For PHP information refer to http://foo.bar/php/
For C++ information refer to http://foo.bar/cpp/
For C information refer to http://foo.bar/c/
However, this is the output I got:
For PHP information refer to http://foo.bar/php/
For C information refer to http://foo.bar/c/
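The word boundary (\b) right after the escaped plus sign (+) does not work as expected.
If I replace \b with [^\w] it works, but I'm not 100% sure this approach won't backfire.
Why does this happen, and how can I get the result I need?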
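To see which assertion fails, here is a minimal sketch that tests the C++ pattern from the loop against the sample string, along with two variants that each drop one \b. Only the variant without the trailing \b matches. The reason is that the character after "C++" is a space: \b needs a word character on exactly one side, and here both + and the space are non-word characters.

```php
<?php
// Sample text from the question.
$string = 'I don\'t know C e C++ so well, but I can code in PHP.';
$name = preg_quote('C++', '/'); // yields C\+\+

// The pattern built in the loop, plus two variants with one \b removed.
$patterns = [
    'both \b'          => '/\b' . $name . '\b/',
    'leading \b only'  => '/\b' . $name . '/',
    'trailing \b only' => '/' . $name . '\b/',
];

foreach ($patterns as $label => $regex) {
    // preg_match() returns 1 on a match, 0 on no match (false on error).
    echo $label . ': ' . var_export(preg_match($regex, $string), true) . PHP_EOL;
}
// Expected: both \b: 0, leading \b only: 1, trailing \b only: 0
```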
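The concern about [^\w] is justified. Unlike an assertion, [^\w] consumes a character and requires one to exist. As a result, it misses a name at the very start or end of the string, and the separators become part of the match. Below is a minimal sketch comparing it with negative lookarounds. The input strings are hypothetical examples, not taken from the question.

```php
<?php
$name = preg_quote('C++', '/'); // C\+\+

// Hypothetical input where the language name is at the very start of the string.
$text = 'C++ is fun';

// [^\w] needs a real non-word character on each side, so it cannot match at position 0.
$consuming = '/[^\w]' . $name . '[^\w]/';
// Lookarounds only assert "no word character here"; the string start satisfies that.
$lookaround = '/(?<!\w)' . $name . '(?!\w)/';

var_dump(preg_match($consuming, $text));  // int(0)
var_dump(preg_match($lookaround, $text)); // int(1)

// Because [^\w] consumes the separators, they end up inside the matched text.
preg_match($consuming, 'I know C++ well', $m);
var_dump($m[0]); // string(5) " C++ "
```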
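\b only matches at a position with a word character (\w: letters, digits, underscore) on one side and a non-word character or the start/end of the string on the other. In "C++ so", the position after the second + lies between + and a space, which are both non-word characters. That position is therefore not a boundary, and \b fails.
The recommended fix is to use lookarounds that assert there is no adjacent word character, rather than asserting a boundary, e.g. (?<!\w)C\+\+(?!\w). These lookarounds hold whether the neighbour is a non-word character or the edge of the string. Unlike [^\w], they consume nothing, so they also work at the start or end of the string: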
$string = 'I don\'t know C e C++ so well, but I can code in PHP.';
$languages = [
    'PHP' => '/php/',
    'C++' => '/cpp/',
    'C'   => '/c/',
];
foreach ($languages as $name => $uri) {
    $regex = '/(?<!\w)' . preg_quote($name, '/') . '(?!\w)/';
    if (preg_match($regex, $string)) {
        echo "For {$name} information refer to http://foo.bar{$uri}" . PHP_EOL;
    }
}
Output:
For PHP information refer to http://foo.bar/php/
For C++ information refer to http://foo.bar/cpp/
For C information refer to http://foo.bar/c/
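The loop above can also be wrapped in a reusable function. The following is a sketch only: the name findLanguages() is hypothetical and not part of the original answer. It builds the same (?<!\w)...(?!\w) pattern for each name and returns the subset of the array whose names occur as whole words.

```php
<?php
/**
 * Hypothetical helper: returns the entries of $languages whose name appears in
 * $text as a whole word. Lookarounds are used instead of \b so that names
 * starting or ending with non-word characters (such as "C++") still match.
 *
 * @param array<string, string> $languages name => URI path
 * @return array<string, string> matching entries, in the original order
 */
function findLanguages(string $text, array $languages): array
{
    $found = [];
    foreach ($languages as $name => $uri) {
        // (?<!\w): no word character right before the name (or start of string).
        // (?!\w):  no word character right after the name (or end of string).
        $regex = '/(?<!\w)' . preg_quote($name, '/') . '(?!\w)/';
        if (preg_match($regex, $text) === 1) {
            $found[$name] = $uri;
        }
    }
    return $found;
}

$string = 'I don\'t know C e C++ so well, but I can code in PHP.';
$languages = [
    'PHP' => '/php/',
    'C++' => '/cpp/',
    'C'   => '/c/',
];

foreach (findLanguages($string, $languages) as $name => $uri) {
    echo "For {$name} information refer to http://foo.bar{$uri}" . PHP_EOL;
}
```

One caveat applies to both this pattern and the original \b version. The pattern for C also matches the C at the start of "C++", because + is not a word character. A text containing only "C++" therefore reports both C++ and C.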