PHP 7 preg_replace PREG_JIT_STACKLIMIT_ERROR with simple string
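I know others have filed questions about this error, but I don't see how this regex or subject string could be much simpler.
To me this is a bug, but before reporting it to PHP I figured I'd make sure and get some help to see whether this can be simplified.
Here's a small test script showing 2 strings; one with 1024 x's and one with 1023: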
// 1024 x's
$str = '_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
// Outputs nothing (bug?)
echo preg_replace('/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/', '[i][/i]', $str);
echo "\n\n";
// 1023 x's
$str = '_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
// Outputs the unchanged string as expected
echo preg_replace('/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/', '[i][/i]', $str);
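As you can see, only the slightly longer string (more than 1024 characters) triggers the error. The strings processed by this can be of any length; they could be forum posts, news articles, etc.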
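A minimal sketch of how the silent failure can be made visible: preg_replace returns NULL when the engine gives up, and preg_last_error() reports why. The helper name describe_preg_error is hypothetical, and whether the error actually fires depends on the PHP/PCRE build and on pcre.jit, so the script only reports what it observes.

```php
<?php
// Hypothetical helper (not part of PHP): map a preg_last_error() code
// to the name of the matching PREG_* constant.
function describe_preg_error(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
    ];
    return $names[$code] ?? 'unknown error ' . $code;
}

// Same pattern and replacement as in the test script above.
$pattern = '/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/';
$str     = '_' . str_repeat('x', 1024);

$result = preg_replace($pattern, '[i][/i]', $str);

if ($result === null) {
    // echo of NULL prints nothing, which is why the script looked silent.
    echo 'preg_replace failed: ', describe_preg_error(preg_last_error()), "\n";
} else {
    echo 'No error on this build; result length: ', strlen($result), "\n";
}
```

Checking for NULL before echoing is the important part; the name lookup is only there to make the code readable in logs.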
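Regex explained
I'm just trying to do some markdown parsing, converting strings like _I am italic_ into legacy markup that we use in some cases from an old site. The reasons/uses aren't important. What matters is that I believe this should work just fine, and in fact it does, except in PHP 7.
Only match the underscores when they delimit a standalone word or sentence. The first underscore should not match if it's preceded by any "word" character, and the last underscore should not match if it's followed by any "word" character.
Environment: CentOS 7, PHP 7.1.6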
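The matching rule described above can be sketched on short inputs, where the JIT stack is not an issue. The function name to_legacy_italics and the '[i]$1[/i]' replacement are assumptions made for illustration; the test script uses '[i][/i]'.

```php
<?php
// Hypothetical wrapper around the pattern from the question.
// Assumption: the captured text should end up between [i] and [/i].
function to_legacy_italics(string $text): ?string
{
    $pattern = '/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/';
    return preg_replace($pattern, '[i]$1[/i]', $text);
}

// Underscores that stand alone around a phrase are converted.
echo to_legacy_italics('say _I am italic_ now'), "\n";

// Underscores surrounded by word characters are left untouched.
echo to_legacy_italics('snake_case_name'), "\n";

// Start and end of the string count as valid boundaries too.
echo to_legacy_italics('_word_'), "\n";
```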
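IMPORTANT NOTE:
You should avoid (.|\n)*? or (.|\r?\n)*? patterns, because they cause excessive redundant backtracking. To match any character, you can usually use . with the DOTALL flag (the s modifier in PHP) or, in JavaScript, the [^] or [\s\S] constructs. For details, see How do I match any character across multiple lines in a regular expression?.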
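As a small illustration of the note above, here is how the s (DOTALL) modifier and the [\s\S] construct behave in PHP. The sample text and variable names are made up for the example.

```php
<?php
// Made-up sample text containing line breaks.
$text = "start\nmiddle\nend";

// With the s modifier, the dot also matches newlines.
$dotall = preg_match('/start(.*?)end/s', $text, $m1);

// [\s\S] matches any character without needing a modifier.
$charClass = preg_match('/start([\s\S]*?)end/', $text, $m2);

// Without the s modifier, the dot stops at "\n", so nothing matches.
$plain = preg_match('/start(.*?)end/', $text, $m3);

var_dump($dotall, $charClass, $plain);
var_dump($m1[1] === $m2[1]);
```

Both working variants capture the same text; neither needs an alternation group repeated once per character.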
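Current problem
The (.|\n(?!\n))*? pattern is very inefficient: it causes a lot of redundant backtracking unless it is used at the very end of the pattern (where it makes no sense at all). The farther left it sits, the worse the performance.
Since all it does is lazily match any character other than a newline, or a newline that is not followed by another newline, you can rewrite it as .*?(?:\R(?!\R).*?)*:
'~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~'
See the regex demo.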
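Notes:
- (?<=[^\w]|^) = \b, because the lookbehind is followed by a _ (a word char)
- (?=[^\w]|$) = \b, because the lookahead is preceded by a _
- .*?(?:\R(?!\R).*?)* matches:
  - .*? - any 0+ chars other than line break chars, as few as possible, then
  - (?:\R(?!\R).*?)* - zero or more sequences of:
    - \R(?!\R) - a line break sequence not followed by another line break sequence (\R matches \n, \r\n or \r, among other line break sequences)
    - .*? - any 0+ chars other than line break chars, as few as possible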
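The behaviour described in these notes can be checked with a short script. This is a sketch under assumptions: the wrapper name italics_rewritten and the '[i]$1[/i]' replacement are hypothetical, and the outcome for the long subject is only reported, since it depends on the PHP/PCRE build.

```php
<?php
// Hypothetical wrapper around the rewritten pattern from the answer.
// Assumption: the captured text should end up between [i] and [/i].
function italics_rewritten(string $text): ?string
{
    $pattern = '~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~';
    return preg_replace($pattern, '[i]$1[/i]', $text);
}

// A single line break inside the span is allowed by \R(?!\R).
echo italics_rewritten("_one\ntwo_"), "\n";

// A blank line (two consecutive line breaks) stops the match,
// so this input comes back unchanged.
echo italics_rewritten("_one\n\ntwo_"), "\n";

// Word characters next to the underscores prevent a match (\b).
echo italics_rewritten('snake_case_name'), "\n";

// The subject from the question: report the outcome instead of assuming it.
$long   = '_' . str_repeat('x', 1024);
$result = italics_rewritten($long);
echo $result === null
    ? 'error code: ' . preg_last_error() . "\n"
    : 'ok, length: ' . strlen($result) . "\n";
```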