preg_match appears to hit a limit when using two matches
I'm running into a strange problem. With php-5.3.3, I seem to be hitting some kind of limit in preg_match when using two matches:
// works fine
$pattern_1 = '?START(.*)STOP?';
$string = 'START' . str_repeat('x',9999999) . 'STOP' ;
preg_match($pattern_1, $string , $matchedArray ) ;
$pattern_2 = '?START-ONE(.*)STOP-ONE.*START-TWO(.*)STOP-TWO.*?';
// works fine
$string = 'START-ONE this is head stuff STOP-ONE START-TWO' . str_repeat('x', 49970) . 'STOP-TWO' ;
preg_match($pattern_2, $string , $matchedArray_2 ) ;
// didn't work
$string = 'START-ONE this is head stuff STOP-ONE START-TWO' . str_repeat('x', 49971) . 'STOP-TWO' ;
preg_match($pattern_2, $string , $matchedArray_3 ) ;
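The first case, with only one match, uses a very large string and works without problems.
The second case uses a string 50,026 characters long and works. The last case uses a string 50,027 characters long (one more), and the match no longer works. The number 49971 at which the error appears may vary, so you can change it to a larger number to reproduce the problem.
Any thoughts or ideas? Maybe this is a PHP version issue? Perhaps a possible workaround would be to use only one match instead of two and then run preg_match twice?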
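Well, PHP isn't very talkative about regex errors: it just returns false in the last case, which according to the PHP docs only means that an error occurred.
I reproduced the problem using PCRE (the regex engine used by preg_match) from C#, though with a larger number of characters, and the error I got was PCRE_ERROR_MATCHLIMIT.
This means you hit the backtracking limit set in PCRE. It's just a safety measure to keep the engine from looping indefinitely, and I think your PHP configuration sets it to a low value.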
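To see which error PHP is hiding behind that false, you can call preg_last_error() right after the failing call. The following is only a minimal sketch and not part of the original question: the helper name describe_preg_error is made up for illustration, and the lowered pcre.backtrack_limit and the disabled JIT are assumptions chosen to make the failure easy to provoke on newer PHP versions.

```php
<?php
// Hypothetical helper: map a preg_last_error() code to its constant name.
function describe_preg_error($code)
{
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    return isset($names[$code]) ? $names[$code] : 'unknown error code ' . $code;
}

// Assumption: turn the JIT off (where it exists) and lower the limit
// so that the failure is easy to provoke.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

$pattern = '/START-ONE(.*)STOP-ONE.*START-TWO(.*)STOP-TWO.*/';
$string  = 'START-ONE this is head stuff STOP-ONE START-TWO' . str_repeat('x', 49971) . 'STOP-TWO';

$result = preg_match($pattern, $string, $matches);
if ($result === false) {
    // preg_match() returns false on error; preg_last_error() says which one.
    echo 'preg_match failed: ', describe_preg_error(preg_last_error()), "\n";
} else {
    echo 'preg_match returned ', $result, "\n";
}
```

Note the strict comparison with false: preg_match also returns 0 when the pattern simply does not match, and that is not an error.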
To fix this, you can set a higher value for the pcre.backtrack_limit PHP option, which controls this limit:
ini_set("pcre.backtrack_limit", "10000000"); // Actually, this is PCRE's default
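If you would rather not raise the limit for the whole script, one option is to raise it only around the expensive call and restore it afterwards. This is a rough sketch under that assumption; the function name match_with_limit is hypothetical and the limit value is just the one from the line above.

```php
<?php
// Hypothetical helper: run preg_match() with a temporarily raised backtrack limit.
function match_with_limit($pattern, $subject, &$matches, $limit)
{
    $previous = ini_get('pcre.backtrack_limit');
    ini_set('pcre.backtrack_limit', (string) $limit);

    $result = preg_match($pattern, $subject, $matches);
    // Read the error code right away; a later preg_* call would overwrite it.
    $error = preg_last_error();

    // Restore the previous setting before reporting anything.
    ini_set('pcre.backtrack_limit', $previous);

    if ($result === false) {
        throw new RuntimeException('preg_match failed with error code ' . $error);
    }
    return $result;
}

$pattern = '/START-ONE(.*)STOP-ONE.*START-TWO(.*)STOP-TWO.*/';
$string  = 'START-ONE this is head stuff STOP-ONE START-TWO' . str_repeat('x', 49971) . 'STOP-TWO';

$found = match_with_limit($pattern, $string, $matches, 10000000);
echo $found, ' match, second group length: ', strlen($matches[2]), "\n";
```

If I read the PHP changelog correctly, pcre.backtrack_limit defaulted to 100000 before PHP 5.3.7. That would fit the numbers in the question: two greedy .* passes over roughly 50,000 characters add up to about 100,000 calls.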
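Side notes:
- You should probably replace (.*) with (.*?) to reduce useless backtracking and to ensure correctness (otherwise the regex engine will run past the STOP string and have to backtrack to get back to it).
- Using ? as the pattern delimiter is a bad idea, since it prevents you from using the ? metacharacter and therefore from applying the advice above. Really, you should never use regex metacharacters as pattern delimiters.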
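Putting both side notes together, the pattern from the question could be rewritten roughly as follows. This is a sketch, not the asker's code: the ~ delimiter is just one possible choice, and the trailing .* has been dropped because nothing is captured after STOP-TWO.

```php
<?php
// '~' as the delimiter leaves '?' free to act as the lazy-quantifier metacharacter.
$pattern = '~START-ONE(.*?)STOP-ONE.*?START-TWO(.*?)STOP-TWO~';
$string  = 'START-ONE this is head stuff STOP-ONE START-TWO' . str_repeat('x', 49971) . 'STOP-TWO';

$result = preg_match($pattern, $string, $matches);
if ($result === 1) {
    echo 'head: "', $matches[1], '", body length: ', strlen($matches[2]), "\n";
} else {
    // 0 means no match, false means an error; preg_last_error() tells them apart.
    echo 'no match or error, preg_last_error() = ', preg_last_error(), "\n";
}
```

A lazy quantifier still advances one character at a time, so a much longer body can presumably still reach the limit. It just avoids scanning to the end of the string and walking all the way back.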
If you're interested in the lower-level details, here is the relevant part of the PCRE documentation (emphasis mine):
The match_limit field provides a means of preventing PCRE from using up a vast amount of resources when running patterns that are not going to match, but which have a very large number of possibilities in their search trees. The classic example is a pattern that uses nested unlimited repeats.
Internally, pcre_exec() uses a function called match(), which it calls repeatedly (sometimes recursively). The limit set by match_limit is imposed on the number of times this function is called during a match, which has the effect of limiting the amount of backtracking that can take place. For patterns that are not anchored, the count restarts from zero for each position in the subject string.
When pcre_exec() is called with a pattern that was successfully studied with a JIT option, the way that the matching is executed is entirely different. However, there is still the possibility of runaway matching that goes on for a very long time, and so the match_limit value is also used in this case (but in a different way) to limit how long the matching can continue.
The default value for the limit can be set when PCRE is built; the default default is 10 million, which handles all but the most extreme cases. You can override the default by supplying pcre_exec() with a pcre_extra block in which match_limit is set, and PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
A value for the match limit may also be supplied by an item at the start of a pattern of the form
(*LIMIT_MATCH=d)
where d is a decimal number. However, such a setting is ignored unless d is less than the limit set by the caller of pcre_exec() or, if no such limit is set, less than the default.
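From PHP, the (*LIMIT_MATCH=d) item can be tried directly in the pattern string. The sketch below assumes that the PCRE library behind your PHP build is recent enough to recognize the item; an older library would reject the pattern at compile time. The value 5000 is arbitrary. As the quoted text says, the item can only lower the limit the caller supplies, so it cannot be used to get around a low pcre.backtrack_limit.

```php
<?php
// Assumption: the PCRE library behind PHP knows the (*LIMIT_MATCH=d) item.
// It can only lower the limit that PHP passes in from pcre.backtrack_limit.
$pattern = '/(*LIMIT_MATCH=5000)START-ONE(.*)STOP-ONE.*START-TWO(.*)STOP-TWO.*/';

$short = 'START-ONE head STOP-ONE START-TWO body STOP-TWO';
$long  = 'START-ONE head STOP-ONE START-TWO' . str_repeat('x', 49971) . 'STOP-TWO';

// A short subject needs far fewer than 5000 steps.
$shortResult = preg_match($pattern, $short, $shortMatches);
echo 'short: ', var_export($shortResult, true), "\n";

// A long subject may exceed the per-pattern limit; report whatever happens.
$longResult = preg_match($pattern, $long, $longMatches);
if ($longResult === false) {
    echo 'long: error code ', preg_last_error(), "\n";
} else {
    echo 'long: ', var_export($longResult, true), "\n";
}
```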
As for PHP not being talkative about its errors, you can use the T-Regx library, which always throws exceptions:
// didn't work
$pattern_2 = '/START-ONE(.*)STOP-ONE.*START-TWO(.*)STOP-TWO.*/';
$string = 'START-ONE this is head stuff STOP-ONE START-TWO' . str_repeat('x', 959971) . 'STOP-TWO';
try {
    pattern($pattern_2)->match($string)->first();
}
catch (\Exception $e) {
    $m = $e->getMessage();
    // $m: "After invoking preg_match(), preg_last_error() returned PREG_BACKTRACK_LIMIT_ERROR."
}