Get matches between two delimiters when potentially nested
The delimiters in my case are opening and closing parentheses. When there is no nesting, I can get the text between them as follows:
$input = 'sometext(moretext)andmoretext(somemoretext)andevenmoretext(andmore)';
preg_match_all('#\((.*?)\)#', $input, $match);
echo('<pre>'.print_r($match[1],1).'</pre>');
Array
(
[0] => moretext
[1] => somemoretext
[2] => andmore
)
However, when the parentheses are nested, I run into trouble and get the following:
$input = 'sometext(moretext)andmoretext(somemore(with(bitof(littletext)text)more(andmore)text)text)andevenmoretext(andmore)';
preg_match_all('#\((.*?)\)#', $input, $match);
echo('<pre>'.print_r($match[1],1).'</pre>');
Array
(
[0] => moretext
[1] => somemore(with(bitof(littletext
[2] => andmore
[3] => andmore
)
How can I return the whole string between the outermost delimiters, like this:
Array
(
[0] => moretext
[1] => somemore(with(bitof(littletext)text)more(andmore)text)text
[2] => andmore
)
PS. Eventually, I will use recursive PHP to perform the same task on any top-level match that itself contains parentheses.
You can use this recursive regex pattern to match (...):
preg_match_all('/\( ( (?: [^()]* | (?R) )* ) \)/x', $input, $m);
print_r($m[1]);
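(?R) recurses the whole pattern, so each nested (...) is consumed as part of the enclosing match. The x modifier lets the pattern contain whitespace for readability.
Output: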
Array
(
[0] => moretext
[1] => somemore(with(bitof(littletext)text)more(andmore)text)text
[2] => andmore
)
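The question's PS mentions running the same match again on any top-level result that itself contains parentheses. A minimal sketch of that idea, reusing the pattern above, could look like the following. The function name nestedGroups and the 'text' / 'children' array keys are hypothetical and chosen only for illustration, and the input is assumed to have balanced parentheses.

```php
<?php
// Hypothetical helper (not from the original answer): returns every
// top-level (...) group together with the groups nested inside it.
function nestedGroups(string $input): array
{
    // Same recursive pattern as above; (?R) re-enters the whole pattern.
    preg_match_all('/\( ( (?: [^()]* | (?R) )* ) \)/x', $input, $m);

    $result = [];
    foreach ($m[1] as $inner) {
        $result[] = [
            'text'     => $inner,
            // Descend only when the captured text still contains parentheses.
            'children' => strpos($inner, '(') !== false ? nestedGroups($inner) : [],
        ];
    }
    return $result;
}

$input = 'sometext(moretext)andmoretext(somemore(with(bitof(littletext)text)more(andmore)text)text)andevenmoretext(andmore)';
print_r(nestedGroups($input));
```

Each level is matched by a fresh preg_match_all call on the captured text, so the regex only ever has to find the outermost groups of the string it is given.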
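To accomplish this, here is a non-regex solution.
function delimeterSplit( $input )
{
    $str = '';          // text collected for the current top-level group
    $output = array();  // finished top-level groups
    $op = 0;            // opening parentheses seen in the current group
    $cp = 0;            // closing parentheses seen in the current group
    foreach( str_split( $input ) as $v )
    {
        if( $v === '(' )
        {
            ++$op;
        }
        if( $v === ')' )
        {
            ++$cp;
        }
        // Collect everything inside the group except its own outer parentheses.
        if( ( ( $op === 1 && $v !== '(' ) || $op > 1 ) && $op !== $cp )
        {
            $str .= $v;
        }
        // Equal counts mean the top-level group has just been closed.
        if( $op > 0 && $op === $cp )
        {
            $op = 0;
            $cp = 0;
            $output[] = $str;
            $str = '';
        }
    }
    return $output;
}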
echo '<pre>'.print_r( delimeterSplit( 'sometext(moretext)andmoretext(somemoretext)andevenmoretext(andmore)' ), true ).'</pre>';
echo '<pre>'.print_r( delimeterSplit( 'sometext(moretext)andmoretext(somemore(with(bitof(littletext)text)more(andmore)text)text)andevenmoretext(andmore)' ), true ).'</pre>';
Output:
Array
(
[0] => moretext
[1] => somemoretext
[2] => andmore
)
Array
(
[0] => moretext
[1] => somemore(with(bitof(littletext)text)more(andmore)text)text
[2] => andmore
)
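The counting approach in the function above can also be written with a single depth counter. The following is only a sketch of that variant, not the answer's own code: the name topLevelGroups is hypothetical, and ignoring a stray ')' that appears before any '(' is an assumption about how malformed input should be treated.

```php
<?php
// Hypothetical variant (not from the original answer): tracks nesting
// with one depth counter and slices each top-level group out by offset.
function topLevelGroups(string $input): array
{
    $output = [];
    $depth  = 0; // number of parentheses currently open
    $start  = 0; // offset just after the opening top-level '('

    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '(') {
            if ($depth === 0) {
                $start = $i + 1; // a new top-level group begins here
            }
            $depth++;
        } elseif ($char === ')' && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Back at depth 0: the top-level group is complete.
                $output[] = substr($input, $start, $i - $start);
            }
        }
        // Assumption: a ')' seen while $depth is 0 is unbalanced and skipped.
    }
    return $output;
}

print_r(topLevelGroups('sometext(moretext)andmoretext(somemore(with(bitof(littletext)text)more(andmore)text)text)andevenmoretext(andmore)'));
```

A group that is opened but never closed is silently dropped in this sketch, which may or may not be the behavior you want.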