RegEx for capturing groups between repeated words
The keywords are "*OR" and "*AND".
Suppose I have the following string:
This is a t3xt with special characters like !#. *AND and this is
another text with special characters *AND this repeats *OR do not
repeat *OR have more strings *AND finish with this string.
I want the following:
group1 "This is a t3xt with special characters like !#."
group2 "*AND"
group3 "and this is another text with special characters"
group4 "*AND"
group5 "this repeats"
group6 "*OR"
group7 "do not repeat"
group8 "*OR"
group9 "have more strings"
group10 "*AND"
group11 "finish with this string."
I tried this:
(.+?)(\*AND\*OR)
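But it only gets the first string, so I would have to keep repeating the pattern to collect the others. The problem is that some strings have only one *AND, only one *OR, or dozens of them; it is quite random. The following regex doesn't work either: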
((.+?)(\*AND\*OR))+
For example:
This is a t3xt with special characters like !#. *AND and this is
another text with special characters
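To see why these attempts fall short, here is a small sketch, assuming PHP's PCRE functions since the question is about preg_match. It writes the keyword group with a "|" as (\*AND|\*OR), on the assumption that alternation was the intent. As written, \*AND\*OR only matches the literal text "*AND*OR".

```php
<?php
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";

// As written, \*AND\*OR only matches the literal text "*AND*OR", so nothing matches.
var_dump(preg_match('~(.+?)(\*AND\*OR)~', $string)); // int(0)

// With alternation, preg_match still stops after the first match.
preg_match('~(.+?)\s*(\*AND|\*OR)~', $string, $first);
echo $first[1], ' | ', $first[2], "\n"; // This is a t3xt with special characters like !#. | *AND

// Repeating the pair with + does not collect every pair either: a repeated
// capture group only keeps the text from its last iteration.
preg_match('~(?:(.+?)\s*(\*AND|\*OR)\s*)+~', $string, $last);
echo $last[1], ' | ', $last[2], "\n"; // have more strings | *AND
```

In other words, a single preg_match call cannot return a variable number of groups. You either need a function that repeats the match for you, or one that splits on the keywords.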
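PHP has a preg_split function for exactly this kind of thing. preg_split splits a string on a delimiter defined as a regex pattern. It also has a flag, PREG_SPLIT_DELIM_CAPTURE, that includes the delimiters in the split result (specifically, whatever the delimiter pattern captures in parentheses).
So instead of writing a regex that matches the whole text, you write a regex for the delimiter itself.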
Example:
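$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
// Split on the keywords. Only the keyword is captured, so PREG_SPLIT_DELIM_CAPTURE keeps
// "*AND"/"*OR" in the result, while the \s* on both sides keeps the surrounding spaces out.
$parts = preg_split('~\s*(\*(?:AND|OR))\s*~', $string, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);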
Output:
Array
(
[0] => This is a t3xt with special characters like !#.
[1] => *AND
[2] => and this is another text with special characters
[3] => *AND
[4] => this repeats
[5] => *OR
[6] => do not repeat
[7] => *OR
[8] => have more strings
[9] => *AND
[10] => finish with this string.
)
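Because only the keyword is captured, the array returned by preg_split always alternates text, keyword, text, ..., with the text pieces at even indexes. Here is a minimal sketch of separating the two, assuming you want the operands and the operators in separate arrays. The names $terms and $operators are only illustrative.

```php
<?php
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";

// \s* on both sides of the captured keyword keeps the spaces out of the pieces.
$parts = preg_split('~\s*(\*(?:AND|OR))\s*~', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

// With exactly one capturing group, the result alternates: text, keyword, text, ...
$terms = [];      // even indexes: the text between keywords
$operators = [];  // odd indexes: "*AND" / "*OR"
foreach ($parts as $i => $part) {
    if ($i % 2 === 0) {
        $terms[] = $part;
    } else {
        $operators[] = $part;
    }
}

print_r($terms);
print_r($operators);
```

There is always one more term than there are operators. This still holds when the string starts or ends with a keyword, because the missing text then shows up as an empty string.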
But if you really want to stick with preg_match (which is what you tagged in your question), you will need preg_match_all, which works like preg_match except that it performs global/repeated matching.
Example:
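$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
// Match either a run of characters that does not start "*AND"/"*OR" (the text),
// or one of the keywords itself. Text matches keep the spaces around the keywords.
preg_match_all('~(?:(?:(?!\*(?:AND|OR)).)+)|(?:\*(?:AND|OR))~', $string, $matches);
print_r($matches);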
Output:
Array
(
[0] => Array
(
[0] => This is a t3xt with special characters like !#.
[1] => *AND
[2] => and this is another text with special characters
[3] => *AND
[4] => this repeats
[5] => *OR
[6] => do not repeat
[7] => *OR
[8] => have more strings
[9] => *AND
[10] => finish with this string.
)
)
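First, note that unlike preg_split, preg_match_all (like preg_match) fills the $matches array indexed by capture group, so the result is a multidimensional array rather than a one-dimensional one; the full matches are in $matches[0]. Second, the pattern I used could technically be simplified a bit. The cost is that you would have to work with several arrays inside the returned multidimensional array (one for the matched text and another for the matched delimiters), then loop through them and alternate between them. In other words, it takes extra cleanup to end up with a single array containing both match sets, as shown above.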
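The answer doesn't spell out the simplified pattern, so the following is only an assumed illustration of that extra bookkeeping. If the keyword and the text each get their own capture group, preg_match_all puts them in separate parallel arrays ($matches[1] and $matches[2]), with an empty string wherever the other alternative matched. You then have to merge them back by index.

```php
<?php
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";

// Group 1: a keyword. Group 2: a run of text that does not start a keyword.
preg_match_all('~(\*(?:AND|OR))|((?:(?!\*(?:AND|OR)).)+)~', $string, $matches);

// In the default PREG_PATTERN_ORDER, $matches[1] and $matches[2] have one entry
// per match; the group that did not take part holds "". Interleave them back.
$parts = [];
foreach (array_keys($matches[0]) as $i) {
    $parts[] = $matches[1][$i] !== ''
        ? $matches[1][$i]          // keyword
        : trim($matches[2][$i]);   // text, minus the spaces around the keywords
}

print_r($parts);
```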
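I only show this approach because you technically asked for it in your question, but I recommend preg_split: it removes much of this overhead, and handling scenarios like this one is exactly why it exists.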
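If the set of keywords may grow, one way to keep the preg_split approach tidy is to build the delimiter pattern from a list, escaping each keyword with preg_quote. The helper name split_on_keywords is hypothetical. It is a sketch, not part of PHP.

```php
<?php
// Hypothetical helper: split $text on any of $keywords, keeping the keywords
// in the result and dropping the spaces around them.
function split_on_keywords(string $text, array $keywords): array
{
    // preg_quote escapes regex metacharacters such as "*"; the second argument
    // also escapes the "~" delimiter in case a keyword contains it.
    $alternatives = implode('|', array_map(function ($k) {
        return preg_quote($k, '~');
    }, $keywords));

    return preg_split('~\s*(' . $alternatives . ')\s*~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
}

print_r(split_on_keywords('a *AND b *OR c', ['*AND', '*OR']));
```

Alternation is tried left to right, so if one keyword is a prefix of another, list the longer keyword first.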