RegEx for capturing groups between repeated words

Question

The keywords are "*OR" and "*AND".

Suppose I have the following string:

This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.

I want the following:

group1 "This is a t3xt with special characters like !#."  
group2 "*AND"  
group3 "and this is another text with special characters"  
group4 "*AND"  
group5 "this repeats"  
group6 "*OR"  
group7 "do not repeat"  
group8 "*OR"  
group9 "have more strings"  
group10 "*AND"  
group11 "finish with this string."

I tried this:

(.+?)(\*AND|\*OR)

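But it only captures the first string, and I would then have to keep repeating the pattern to collect the others. The problem is that the number of keywords is unpredictable: some strings contain a single *AND, some a single *OR, and some contain dozens of them. The following regex does not work either: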

((.+?)(\*AND|\*OR))+

For example, a string with only one keyword:

This is a t3xt with special characters like !#. *AND and this is another text with special characters

Answer 1

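PHP has a preg_split function for this kind of task. preg_split splits a string on a delimiter that is defined as a regex pattern. It also accepts a flag, PREG_SPLIT_DELIM_CAPTURE, that includes the matched delimiters in the result, provided the delimiter is wrapped in a capturing group.

So instead of writing a regex that matches the whole text, you write a regex that describes only the delimiter.

Example: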

$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
// The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps each delimiter in the result.
$parts = preg_split('~(\*(?:AND|OR))~', $string, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);

Output:

Array
(
    [0] => This is a t3xt with special characters like !#. 
    [1] => *AND
    [2] =>  and this is another text with special characters 
    [3] => *AND
    [4] =>  this repeats 
    [5] => *OR
    [6] =>  do not repeat 
    [7] => *OR
    [8] =>  have more strings 
    [9] => *AND
    [10] =>  finish with this string.
)
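
The output above keeps the spaces around each delimiter, while the groups listed in the question have none. The following is a minimal sketch of one way to drop them, assuming it is acceptable to let the delimiter pattern swallow the surrounding whitespace. The helper name splitOnKeywords is hypothetical and not part of the original answer; PREG_SPLIT_NO_EMPTY is added so that a string starting or ending with a keyword does not produce empty entries.

```php
<?php
// Hypothetical helper (not from the original answer): split on *AND / *OR,
// keep the delimiters, and drop the whitespace next to them.
function splitOnKeywords(string $text): array
{
    // \s* on both sides consumes the spaces around each delimiter;
    // only the parenthesized delimiter itself is captured and returned.
    return preg_split(
        '~\s*(\*(?:AND|OR))\s*~',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// The single-keyword string from the question.
$short = "This is a t3xt with special characters like !#. *AND and this is another text with special characters";
print_r(splitOnKeywords($short));
```

Because the number of delimiters does not matter to preg_split, the same call should handle a string with one keyword, dozens of keywords, or none at all (in which case the whole string comes back as a single entry).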

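If you really want to stick with the preg_match family, you will need preg_match_all. It works like preg_match (which is what you tagged in your question), except that it performs global, repeated matching.

Example: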

$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
// First alternative: a run of characters that does not start a delimiter; second alternative: a delimiter.
preg_match_all('~(?:(?:(?!\*(?:AND|OR)).)+)|(?:\*(?:AND|OR))~', $string, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => This is a t3xt with special characters like !#. 
            [1] => *AND
            [2] =>  and this is another text with special characters 
            [3] => *AND
            [4] =>  this repeats 
            [5] => *OR
            [6] =>  do not repeat 
            [7] => *OR
            [8] =>  have more strings 
            [9] => *AND
            [10] =>  finish with this string.
        )

)

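First, note that unlike preg_split, which returns a flat array, preg_match_all fills $matches with a multi-dimensional array, so the list you want is $matches[0]. Second, the pattern I used could technically be simplified a bit, but at the cost of having to reference several sub-arrays of that result (one holding the matched text and another holding the matched delimiters). You would then have to loop over them and alternate between the two. In other words, extra cleanup would be needed to end up with a single array containing both sets of matches, like the one shown above.

I only show this method because you technically asked for it in the question. I recommend preg_split, because it removes most of this overhead, which is the reason it exists in the first place: to handle scenarios like this one.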
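
To make that trade-off concrete, here is a rough sketch of what a simpler pattern with two capturing groups might look like. The pattern is an assumption made for illustration, not necessarily the simplification the answer's author had in mind. It shows the extra interleaving step that the preg_split approach avoids.

```php
<?php
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";

// Assumed simplified pattern: group 1 is a piece of text, group 2 is the
// delimiter that follows it (or an empty match at the end of the string).
preg_match_all('~(.+?)(\*(?:AND|OR)|$)~', $string, $matches);

// $matches[1] holds the texts and $matches[2] the delimiters, so the two
// sub-arrays have to be interleaved by hand to rebuild one flat list.
$result = [];
foreach ($matches[1] as $i => $text) {
    $result[] = $text;
    if ($matches[2][$i] !== '') {
        // Skip the empty "delimiter" captured at the end of the string.
        $result[] = $matches[2][$i];
    }
}
print_r($result);
```

The loop is the cleanup the paragraph above describes: the pattern is shorter, but the code around it grows.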

Tags: php, regex, pcre, preg-match