preg_match to extract data from string

I have a string "CPC >= $0 (Yesterday)" and I want to extract the data: CPC, >=, 0, Yesterday. The >= sign can vary between a few symbols, but it is always a comparison operator.

$str = "CPC >= $0 (Yesterday)";
preg_match('/(?<metric1>\w+) (?<sign>\w+) $(?<digit>\d+) \(((?<time>\w+))\)/', $str, $matches);
print_r($matches);

This gives the output:

Array
(
)

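Edit:

The string can also be: CPC (Link) > $0 (Today), with parentheses before the sign. When you post an answer, could you also explain the characters used in your pattern?

(Pasted from comments...)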

I'm trying to get CPC (Link), >, 0, Today in the array --- No brackets for the last item.

Yes, bracket for the first part and the comparison operators can be: > or < or <= or >=.

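A few problems:

  • >, = and so on are not word characters (which is what \w matches). You need to use \S (any non-whitespace character) instead.
  • You need to escape the $ sign as \$ (otherwise it tries to match the end of the string).
  • There is one more pair of () around time than you need.

Try this: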

$regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
$str = "CPC >= $0 (Yesterday)";
preg_match($regex, $str, $matches);
print_r($matches);
$str = "CPC (Link) > $0 (Today)";
preg_match($regex, $str, $matches);
print_r($matches);

Output:

Array
(
    [0] => CPC >= $0 (Yesterday)
    [metric1] => CPC
    [1] => CPC
    [2] => 
    [sign] => >=
    [3] => >=
    [digit] => 0
    [4] => 0
    [time] => Yesterday
    [5] => Yesterday
)
Array
(
    [0] => CPC (Link) > $0 (Today)
    [metric1] => CPC (Link)
    [1] => CPC (Link)
    [2] =>  (Link)
    [sign] => >
    [3] => >
    [digit] => 0
    [4] => 0
    [time] => Today
    [5] => Today
)

Explanation of $regex:

(?<metric1>\w+(\s\([^)]+\))?) - captures a word (\w+) followed by an optional set of characters within () into a group called metric1
(?<sign>\S+) - captures a sequence of non-whitespace characters (\S+) into a group called sign
\$(?<digit>\d+) - captures a sequence of digits (\d+) following a literal $ sign into a group called digit
\((?<time>[^)]+)\) - captures a set of characters within () into a group called time
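
As a rough check, the pattern can be run against each of the four operators mentioned in the comments. This is only a sketch: the `extractParts` helper and the sample strings are assumptions made for illustration. Because `\S+` accepts any run of non-whitespace characters, it will also accept tokens that are not comparison operators.

```php
<?php
// Hypothetical helper: returns only the named groups, or null when nothing matches.
function extractParts(string $str): ?array
{
    $regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
    if (!preg_match($regex, $str, $m)) {
        return null;
    }
    // Named groups have string keys; the numeric duplicates are dropped.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

// Assumed sample inputs, one per operator: >, <, >=, <=.
$samples = [
    'CPC > $1 (Today)',
    'CPC < $2 (Today)',
    'CPC (Link) >= $3 (Yesterday)',
    'CPC (Link) <= $4 (Yesterday)',
];

foreach ($samples as $sample) {
    echo implode(' | ', extractParts($sample) ?? ['no match']), "\n";
}
```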

Here is a solution that works for your example:

$str = "CPC >= $0 (Yesterday)";
preg_match_all("/[^\s$)(]+/", $str, $matches);
print_r($matches[0]);
// Array ( [0] => CPC [1] => >= [2] => 0 [3] => Yesterday )
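
This approach splits on every space, dollar sign and parenthesis, so it only fits inputs shaped like the first example. A minimal sketch (the `tokenize` helper name is hypothetical) shows what happens with the second string from the question, where the metric itself contains parentheses:

```php
<?php
// Hypothetical helper wrapping the preg_match_all call above.
function tokenize(string $str): array
{
    // Match runs of characters that are not whitespace, "$", ")" or "(".
    preg_match_all('/[^\s$)(]+/', $str, $matches);
    return $matches[0];
}

print_r(tokenize('CPC >= $0 (Yesterday)'));   // four tokens: CPC, >=, 0, Yesterday
print_r(tokenize('CPC (Link) > $0 (Today)')); // five tokens: "Link" is split off from "CPC"
```

If the metric has to stay in one piece as "CPC (Link)", a pattern with explicit groups is the safer choice.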

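For metric1, you can list the characters you want to match in a character class, follow it with a whitespace character, and repeat that as a group.

If the sign part can be >, <, <= or >=, you can match those with a character class followed by an optional =.

For the digit part, you can capture the digits after the dollar sign in a capturing group. You have to escape the dollar sign, otherwise it is an anchor that asserts the end of the string.

For the time part, you can capture everything between the parentheses in a capturing group.

(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)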

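Explanation

  • (?<metric1> Named capturing group metric1
    • (?:[\w()]+\s)+ In a non-capturing group (?:, match one or more of the characters listed in the character class followed by a whitespace character, and repeat that group one or more times
  • ) Close the group
  • (?<sign> Named capturing group sign
    • [><]=? Match either > or < from the character class, followed by an optional =
  • ) \$ Close the group, then match a space and a literal dollar sign
  • (?<digit> Named capturing group digit
    • \d+ Match one or more digits
  • ) Close the group, then match a space
  • \((?<time> Match ( literally and start the named capturing group time
    • [^)]+ Match one or more characters other than )
  • )\) Close the group and match ) literally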
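
Put together in PHP, the pattern above might be used as follows. This is only a sketch: the `parseRule` name is hypothetical, and the `trim()` call is an addition, needed because the metric1 group as written also captures the space that precedes the sign.

```php
<?php
// Hypothetical wrapper around the pattern explained above.
function parseRule(string $str): ?array
{
    $regex = '/(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)/';
    if (!preg_match($regex, $str, $m)) {
        return null; // input does not have the expected shape
    }
    return [
        'metric1' => trim($m['metric1']), // the group includes the space before the sign
        'sign'    => $m['sign'],
        'digit'   => $m['digit'],
        'time'    => $m['time'],
    ];
}

print_r(parseRule('CPC >= $0 (Yesterday)'));
print_r(parseRule('CPC (Link) > $0 (Today)'));
```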

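I never use named capture groups because they make the pattern harder to read and bloat the output array. If you want to generate named variables, you can use list() or symmetric array destructuring.

If this were my project, I probably wouldn't name the capture groups or the variables, but if it makes your code more readable or understandable, that is a noble enough reason.

  • Remember that the first element in the output array is the full-string match, which you don't need.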
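
As a sketch of the destructuring idea, the short array syntax (available since PHP 7.1) can skip the full-match element with a leading comma. The variable names here are only illustrative.

```php
<?php
$string = 'CPC (Link) > $0 (Today)';

if (preg_match('~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~', $string, $out)) {
    // The leading comma skips $out[0], the full-string match.
    [, $metric, $sign, $digit, $time] = $out;
    echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
}
```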

Code:

$strings = [
    'CPC >= $0 (Yesterday)',
    'CPC (Link) > $100 (Today)'
];

foreach ($strings as $string) {
    list($metric, $sign, $digit, $time) = preg_match('~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~', $string, $out) ? array_slice($out, 1) : ['', '', '', ''];  // if fails, use empty strings

    echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
    var_export($metric);  // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($sign);    // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($digit);   // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($time);    // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n----------\n";
}

Output:

metric: CPC, sign: >=, digit: 0, time: Yesterday
'CPC'
'>='
'0'
'Yesterday'
----------
metric: CPC (Link), sign: >, digit: 100, time: Today
'CPC (Link)'
'>'
'100'
'Today'
----------

Pattern breakdown:

~            #starting pattern delimiter
(            #start of Capture Group #1
  [\w ()]+   #match (as much as possible) 1 or more A-Z, a-z, 0-9, _, space, or parenthesis (in any order)
)            #end of Capture Group #1
 (           #match space then start of Capture Group #2
   [><]=?    #match greater than or less than symbol followed optionally by equals symbol
 )           #end of Capture Group #2
 \$          #match space then a dollar symbol (backslash tells regex to treat the dollar sign literally)
(            #start of Capture Group #3
  \d+        #match one or more digits
)            #end of Capture Group #3
 \(          #match space then opening parenthesis (made literal by backslash)
(            #start of Capture Group #4
  [^)]+      #match one or more characters that are not a closing parenthesis
)            #end of Capture Group #4
\)           #match closing parenthesis literally
~            #end pattern delimiter
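
The annotated layout above is close to what PCRE accepts directly under the x (extended) modifier, which ignores unescaped whitespace outside character classes and treats # as the start of a comment. The following is a sketch under that assumption: the literal spaces between groups are written as an escaped space because plain spaces are ignored in this mode, and the comments avoid the ~ delimiter character and apostrophes so the PHP string and the pattern stay intact.

```php
<?php
// Same pattern as above, written with the x modifier so it can carry comments.
$pattern = '~
    ([\w ()]+)   # group 1: metric (word characters, spaces, parentheses)
    \            # literal space
    ([><]=?)     # group 2: < or >, optionally followed by =
    \ \$         # literal space, then a literal dollar sign
    (\d+)        # group 3: one or more digits
    \ \(         # literal space, then a literal opening parenthesis
    ([^)]+)      # group 4: everything up to the closing parenthesis
    \)           # literal closing parenthesis
~x';

foreach (['CPC >= $0 (Yesterday)', 'CPC (Link) > $100 (Today)'] as $string) {
    if (preg_match($pattern, $string, $out)) {
        // Drop the full-string match and print the four captured parts.
        echo implode(' | ', array_slice($out, 1)), "\n";
    }
}
```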