preg_match to extract data from string

Question

I have a string "CPC >= $0 (Yesterday)" and I want to extract the data: CPC, >=, 0, Yesterday. The >= sign can vary between several symbols, but it is always a comparison operator.

$str = "CPC >= $0 (Yesterday)";
preg_match('/(?<metric1>\w+) (?<sign>\w+) $(?<digit>\d+) \(((?<time>\w+))\)/', $str, $matches);
print_r($matches);

This gives the output:

Array
(
)

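Edit:

The string can also be: CPC (Link) > $0 (Today), with brackets before the sign. When you post an answer, could you also explain the characters used in your pattern?

(Pasted from comments...)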

I'm trying to get CPC (Link), >, 0, Today in the array --- No brackets for the last item.

Yes, bracket for the first part and the comparison operators can be: > or < or <= or >=.

Answer 1

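A few issues:

>, = and so on are not word characters (which is what \w matches). You need to use \S (any non-whitespace character) instead.
You need to escape the $ sign (otherwise it tries to match the end of the string).
You have one more pair of () around time than you need.

Try this: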

$regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
$str = "CPC >= $0 (Yesterday)";
preg_match($regex, $str, $matches);
print_r($matches);
$str = "CPC (Link) > $0 (Today)";
preg_match($regex, $str, $matches);
print_r($matches);

Output:

Array
(
    [0] => CPC >= $0 (Yesterday)
    [metric1] => CPC
    [1] => CPC
    [2] => 
    [sign] => >=
    [3] => >=
    [digit] => 0
    [4] => 0
    [time] => Yesterday
    [5] => Yesterday
)
Array
(
    [0] => CPC (Link) > $0 (Today)
    [metric1] => CPC (Link)
    [1] => CPC (Link)
    [2] =>  (Link)
    [sign] => >
    [3] => >
    [digit] => 0
    [4] => 0
    [time] => Today
    [5] => Today
)

Explanation of $regex:

(?<metric1>\w+(\s\([^)]+\))?) - captures a word (\w+) followed by an optional set of characters within () into a group called metric1
(?<sign>\S+) - captures a sequence of non-whitespace characters (\S+) into a group called sign
\$(?<digit>\d+) - captures a sequence of digits (\d+) following a literal $ sign into a group called digit
\((?<time>[^)]+)\) - captures a set of characters within () into a group called time
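
If only the named entries are wanted, the numeric duplicates in the result can be filtered out afterwards. The following is a minimal sketch, assuming the pattern with the escaped dollar sign described above; the helper name extractParts is made up for illustration and is not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): returns only the named groups,
// or null when the string does not match.
function extractParts(string $str): ?array
{
    $regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
    if (!preg_match($regex, $str, $matches)) {
        return null;
    }
    // Keep string keys only; the numeric keys duplicate the named groups.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(extractParts('CPC (Link) > $0 (Today)'));
```

Because \S+ accepts any run of non-whitespace characters, this sketch does not check that the sign is really one of >, <, >= or <=.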

Answer 2

Here is a solution that works for your example:

$str = "CPC >= $0 (Yesterday)";
preg_match_all("/[^\s$)(]+/", $str, $matches);
print_r($matches[0]);
// Array ( [0] => CPC [1] => >= [2] => 0 [3] => Yesterday )
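
Note that this tokenizing approach splits on every bracket, so it may not produce the grouping asked for in the edited question. A quick check, assuming the second sample string from the question:

```php
<?php
// Same pattern as above, applied to the input with a bracketed metric.
$str = 'CPC (Link) > $0 (Today)';
preg_match_all('/[^\s$)(]+/', $str, $matches);
print_r($matches[0]);
// "Link" comes out as its own element instead of staying attached to "CPC".
```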

Answer 3

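For metric1, you can list the characters you want to match in a character class, follow it with a whitespace character, and repeat that as a group.

If the sign part can be > or < or <= or >=, you can match those using a character class and an optional =.

For the digit part, you can capture the digits after the dollar sign in a capturing group. You have to escape the dollar sign, otherwise it would be an assertion for the end of the string.

For the time part, you can capture everything between the parentheses in a capturing group.

(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)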

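Explanation

(?<metric1> Named capture group metric1
- (?:[\w()]+\s)+ In a non-capturing group (?:, match 1 or more of the characters listed in the character class followed by a whitespace character, and repeat that group 1 or more times
) Close group
(?<sign> Named capture group sign
- [><]=? Match < or > using a character class, followed by an optional =
) \$ Close group and match a space and a dollar sign
(?<digit> Named capture group digit
- \d+ Match 1 or more digits
) Close group and match a space
\((?<time> Match ( literally and start named capture group time
- [^)]+ Match 1 or more characters other than ) using a negated character class
)\) Close group and match ) literally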
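
As a rough sketch of how this pattern might be used from PHP: the / delimiters, the variable names, and the trim() call are additions for illustration. The trim() is there because metric1, as written, also captures the space in front of the sign.

```php
<?php
// Pattern from this answer, with delimiters added and the dollar sign escaped.
$regex = '/(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)/';

$results = [];
foreach (['CPC >= $0 (Yesterday)', 'CPC (Link) > $0 (Today)'] as $str) {
    if (preg_match($regex, $str, $m)) {
        // metric1 includes the whitespace before the sign, so trim it.
        $results[] = [trim($m['metric1']), $m['sign'], $m['digit'], $m['time']];
    }
}
print_r($results);
```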

Answer 4

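I never use named capture groups, because they make the pattern harder to read and bloat the output array. If you want to generate named variables, you can use list() or symmetric array destructuring.

If this were my project, I probably wouldn't name the capture groups or the variables, but if it makes your code more readable or easier to understand, that is a noble enough reason.

Remember that the first element in the output array is the full-string match, which you don't need.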

Code:

$strings = [
    'CPC >= $0 (Yesterday)',
    'CPC (Link) > $100 (Today)'
];

foreach ($strings as $string) {
    list($metric, $sign, $digit, $time) = preg_match('~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~', $string, $out) ? array_slice($out, 1) : ['', '', '', ''];  // if fails, use empty strings

    echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
    var_export($metric);  // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($sign);    // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($digit);   // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($time);    // notice no leading or trailing spaces / unwanted characters in the output
    echo "\n----------\n";
}

Output:

metric: CPC, sign: >=, digit: 0, time: Yesterday
'CPC'
'>='
'0'
'Yesterday'
----------
metric: CPC (Link), sign: >, digit: 100, time: Today
'CPC (Link)'
'>'
'100'
'Today'
----------

Pattern breakdown:

~            #starting pattern delimiter
(            #start of Capture Group #1
  [\w ()]+   #match (as much as possible) 1 or more A-Z, a-z, 0-9, _, space, or parenthesis (in any order)
)            #end of Capture Group #1
 (           #match space then start of Capture Group #2
   [><]=?    #match greater than or less than symbol followed optionally by equals symbol
 )           #end of Capture Group #2
 \$         #match space then a dollar symbol (backslash tells regex to treat the dollar sign literally)
(            #start of Capture Group #3
  \d+        #match one or more digits
)            #end of Capture Group #3
 \(          #match space then opening parenthesis (made literal by backslash)
(            #start of Capture Group #4
  [^)]+      #match one or more characters that are not a closing parenthesis
)            #end of Capture Group #4
\)           #match closing parenthesis literally
~            #end pattern delimiter
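
The symmetric array destructuring mentioned at the top of this answer could look roughly like this. It is a sketch that assumes PHP 7.1 or later for the short [] syntax and reuses the second sample string from the code above.

```php
<?php
$string = 'CPC (Link) > $100 (Today)';

if (preg_match('~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~', $string, $out)) {
    // The leading comma skips element 0, the full-string match.
    [, $metric, $sign, $digit, $time] = $out;
    echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
}
```

Skipping the first element this way avoids the array_slice() call, at the cost of handling the no-match case in a separate branch.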

php

regex

preg-match