preg_match to extract data from string
I have a string "CPC >= $0 (Yesterday)" and I want to extract the data CPC, >=, 0 and Yesterday. However, the sign >= can vary between several symbols, but it is always a comparison operator.
$str = 'CPC >= $0 (Yesterday)';
preg_match('/(?<metric1>\w+) (?<sign>\w+) $(?<digit>\d+) \(((?<time>\w+))\)/', $str, $matches);
print_r($matches);
This gives the output:
Array
(
)
Edit:
The string can also be: CPC (Link) > $0 (Today)
i.e. with brackets before the sign. When you post an answer, could you also explain the characters used in your pattern?
(Pasted from the comments...)
I'm trying to get CPC (Link), >, 0 and Today in the array, with no brackets for the last item. Yes, brackets for the first part, and the comparison operator can be >, <, <= or >=.
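A few problems:
- >, = etc. are not word characters (which is what \w matches). You need to use \S (any non-whitespace character) instead.
- You need to escape the $ sign (otherwise it is treated as an anchor and tries to match the end of the string).
- You have more parentheses around time than you need.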
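A quick way to confirm the first two points is to test each piece of the pattern in isolation. This is only an illustrative sketch (the variable names are arbitrary): it checks \w and \S against ">=", and an unescaped versus an escaped $ against the literal text "$0".

```php
<?php
// \w only matches letters, digits and underscore, so it cannot match ">=".
$wordSign = preg_match('/^\w+$/', '>=');      // 0: no match
// \S matches any non-whitespace character, including ">" and "=".
$nonSpaceSign = preg_match('/^\S+$/', '>=');  // 1: match

// Unescaped, $ is an end-of-subject anchor, so nothing can follow it: no match.
$unescaped = preg_match('/$0/', '$0');        // 0: no match
// Escaped, \$ matches a literal dollar sign.
$escaped = preg_match('/\$0/', '$0');         // 1: match

var_dump($wordSign, $nonSpaceSign, $unescaped, $escaped);
```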
试试这个:
$regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
$str = "CPC >= [=10=] (Yesterday)";
preg_match($regex, $str, $matches);
print_r($matches);
$str = "CPC (Link) > [=10=] (Today)";
preg_match($regex, $str, $matches);
print_r($matches);
Output:
Array
(
[0] => CPC >= $0 (Yesterday)
[metric1] => CPC
[1] => CPC
[2] =>
[sign] => >=
[3] => >=
[digit] => 0
[4] => 0
[time] => Yesterday
[5] => Yesterday
)
Array
(
[0] => CPC (Link) > $0 (Today)
[metric1] => CPC (Link)
[1] => CPC (Link)
[2] =>  (Link)
[sign] => >
[3] => >
[digit] => 0
[4] => 0
[time] => Today
[5] => Today
)
Explanation of $regex:
(?<metric1>\w+(\s\([^)]+\))?) - captures a word (\w+) followed by an optional set of characters within () into a group called metric1
(?<sign>\S+) - captures a sequence of non-whitespace characters (\S+) into a group called sign
\$(?<digit>\d+) - captures a sequence of digits (\d+) following a $ sign into a group called digit
\((?<time>[^)]+)\) - captures a set of characters within () into a group called time
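The $matches array above holds every group twice (once by name, once by number). If only the named groups are wanted, one option, not part of the answer above but a sketch using the same pattern, is to keep the string keys with array_filter() and ARRAY_FILTER_USE_KEY. The helper name extractParts is hypothetical.

```php
<?php
// Hypothetical helper: runs the pattern above and keeps only the named groups.
function extractParts(string $str): ?array
{
    $regex = '/(?<metric1>\w+(\s\([^)]+\))?)\s+(?<sign>\S+)\s+\$(?<digit>\d+)\s+\((?<time>[^)]+)\)/';
    if (!preg_match($regex, $str, $matches)) {
        return null; // the string did not match
    }
    // Named groups have string keys; their numbered duplicates have integer keys.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(extractParts('CPC >= $0 (Yesterday)'));
print_r(extractParts('CPC (Link) > $0 (Today)'));
```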
Here is a solution that works for your example:
$str = 'CPC >= $0 (Yesterday)';
preg_match_all('/[^\s$)(]+/', $str, $matches);
print_r($matches[0]);
// Array ( [0] => CPC [1] => >= [2] => 0 [3] => Yesterday )
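Because this pattern only collects runs of characters that are not whitespace, $ or parentheses, it does not keep (Link) attached to CPC. The sketch below runs the same pattern on both sample strings to show where it stops fitting the edited requirement.

```php
<?php
$tokens = [];
foreach (['CPC >= $0 (Yesterday)', 'CPC (Link) > $0 (Today)'] as $str) {
    // Collect runs of characters that are not whitespace, $, ( or ).
    preg_match_all('/[^\s$)(]+/', $str, $matches);
    $tokens[] = $matches[0];
}
print_r($tokens);
// The second entry is CPC, Link, >, 0, Today: "Link" becomes its own token.
```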
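For the metric1 part, you can list the characters you want to match in a character class, follow it with a whitespace character, and repeat that as a group.
If the sign part can be >, <, <= or >=, you can match it with a character class followed by an optional =.
For the digit part, you can capture the digits after the dollar sign in a capturing group. You have to escape the dollar sign, otherwise it is interpreted as an end-of-string anchor.
For the time part, you can capture everything between the parentheses in a capturing group.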
(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)
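Explanation
(?<metric1> - named capturing group metric1
(?:[\w()]+\s)+ - in a non-capturing group (?:...), match one or more characters from the character class followed by a whitespace character, and repeat the group one or more times (so metric1 keeps the trailing space, e.g. "CPC ")
) - close the group
(?<sign> - named capturing group sign
[><]=? - match < or > with a character class, followed by an optional =
) \$ - close the group, then match a space and a literal dollar sign
(?<digit> - named capturing group digit
\d+ - match one or more digits
) - close the group, then match a space
\((?<time> - match ( literally and start the named capturing group time
[^)]+ - match one or more characters other than ), using a negated character class
)\) - close the group and match ) literally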
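Because metric1 in this pattern ends with the whitespace consumed by \s, the captured value carries a trailing space. A minimal sketch of using the pattern and cleaning that up with rtrim() (the $results layout is just an illustration):

```php
<?php
$regex = '/(?<metric1>(?:[\w()]+\s)+)(?<sign>[><]=?) \$(?<digit>\d+) \((?<time>[^)]+)\)/';
$results = [];
foreach (['CPC >= $0 (Yesterday)', 'CPC (Link) > $0 (Today)'] as $str) {
    if (preg_match($regex, $str, $m)) {
        $results[] = [
            'metric1' => rtrim($m['metric1']), // the group ends with a whitespace character
            'sign'    => $m['sign'],
            'digit'   => $m['digit'],
            'time'    => $m['time'],
        ];
    }
}
print_r($results);
```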
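I never use named capturing groups, because they make the pattern harder to read and bloat the output array. If you want named variables, you can use list() or symmetric array destructuring.
If this were my project, I probably wouldn't name the capture groups or the variables, but if naming makes your code more readable or easier to understand, that is reason enough.
- Keep in mind that the first element of the output array is the full-string match, which you don't need.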
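For comparison with the list() approach, here is a small sketch of the short symmetric array destructuring syntax (PHP 7.1+), where a leading empty slot skips the full-string match at index 0. The sample string is taken from the demo data.

```php
<?php
$str = 'CPC (Link) > $100 (Today)';
if (preg_match('~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~', $str, $out)) {
    // The empty first slot skips $out[0] (the full match); the rest are named in order.
    [, $metric, $sign, $digit, $time] = $out;
    echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
}
```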
代码:(Demo)
$strings = [
    'CPC >= $0 (Yesterday)',
    'CPC (Link) > $100 (Today)'
];
foreach ($strings as $string) {
    list($metric, $sign, $digit, $time) = preg_match('~([\w ()]+) ([><]=?) \$(\d+) \(([^)]+)\)~', $string, $out)
        ? array_slice($out, 1)
        : ['', '', '', '']; // if the match fails, use empty strings
    echo "metric: $metric, sign: $sign, digit: $digit, time: $time\n";
    var_export($metric); // notice: no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($sign); // notice: no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($digit); // notice: no leading or trailing spaces / unwanted characters in the output
    echo "\n";
    var_export($time); // notice: no leading or trailing spaces / unwanted characters in the output
    echo "\n----------\n";
}
Output:
metric: CPC, sign: >=, digit: 0, time: Yesterday
'CPC'
'>='
'0'
'Yesterday'
----------
metric: CPC (Link), sign: >, digit: 100, time: Today
'CPC (Link)'
'>'
'100'
'Today'
----------
Pattern breakdown:
~         #starting pattern delimiter
(         #start of Capture Group #1
[\w ()]+  #match (as much as possible) 1 or more A-Z, a-z, 0-9, _, space, or parenthesis (in any order)
)         #end of Capture Group #1
 (        #match space then start of Capture Group #2
[><]=?    #match greater than or less than symbol followed optionally by equals symbol
)         #end of Capture Group #2
 \$       #match space then a dollar symbol (backslash tells regex to treat the dollar sign literally)
(         #start of Capture Group #3
\d+       #match one or more digits
)         #end of Capture Group #3
 \(       #match space then opening parenthesis (made literal by backslash)
(         #start of Capture Group #4
[^)]+     #match one or more characters that are not a closing parenthesis
)         #end of Capture Group #4
\)        #match closing parenthesis literally
~         #end pattern delimiter
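The breakdown above can also live inside the pattern itself by using the x (extended) modifier. This is a sketch of that variant, not the author's code: in x mode, unescaped whitespace outside a character class is ignored and # starts a comment, so the literal spaces are written as \x20 here, and a nowdoc keeps the backslashes intact.

```php
<?php
$pattern = <<<'REGEX'
~
([\w\x20()]+)    # Capture Group 1: word characters, spaces or parentheses
\x20([><]=?)     # a space, then Capture Group 2: > or <, optionally followed by =
\x20\$(\d+)      # a space, a literal dollar sign, then Capture Group 3: digits
\x20\(([^)]+)\)  # a space, an opening paren, Capture Group 4, a closing paren
~x
REGEX;

$results = [];
foreach (['CPC >= $0 (Yesterday)', 'CPC (Link) > $100 (Today)'] as $str) {
    if (preg_match($pattern, $str, $out)) {
        $results[] = array_slice($out, 1); // drop the full-string match
    }
}
print_r($results);
```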