Extracting the parts of a string into an array in PHP
I have a string that I need to split up and extract information from.
Sample string:
"20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50"
First, I explode the string on the comma and get:
"20' Container 1"
"40' Open Container 1"
"40-45' Closed Container 3"
Now I also want to split each of these exploded parts, so that I get a result in the following format:
array[
0 => [
0 => "20'"
1 => "Container"
2 => "1"
]
1 => [
0 => "40'"
1 => "Open Container"
2 => "1"
]
2 => [
0=> container roll
1=> 10
]
3=> [
0=> container lift
1 => 50
]
]
The string may vary, but it is guaranteed to always follow the same format, i.e. length type number, where length is optional.
This is what I'm doing:
$pattern = '/([\d-]*\')\s(.*)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
preg_match($pattern, $value, $matches); // Match length, type, number
$result[] = array_slice($matches, 1); // Slice with offset 1
$equipment->tokenized = $result;
}
Then I get:
Array
(
[0] => Array
(
[0] => 20'
[1] => container
[2] => 10
)
[1] => Array
(
[0] => 40'
[1] => open container
[2] => 10
)
[2] => Array
(
[0] => 40-45'
[1] => closed container
[2] => 20
)
[3] => Array
(
)
[4] => Array
(
)
)
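A minimal sketch (assuming PHP 7.4 or later; the variable names are illustrative only) that isolates why the last two entries come back empty: the length group ([\d-]*\') in the pattern above is not optional, so an entry with no apostrophe, such as container roll 10, does not match at all, and preg_match() leaves $matches as an empty array.

```php
<?php
// The pattern from the question: the length group ([\d-]*\') has no "?",
// so an apostrophe is required before the type.
$pattern = '/([\d-]*\')\s(.*)\s(\d+)/';

$withLength    = preg_match($pattern, "20' Container 1", $m1);
$withoutLength = preg_match($pattern, 'container roll 10', $m2);

var_dump($withLength, $withoutLength); // 1 for the first entry, 0 for the second
var_dump(array_slice($m2, 1));         // an empty array, like entries [3] and [4] above
```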
Based on the given example, you could use:
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";
$regex = "~
(?:(?P<group1>\d+(?:-\d+)?')\h*)?
(?P<group2>(?i:[a-z]+\h?)+)\h+
(?P<group3>\d+(?:'')?)
~x";
if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
print_r($matches);
}
?>
This yields:
Array
(
[0] => Array
(
[0] => 20' Container 1
[group1] => 20'
[1] => 20'
[group2] => Container
[2] => Container
[group3] => 1
[3] => 1
)
[1] => Array
(
[0] => 40' Open Container 1
[group1] => 40'
[1] => 40'
[group2] => Open Container
[2] => Open Container
[group3] => 1
[3] => 1
)
[2] => Array
(
[0] => 40-45' Closed Container 3
[group1] => 40-45'
[1] => 40-45'
[group2] => Closed Container
[2] => Closed Container
[group3] => 3
[3] => 3
)
[3] => Array
(
[0] => container roll 10
[group1] =>
[1] =>
[group2] => container roll
[2] => container roll
[group3] => 10
[3] => 10
)
[4] => Array
(
[0] => container lift 50
[group1] =>
[1] =>
[group2] => container lift
[2] => container lift
[group3] => 50
[3] => 50
)
)
The core regex is:
(?: # non-capturing group
(?P<group1>\d+(?:-\d+)?')\h* # group1 = one or more digits, optionally a hyphen and more digits, then '
)? # make the whole group optional
(?P<group2>(?i:[a-z]+\h?)+)\h+ # group2 = words of letters (any case) separated by single horizontal whitespace, no digits
(?P<group3>\d+(?:'')?) # group3 = one or more digits, optionally followed by ''
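The commented version above can be run directly. A sketch, assuming PHP 7.4+ (for the arrow function): the x modifier ignores unescaped whitespace and treats # up to the end of the line as a comment, and a nowdoc keeps the backslashes and apostrophes literal. The loop then keeps the three named groups and drops an empty group1, which gives the [length, type, number] rows asked for in the question.

```php
<?php
// The commented pattern, stored in a nowdoc so nothing needs escaping.
// With the x modifier, whitespace and "# ..." comments are ignored by PCRE.
$regex = <<<'REGEX'
~
(?:                                   # non-capturing group
    (?P<group1>\d+(?:-\d+)?')\h*      # group1 = digits, optionally "-digits", then '
)?                                    # make the whole group optional
(?P<group2>(?i:[a-z]+\h?)+)\h+        # group2 = letters and single spaces, no digits
(?P<group3>\d+(?:'')?)                # group3 = digits, optionally followed by ''
~x
REGEX;

$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

$rows = [];
if (preg_match_all($regex, $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // Take the named groups in order and leave out an empty length.
        $row = [$match['group1'], $match['group2'], $match['group3']];
        $rows[] = array_values(array_filter($row, fn ($part) => $part !== ''));
    }
}
print_r($rows);
```

For the sample string this should produce five rows, the last two containing only the type and the number.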
You can use * to make the leading digits and the ' optional.
$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
var_dump($matches);
This gives the following output:
array(4) {
[0]=>
array(5) {
[0]=>
string(15) "20' Container 1"
[1]=>
string(20) "40' Open Container 1"
[2]=>
string(22) "45' Closed Container 3"
[3]=>
string(18) " container roll 10"
[4]=>
string(18) " container lift 50"
}
[1]=>
array(5) {
[0]=>
string(3) "20'"
[1]=>
string(3) "40'"
[2]=>
string(3) "45'"
[3]=>
string(0) ""
[4]=>
string(0) ""
}
[2]=>
array(5) {
[0]=>
string(10) "Container "
[1]=>
string(15) "Open Container "
[2]=>
string(17) "Closed Container "
[3]=>
string(15) "container roll "
[4]=>
string(15) "container lift "
}
[3]=>
array(5) {
[0]=>
string(1) "1"
[1]=>
string(1) "1"
[2]=>
string(1) "3"
[3]=>
string(2) "10"
[4]=>
string(2) "50"
}
}
To get an array closer to what you want, you can use array_column() to regroup the matches the way you like.
$str = '20\' Container 1, 40\' Open Container 1, 40-45\' Closed Container 3, container roll 10, container lift 50';
preg_match_all('/(\d*\'*)\s([a-zA-Z ]+)(\d+)/', $str, $matches);
unset($matches[0]); // remove full match as it's not needed.
$res =[];
foreach($matches[1] as $key => $val){
$res[] = array_column($matches, $key);
}
var_dump($res);
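To see what the array_column() loop does, here is a sketch on a hand-written two-entry subset of the output above (after unset($matches[0])). Each $matches[n] holds one capture group across all matches, so array_column($matches, $key) collects the $key-th element of every group, i.e. the parts of one match. The array_map('trim', ...) call is an addition for this sketch, not part of the answer, to remove the trailing space this pattern leaves in the type.

```php
<?php
// Hand-written subset of the PREG_PATTERN_ORDER result shown above,
// after unset($matches[0]): one array per capture group.
$matches = [
    1 => ["20'", ''],                           // length
    2 => ['Container ', 'container roll '],     // type (note the trailing space)
    3 => ['1', '10'],                           // number
];

$res = [];
foreach ($matches[1] as $key => $val) {
    // array_column($matches, $key) picks element $key from each group,
    // e.g. $key = 0 gives ["20'", 'Container ', '1'].
    // trim() is an extra step here to drop the trailing space on the type.
    $res[] = array_map('trim', array_column($matches, $key));
}
print_r($res);
```

Note that an entry without a length still keeps an empty string in position 0.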
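Assuming only length can be missing, you can try this pattern, which I modified from your existing one, together with the array_filter() function to remove empty elements from each $matches: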
$pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
foreach (explode(', ', $equipment->chassis_types) as $value) {
preg_match($pattern, $value, $matches);
$result[] = array_slice(array_filter($matches), 1);
}
$equipment->tokenized = $result;
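Changes to your pattern:
- Added ? after the first capturing group, so it can be skipped if it is not present
- Added \s? after it, so the space is also skipped when the first group is not present
- Changed (.*) to (\D+) to match any non-digit characters (assuming type does not contain digits)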
Note: I moved the $equipment->tokenized = $result; line outside the loop, so it is set only once instead of on every iteration.
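A sketch of the same approach wrapped in a function (the name tokenizeChassisTypes is hypothetical; PHP 7.4+ assumed for the arrow function). It initialises $result before the loop and passes a callback to array_filter(): without a callback, array_filter() removes every falsy value, which includes the string "0", so an entry such as container roll 0 would lose its number. Skipping entries that do not match at all is a choice made for this sketch.

```php
<?php
// Hypothetical helper: splits "length type number" entries, length optional.
function tokenizeChassisTypes(string $chassisTypes): array
{
    $pattern = '/([\d-]*\')?\s?(\D+)\s(\d+)/';
    $result = [];                              // initialised before the loop
    foreach (explode(', ', $chassisTypes) as $value) {
        if (preg_match($pattern, $value, $matches) !== 1) {
            continue;                          // skip entries that do not match
        }
        // Remove only empty strings (a missing length), keeping a "0" quantity.
        $nonEmpty = array_filter($matches, fn ($part) => $part !== '');
        $result[] = array_slice($nonEmpty, 1); // drop the full match, reindex
    }
    return $result;
}

print_r(tokenizeChassisTypes("20' Container 1, container roll 0"));
```

With the callback, the call above returns [["20'", "Container", "1"], ["container roll", "0"]]; with a plain array_filter() the second row would be just ["container roll"].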
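I think I agree most with Erwin's answer, but even though this is not a validation task, I like that Jan's answer does a better job of defining the "length" substring (Erwin's pattern will also accept a lone ' or something like -' as a length). There is no indication that tabs or newlines occur in the input string, so a literal space is appropriate. Wrapping the regex pattern in double quotes means the apostrophes in the pattern do not need to be escaped. For the record, Andreas's pattern is incorrect because it does not match the "length" substring correctly (it captures only 45' from 40-45') and it includes unwanted whitespace in the "type" substring.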
Here is what I would use to parse the provided input:
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";
preg_match_all(
"~(\d+(?:-\d+)?')? (\D+) (\d+)~",
$string,
$matches,
PREG_SET_ORDER
);
print_r($matches); // use var_export() to show that no spaces are captured
Pattern explanation:
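- Match one or more digits, then optionally a hyphen followed by one or more digits, then an apostrophe. The whole captured sequence is optional. (Length)
- Match, but do not capture, a space.
- Capture one or more non-digit characters. (Type)
- Match, but do not capture, a space.
- Capture one or more digits. (Number)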
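If you want the exact shape from the question (no empty length slot), one option is the PREG_UNMATCHED_AS_NULL flag (available since PHP 7.2; the arrow function needs 7.4). With it, an optional group that did not participate is reported as null instead of "", so it can be filtered out without any risk of dropping a legitimate "0". A sketch, reusing the pattern above unchanged:

```php
<?php
$string = "20' Container 1, 40' Open Container 1, 40-45' Closed Container 3, container roll 10, container lift 50";

// PREG_UNMATCHED_AS_NULL reports the skipped length group as null.
preg_match_all(
    "~(\d+(?:-\d+)?')? (\D+) (\d+)~",
    $string,
    $matches,
    PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL
);

$rows = [];
foreach ($matches as $match) {
    $parts = array_slice($match, 1);                                      // drop the full match
    $rows[] = array_values(array_filter($parts, fn ($p) => $p !== null)); // drop a missing length
}
var_export($rows);
```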