PHP preg_split() - don't split spaces between ' '
I have this string:
$string = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";
I split this string on whitespace and on some operators (=, <, >, !=, >=, <=, <>) using this code:
$split = preg_split('/\s+|(,|[<>!]?=|<>?|>)/', $string, null, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
The result of the split is this array:
Array
(
[0] => My
[1] => name
[2] => is
[3] => Emma
[4] => and
[5] => i
[6] => have
[7] => a
[8] => dillemma
[9] => ,
[10] => what's
[11] => the
[12] => distance
[13] => between
[14] => 'New
[15] => York'
[16] => and
[17] => 'Athene'
[18] => ?
)
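The only problem I have now is that I want the whitespace between '' not to be split, and the '' themselves to be removed after splitting. In the example above you can see that 'New York' is split into: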
[14] => 'New
[15] => York'
The result I want is:
[14] => New York
The same goes for 'Athene', which I want to be:
[16] => Athene
So basically the array above should look like this:
Array
(
[0] => My
[1] => name
[2] => is
[3] => Emma
[4] => and
[5] => i
[6] => have
[7] => a
[8] => dillemma
[9] => ,
[10] => what's
[11] => the
[12] => distance
[13] => between
[14] => New York
[15] => and
[16] => Athene
[17] => ?
)
And yes, the distance between these two cities is 4,925 miles or 7925 km :D
Thanks! :D
Regex
(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<>=!]+|(?:,|[<>!]?=|<>?|>)
You can see the matches here: https://regex101.com/r/LkHnHt/3
PHP code
$text = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";
preg_match_all('/(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<>=!]+|(?:,|[<>!]?=|<>?|>)/', $text, $matches);
// Overwrite each single-quoted match with its unquoted content (capture group 1)
foreach (array_filter($matches[1]) as $k => $v) {
    $matches[0][$k] = $v;
}
print_r($matches[0]);
Result
Array
(
[0] => My
[1] => name
[2] => is
[3] => Emma
[4] => and
[5] => i
[6] => have
[7] => a
[8] => dillemma
[9] => ,
[10] => what's
[11] => the
[12] => distance
[13] => between
[14] => New York
[15] => and
[16] => Athene
[17] => ?
)
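For an input containing comparison operators (such as the first test string in the answer below), the same code gives:
Array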
(
[0] => age
[1] => <
[2] => 21
[3] => ,
[4] => length
[5] => >
[6] => 10
[7] => ,
[8] => height
[9] => <>
[10] => 10
[11] => ,
[12] => width
[13] => !=
[14] => 100
[15] => ,
[16] => name
[17] => =
[18] => Emma Einarsson
[19] => or
[20] => it
[21] => can
[22] => be
[23] => words
[24] => time
[25] => >=
[26] => 10
[27] => ,
[28] => clouds
[29] => <=
[30] => 4
)
Note that all of the captured data is stored in the array $matches[0].
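The code above only unwraps single-quoted matches (capture group 1). As a minimal sketch, assuming double-quoted substrings should be unwrapped the same way, the pattern can be wrapped in a helper. The function name tokenize() is hypothetical and not part of the answer; the pattern itself is copied unchanged.

```php
<?php
// Hypothetical helper (not from the original answer): tokenizes $text with the
// answer's pattern and strips the quotes from quoted substrings.
function tokenize(string $text): array
{
    $pattern = '/(?:\'([^\']*[\'s]?)\'|\"([^\"]*)\")|[^\s,<>=!]+|(?:,|[<>!]?=|<>?|>)/';
    // PREG_SET_ORDER yields one array per match: [full match, group 1, group 2]
    preg_match_all($pattern, $text, $sets, PREG_SET_ORDER);

    $tokens = [];
    foreach ($sets as $set) {
        if (isset($set[2]) && $set[2] !== '') {
            $tokens[] = $set[2];        // content of a double-quoted substring
        } elseif (isset($set[1]) && $set[1] !== '') {
            $tokens[] = $set[1];        // content of a single-quoted substring
        } else {
            $tokens[] = $set[0];        // plain word, number, or operator
        }
    }
    return $tokens;
}

print_r(tokenize("distance between 'New York' and \"Los Angeles\" ?"));
```

One limitation of this sketch: an empty quoted string such as '' is returned with its quotes, because an empty capture group cannot be told apart from an unmatched one here.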
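If I understand the requirements (after reading the question and the many comments), the only tricky part is preserving the single-quoted substrings.
You want to isolate:
- Single-quoted substrings, which may contain spaces.
- Words, which may contain apostrophes (single quotes).
- Numbers.
- Five specific operators/symbols: <, >, !, =, ?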
Pattern: ~\B'\K(?:[^']+)|\b[a-z']+\b|\d+|[<>!=?]+~i
Code with a battery of tests: (Demo)
$strings = [
    "age<21,length>10,height<>10,width!=100,name='Emma Einarsson' or it can be words time>=10,clouds<=4",
    "age < 21, length > 10, height <> 10, width != 100, name = 'Emma Einarsson' or it can be words time >= 10, clouds <= 4",
    "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?",
    "'New York' and London at the start and end with Paris and 'Los Angeles'"
];
foreach ($strings as $string) {
    var_export(preg_match_all("~\B'\K(?:[^']+)|\b[a-z']+\b|\d+|[<>!=?]+~i", $string, $out) ? $out[0] : 'fail');
    echo "\n";
}
Pattern breakdown:
~ #start of pattern delimiter
\B'\K(?:[^']+) #match a single-quote not preceded by [a-zA-Z0-9_], then restart the fullstring match using (\K), then match one or more non-single quote characters
| #OR
\b[a-z']+\b #match one or more letters and apostrophes
| #OR
\d+ #match one or more digits
| #OR
[<>!=?]+ #match one or more of your listed operators/symbols
~ #end of pattern delimiter
i #pattern modifier - make whole pattern case-insensitive
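To make the role of \K in the first branch concrete, here is a small sketch comparing that branch with and without \K. The sample subject string and variable names are chosen only for illustration.

```php
<?php
$subject = "between 'New York' and";

// With \K: the opening quote must be present, but it is dropped from the reported match.
preg_match("~\B'\K[^']+~", $subject, $withK);

// Without \K: the opening quote stays in the reported match.
preg_match("~\B'[^']+~", $subject, $withoutK);

var_export([$withK[0], $withoutK[0]]);
```

Here $withK[0] is New York, while $withoutK[0] still carries the leading quote ('New York). Because [^']+ stops before the closing quote, neither match contains it.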
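Based on your sample input strings, you could technically remove the two \b (word boundary markers) from my pattern to make it slightly more efficient, but I have left them in place for maximum accuracy.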
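The pattern as written does not emit commas, whereas the array the question asks for keeps , as its own element. If commas are needed, one possible tweak is to add a final alternative for them. This is a sketch under that assumption; the extra |, branch is not part of the pattern above.

```php
<?php
// Assumption: the trailing "|," alternative is added here for illustration only.
$pattern = "~\B'\K[^']+|\b[a-z']+\b|\d+|[<>!=?]+|,~i";
$string  = "My name is Emma and i have a dillemma, what's the distance between 'New York' and 'Athene' ?";

preg_match_all($pattern, $string, $out);
$tokens = $out[0];   // every token is a full-string match, so no post-processing is needed

print_r($tokens);
```

With this change the sample sentence yields 18 tokens, including the comma after dillemma, with New York and Athene unquoted.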