Difference between re.split(" ", string) and re.split("\s+", string)?

Question

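I've recently been learning regular expressions and ran into a problem; the title is what I want to find out. Since \s stands for whitespace, I assumed re.split(" ", string) and re.split("\s+", string) would give the same result, as shown below: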

>>> import re
>>> a = re.split(" ", "Why is this wrong")
>>> a
['Why', 'is', 'this', 'wrong']

>>> import re
>>> a = re.split("\s+", "Why is this wrong")
>>> a
['Why', 'is', 'this', 'wrong']

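Both give the same answer, so I assumed they were the same thing. It turns out, however, that they are different. In what cases do they differ? What am I missing here?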

Answer 1

They only look alike because of your particular example.

Splitting on ' ' (a single space) does exactly that: it splits at every single space. Consecutive spaces therefore produce empty strings in the result.

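Splitting on '\s+' treats a run of consecutive whitespace characters as one separator, and it covers other whitespace characters besides the plain space: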

import re

a = re.split(" ", "Why    is this  \t \t  wrong")
b = re.split(r"\s+", "Why    is this  \t \t  wrong")  # raw string for the pattern

print(a)
print(b)

Output:

# re.split(" ",data)
['Why', '', '', '', 'is', 'this', '', '\t', '\t', '', 'wrong']

# re.split("\s+",data)
['Why', 'is', 'this', 'wrong']

Documentation:

\s
Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v]. (https://docs.python.org/3/howto/regex.html#matching-characters)
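
One case the example above does not cover is leading or trailing whitespace, where '\s+' still leaves empty strings at the ends. A minimal sketch, assuming a made-up sample string (the name text and its contents are illustrative, not from the question):

```python
import re

# Hypothetical sample: leading and trailing spaces plus a newline in the middle.
text = "  Why is\nthis wrong "

by_space = re.split(" ", text)                    # splits at every single space only
by_ws = re.split(r"\s+", text)                    # splits at runs of any whitespace
by_ws_stripped = re.split(r"\s+", text.strip())   # strip first to avoid empty ends

print(by_space)        # ['', '', 'Why', 'is\nthis', 'wrong', '']
print(by_ws)           # ['', 'Why', 'is', 'this', 'wrong', '']
print(by_ws_stripped)  # ['Why', 'is', 'this', 'wrong']
```

Note that the newline is a separator only for '\s+'; with " " it stays inside the element 'is\nthis'.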

Answer 2

As far as the code you posted goes, there isn't much difference between the two (in terms of the result); both output this:

['Why', 'is', 'this', 'wrong']

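The difference is only in the way the string is split. Note that both calls use the split() function from the re module, not the built-in .split() method of the str object; what differs is the pattern passed as the first argument.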

re.split(" ", "Why is this wrong") simply splits the string on the character " ", which is your first argument.

re.split("\s+", "Why is this wrong") splits your string according to the regular expression \s+.

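Note that " " is not the same as \s+. \s+ is a pattern that carries a meaning, while " " is essentially just a literal str. You can find more information about regular expressions here.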

\s+ -> Returns a match where the string contains a white space character

I should also add: if you want to split a string on something more than a fixed substring, that is, on a pattern, then regex is the right tool.
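
As a rough illustration of splitting on a pattern instead of a fixed string, here is a small sketch. The sample record and the choice of separators (a comma or semicolon followed by optional whitespace) are assumptions made up for this example:

```python
import re

# Hypothetical input: fields separated by "," or ";",
# each optionally followed by whitespace.
record = "alpha, beta;gamma,   delta"

# [,;] matches one separator character; \s* swallows any whitespace after it.
fields = re.split(r"[,;]\s*", record)

print(fields)  # ['alpha', 'beta', 'gamma', 'delta']
```

A single fixed separator string could not express "comma or semicolon, then any amount of whitespace" in one call.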

Answer 3

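It is about whitespace characters. '\s' matches any whitespace character (space, \t, \n, \r, \f, \v). The '+' means one or more of them in a row, so a run such as "\n \r \t \v" is treated as a single separator. In my opinion, if plain string operations are enough for the split, you should use the standard method my_string.split(). Otherwise, use a regular expression. The regex engine has a cost, and developers should be aware of it.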
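
To get a feel for that cost, the two approaches can be timed side by side. This is only a sketch: the sample string is reused from Answer 1, the repetition count is arbitrary, and the timings depend on the machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

text = "Why    is this  \t \t  wrong"

# With no arguments, str.split() splits on runs of whitespace
# and drops empty strings at the ends.
plain = text.split()
regex = re.split(r"\s+", text)
print(plain == regex)  # True for this input (no leading/trailing whitespace)

# Time 10,000 calls of each; results vary between runs and machines.
t_plain = timeit.timeit(lambda: text.split(), number=10_000)
t_regex = timeit.timeit(lambda: re.split(r"\s+", text), number=10_000)
print(f"str.split: {t_plain:.4f}s  re.split: {t_regex:.4f}s")
```

For simple whitespace splitting the two give the same list here, so the plain method is the simpler choice when no pattern is needed.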
