Find 2 or more newlines

Question

My string looks like this:

'I saw a little hermit crab\r\nHis coloring was oh so drab\r\n\r\nIt\u2019s hard to see the butterfly\r\nBecause he flies across the sky\r\n\r\nHear the honking of the goose\r\nI think he\u2019s angry at the moose\r\n\r\'

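I need to split it wherever there are two or more newlines.

I'm using the re module, of course.

On this particular string, re.split(r'\r\n\r\n+', text) works, but it wouldn't catch \r\n\r\n\r\n, right?

I tried re.split(r'(\r\n){2,}', text), which splits at every line, and re.split(r'\r\n{2,}', text), which produces a list of len() 1.

For strings that never contain more than 2 consecutive \r\n, shouldn't re.split(r'(\r\n){2,}', text) == re.split(r'\r\n\r\n', text) be True?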

Answer 1

You want to use a non-capturing group instead of a capturing group when you call re.split(). The documentation states explicitly that a capturing group keeps the delimiter text in the result:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

re.split(r'(?:\r\n){2,}', text)
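As a minimal sketch of how this behaves on longer runs, the snippet below uses a made-up sample string (not the poem from the question) containing a single \r\n, a double, and a triple. The variable names are illustrative only.

```python
import re

# Hypothetical sample: one single \r\n, one double, one triple.
sample = 'a\r\nb\r\n\r\nc\r\n\r\n\r\nd'

# (?:...) groups \r\n so that {2,} applies to the whole pair,
# without adding the delimiter to the result list.
parts = re.split(r'(?:\r\n){2,}', sample)

print(parts)  # ['a\r\nb', 'c', 'd']
```

The single \r\n between a and b is left in place, while both the double and the triple run act as one separator each.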

Answer 2

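re.split(r'(\r\n){2,}', text) does not split at every line. It does exactly what you want, except that it keeps one occurrence of \r\n, because you put it inside a capturing group. Use a non-capturing group instead: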

(?:\r\n){2,}

Here you can see the difference:

>>> import re
>>> re.split(r'(?:\r\n){2,}', 'foo\r\n\r\nbar')
['foo', 'bar']
>>> re.split(r'(\r\n){2,}', 'foo\r\n\r\nbar')
['foo', '\r\n', 'bar']
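For the other two patterns from the question, the likely explanation is that a quantifier binds only to the token immediately before it, so without a group {2,} and + apply to \n alone. A rough sketch on short made-up strings (the variable names are only for illustration):

```python
import re

# \r\n{2,} means "\r followed by two or more \n", which never occurs
# in text that uses \r\n line endings, so nothing is split.
no_split = re.split(r'\r\n{2,}', 'foo\r\n\r\nbar')

# \r\n\r\n+ means "\r\n\r followed by one or more \n". On a triple
# \r\n it consumes only the first two, leaving a stray \r\n behind.
leftover = re.split(r'\r\n\r\n+', 'foo\r\n\r\n\r\nbar')

# Grouping makes the quantifier cover the whole \r\n pair.
grouped = re.split(r'(?:\r\n){2,}', 'foo\r\n\r\n\r\nbar')

print(no_split)  # ['foo\r\n\r\nbar']
print(leftover)  # ['foo', '\r\nbar']
print(grouped)   # ['foo', 'bar']
```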

regex

python-2.7