How can I distinguish a single newline from two newlines in a regular expression?
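I want to group the output of a regular expression by:
- a single newline '\n'
- two newlines '\n\n'
How can I split these into two groups so that I can apply a different regex split to each?
I have managed to find a single newline or two newlines separately.
For example: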
Facebook and Google exploited a feature__(\n)__
intended for “enterprise developers” to__(\n)__
distribute apps that collect large amounts__(\n)__
of data on private users, TechCrunch first reported.__(\n\n)__
Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.__(\n)__
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?__(\n\n)__
Some text so on...
I tried this code:
import re

def find_newlines(file):
    with open(file, "r") as content:
        text = content.read()
    content = re.split("\n+", text)
    return content
The result is:
['Apple' , 'Something', 'Enything']
I want the following output:
['Facebook and Google exploited a feature intended for “enterprise developers” to distribute apps that collect large amounts of data on private users, TechCrunch first reported.' __,__ 'Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power. Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?']
I want one group for single newlines and a second group for double newlines.
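You seem to be trying to split your text into two (or more) blocks separated by double newlines. One approach is to first split the text on \n\n. The resulting blocks still contain single newlines, so any remaining newline in each block can then be replaced with a space. All of this can be done with a Python list comprehension, as follows: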
text = """Facebook and Google exploited a feature
intended for “enterprise developers” to
distribute apps that collect large amounts
of data on private users, TechCrunch first reported.

Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""
content = [block.replace('\n', ' ') for block in text.split('\n\n')]
print(content)
This gives you a list with two entries and no newlines:
['Facebook and Google exploited a feature intended for “enterprise developers” to distribute apps that collect large amounts of data on private users, TechCrunch first reported.', 'Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power. Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?']
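For comparison, here is a small sketch of why the "\n+" pattern from the question does not produce this grouping. The `sample` string is a made-up stand-in for the article text, not the original data.

```python
import re

# Hypothetical short sample: two blocks separated by a blank line
sample = "line one\nline two\n\nline three\nline four"

# "\n+" matches one OR more newlines, so single and double newlines
# are treated alike and every line becomes its own item.
by_any = re.split(r"\n+", sample)
print(by_any)
# ['line one', 'line two', 'line three', 'line four']

# Splitting on exactly "\n\n" leaves the single newlines inside each block.
by_blank = sample.split("\n\n")
print(by_blank)
# ['line one\nline two', 'line three\nline four']
```

Only the second form keeps the lines of a paragraph together, which is why the single newlines can then be replaced with spaces block by block.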
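A regular expression can be used when blocks are separated by two or more consecutive newlines (that is, one or more blank lines), as follows: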
import re
text = """Facebook and Google exploited a feature
intended for “enterprise developers” to
distribute apps that collect large amounts
of data on private users, TechCrunch first reported.

Apple’s maneuver has been characterized by some as a chilling demonstration of the company’s power.
Verge editor-in-chief Nilay Patel suggested in a tweet that it was cause for concern: First, they came for our enterprise certificates, then… well, what, exactly?"""
content = [block.replace('\n', ' ') for block in re.split(r'\n{2,}', text)]
print(content)
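If the text comes from a file, as in the question, the same idea could be wrapped in a function. This is only a sketch: `split_paragraphs` is a hypothetical helper name, the UTF-8 encoding is an assumption, and stripping the text first is one way to avoid an empty final entry when the file ends with blank lines.

```python
import re

def split_paragraphs(text):
    """Split text on blank lines and join each block's lines with spaces."""
    # strip() drops leading/trailing whitespace so no empty block appears at either end
    blocks = re.split(r"\n{2,}", text.strip())
    # Replace the single newlines left inside each block; skip empty blocks
    return [block.replace("\n", " ") for block in blocks if block]

def find_newlines(path):
    # Same function name as in the question; the encoding is an assumption
    with open(path, "r", encoding="utf-8") as handle:
        return split_paragraphs(handle.read())

print(split_paragraphs("a\nb\n\nc\nd\n\n"))
# ['a b', 'c d']
```

Keeping the splitting logic separate from the file handling makes it easy to try on a plain string before pointing it at a real file.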