Python regex for finding everything in between two \n\n and \n\n
I have a large text string with several blocks that look like this:
text = ('\n\n(d)In the event of this happens a Fee\n'
        'of \xc2\xa32,000 gross, on each such occasion.\n\n')
Using the code below I can find all the instances of money:
import re
re.findall('\xa3(.*)', text)
But this only returns up to the comma:
In the event of this happens a Fee of \xc2\xa32,000 gross
and not the whole block. I want to return the block in which the pound sign Unicode character \xa3 is mentioned.
Try this:
import re
text = '\n\nblock1\xa3block1.\n\nblock2\x80block2\n\nblock3\xa3block3\n\n'
result = re.findall('.*\xa3.*', text)  # keep only the blocks containing the pound symbol; block2, which contains the euro symbol, is discarded
print(result)
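One thing worth checking with this pattern: by default `.` does not match a newline, so `.*\xa3.*` captures the single line that contains the pound sign rather than a whole multi-line block. A minimal sketch of that behaviour, using a made-up two-line block (the sample text and the name `lines_with_pound` are only for illustration):

```python
import re

# Hypothetical sample: the first block spans two lines,
# and the pound sign sits on its second line.
text = '\n\nline one of block1\nline two \xa3 of block1\n\nblock2 without pound\n\n'

# '.' stops at newlines, so only the line containing \xa3 is captured,
# not the first line of the same block.
lines_with_pound = re.findall('.*\xa3.*', text)
print(lines_with_pound)
```

If every block is a single line, as in the example above, this is enough; for blocks that wrap over several lines something block-aware is needed.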
I recommend this regex:
import re
text = ('\n\nthis is not wanted\n\n'
'(d)In the event of this happens a Fee\n'
'of \xc2\xa32,000 gross, on each such occasion.\n\n'
'another wanted line with pound: \xc2\xa31,000\n\n'
'this is also not wanted\n\n')
re.findall(r'(?:.+\n)*.*\xa3(?:.+\n)*', text)
This will find all multi-line blocks of non-empty lines that contain at least one \xa3.
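As @wiktor-stribiżew pointed out in the comments, this will only find blocks that have another character after the pound sign; that seems to be what you want, so it's fine, but it should be mentioned.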
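For comparison, here is a sketch of a different approach that is not taken from the answers above: instead of matching blocks with one regex, split the text on runs of blank lines and keep the blocks that contain the marker. The helper name `blocks_with_pound` and its `marker` parameter are made up for this example, and it assumes blocks are always separated by two or more consecutive newlines.

```python
import re

text = ('\n\nthis is not wanted\n\n'
        '(d)In the event of this happens a Fee\n'
        'of \xc2\xa32,000 gross, on each such occasion.\n\n'
        'another wanted line with pound: \xc2\xa31,000\n\n'
        'this is also not wanted\n\n')

def blocks_with_pound(s, marker='\xa3'):
    """Return the blocks of s (separated by blank lines) that contain marker."""
    # Split on two or more consecutive newlines; leading and trailing
    # separators produce empty strings, which the filter below drops
    # because they cannot contain the marker.
    blocks = re.split(r'\n{2,}', s)
    return [block for block in blocks if marker in block]

wanted = blocks_with_pound(text)
for block in wanted:
    print(repr(block))
```

Unlike the regex above, the returned blocks do not include the trailing newline, and a block is kept even when the pound sign is its very last character.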