How to extract all the characters, including linefeeds (\n), from a given string
I have a string in the following format (note: \n represents a line break):
\n\nThe following table provides the details of intangible assets\nacquired, by major class and weighted average useful life:\n\n \n\n(USS in millions) USEFUL LIFE\nCustomer relationships 15 years 5\nIntellectual property 10 years 120\nTrade names 15 years 51\nFavorable leases 38 years 26\nOther various 2\nTotal intangible assets 4\n\nThe fair value in the opening balance sheet of the 30%\nredeemable noncontrolling interest in Loders was estimated to\nbe 0 million.
I need to extract all the characters between \n\n \n\n and the next \n\n.
Expected output:
(USS in millions) USEFUL LIFE\nCustomer relationships 15 years 5\nIntellectual property 10 years 120\nTrade names 15 years 51\nFavorable leases 38 years 26\nOther various 2\nTotal intangible assets 4
I wrote the following logic:
re.findall(r'(\n\n\s\n\n)(.|\n)*(\n\n)', result)
But the code above does not give me the result I want. Can someone please help?
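Running the pattern from the question on the sample string shows what goes wrong. The sketch below is not from the original thread. With three capturing groups, re.findall returns one tuple per match instead of the text itself, and the repeated group (.|\n)* only remembers the last character it matched. There is also a latent problem: the greedy * runs to the last \n\n in the whole string, which is the right stopping point here only because no later blank line exists. A single capturing group with a lazy .*? and re.DOTALL (so that . also matches \n) avoids all three.

```python
import re

s = ("\n\nThe following table provides the details of intangible assets\n"
     "acquired, by major class and weighted average useful life:\n\n \n\n"
     "(USS in millions) USEFUL LIFE\nCustomer relationships 15 years 5\n"
     "Intellectual property 10 years 120\nTrade names 15 years 51\n"
     "Favorable leases 38 years 26\nOther various 2\nTotal intangible assets 4\n\n"
     "The fair value in the opening balance sheet of the 30%\n"
     "redeemable noncontrolling interest in Loders was estimated to\nbe 0 million.")

# Pattern from the question: three groups, so findall yields a tuple per match,
# and the repeated group (.|\n)* keeps only the last character it matched.
original = re.findall(r'(\n\n\s\n\n)(.|\n)*(\n\n)', s)
print(original)  # [('\n\n \n\n', '4', '\n\n')]

# One capturing group, a lazy .*? that stops at the first following \n\n,
# and re.DOTALL so that . also matches line breaks.
fixed = re.findall(r'\n\n\s\n\n(.*?)\n\n', s, re.DOTALL)
print(fixed[0])
```

Because the lazy .*? stops at the first \n\n after the delimiter, this version keeps working even if more paragraphs separated by blank lines follow the table.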
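You can first match a double line break (each line break optionally preceded by a carriage return \r), and then capture in group 1 the line that follows plus every further line, as long as the line break in front of it is not immediately followed by another line break. In other words, the capture stops at the next blank line.
With re.findall you get a list containing the value of capture group 1 for each match. The result you want is the second item.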
\r?\n\r?\n(.*(?:\r?\n(?!\r?\n).*)*)\r?\n\r?\n
import re
s="\n\nThe following table provides the details of intangible assets\nacquired, by major class and weighted average useful life:\n\n \n\n(USS in millions) USEFUL LIFE\nCustomer relationships 15 years 5\nIntellectual property 10 years 120\nTrade names 15 years 51\nFavorable leases 38 years 26\nOther various 2\nTotal intangible assets 4\n\nThe fair value in the opening balance sheet of the 30%\nredeemable noncontrolling interest in Loders was estimated to\nbe 0 million."
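# Group 1 captures a run of non-blank lines between two blank lines
regex = r"\r?\n\r?\n(.*(?:\r?\n(?!\r?\n).*)*)\r?\n\r?\n"
# With exactly one capturing group, findall returns the group-1 text of each match
print(re.findall(regex, s))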
Output
[
'The following table provides the details of intangible assets\nacquired, by major class and weighted average useful life:',
'(USS in millions) USEFUL LIFE\nCustomer relationships 15 years 5\nIntellectual property 10 years 120\nTrade names 15 years 51\nFavorable leases 38 years 26\nOther various 2\nTotal intangible assets 4'
]
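For reference, here is the answer's pattern with each part annotated, written in re.VERBOSE form. This is a sketch, not part of the original answer: BLOCK is a hypothetical name, and it swaps each .* for [^\r\n]*. The reason for the swap is that . also matches \r, so on text with Windows (\r\n) line endings the one-line pattern leaves a trailing \r on every captured block, while [^\r\n]* stops before it. On \n-only text both versions return the same list.

```python
import re

# The answer's one-line pattern, kept for comparison.
original = re.compile(r"\r?\n\r?\n(.*(?:\r?\n(?!\r?\n).*)*)\r?\n\r?\n")

# Same structure in re.VERBOSE form (BLOCK is a hypothetical name).
# Each .* is replaced by [^\r\n]* so a trailing \r is never captured.
BLOCK = re.compile(r"""
    \r?\n\r?\n            # a blank line: two line breaks, each optionally \r\n
    (                     # group 1: a run of consecutive non-blank lines
      [^\r\n]*            #   the first line, without its line break
      (?:
        \r?\n(?!\r?\n)    #   a line break NOT immediately followed by another
        [^\r\n]*          #   the text of the next line
      )*
    )
    \r?\n\r?\n            # the blank line that ends the run
""", re.VERBOSE)

s = ("\n\nThe following table provides the details of intangible assets\n"
     "acquired, by major class and weighted average useful life:\n\n \n\n"
     "(USS in millions) USEFUL LIFE\nCustomer relationships 15 years 5\n"
     "Intellectual property 10 years 120\nTrade names 15 years 51\n"
     "Favorable leases 38 years 26\nOther various 2\nTotal intangible assets 4\n\n"
     "The fair value in the opening balance sheet of the 30%\n"
     "redeemable noncontrolling interest in Loders was estimated to\nbe 0 million.")

crlf = s.replace("\n", "\r\n")  # the same text with Windows line endings

lf_blocks = BLOCK.findall(s)
crlf_blocks = BLOCK.findall(crlf)

print(lf_blocks == original.findall(s))                             # True
print(lf_blocks[1])                                                 # prints the table block
print(crlf_blocks == [b.replace("\n", "\r\n") for b in lf_blocks])  # True
print(original.findall(crlf)[1].endswith("\r"))                     # True
```

If the input is known to use \n only, the answer's one-line pattern is sufficient; the [^\r\n]* variant only matters for \r\n input.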