Split string before regex
I am trying to insert a tab (\t) before a regex match in a string, specifically before "x days ago", where x is a number between 0 and 999.
My text looks like this:
Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon
Desired output:
Great product, fast shipping! \t 22 days ago anon
Fast shipping. Got an extra free! Thanks! \t 42 days ago anon
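I am still new to this and struggling. I have looked around for answers and found some that are close, but none that are exactly the same.
This is what I have so far: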
text = 'Great product, fast shipping! 22 days ago anon'
new_text = re.sub(r"\d+ days ago", "\t \d+", text)
print new_text
Output:
Great product, fast shipping! \d+ anon
Again, what I need is (note the \t):
Great product, fast shipping! \t 22 days ago anon
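You are using a regex pattern in the replacement string; all you need there is a backreference.
To insert a tab before "n days ago", you can use a lookahead and replace the captured number with \t followed by a backreference to that number: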
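import re
# Capture the number only when " days ago" follows it (the lookahead consumes nothing).
p = re.compile(r'(\d+)(?=\s+days\s+ago)')
test_str = "Great product, fast shipping! 22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 days ago anon"
# Raw string: re.sub expands \t to a tab and \1 to the captured number.
subst = r"\t\1"
print(re.sub(p, subst, test_str))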
Result from the demo (a tab character precedes each number):
Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon
A sample program is also available.
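A common pitfall with this approach is the quoting of the replacement string. The sketch below is only an illustration (the variable names `wrong` and `right` are not from the answer): in a regular string literal, "\1" is the control character chr(1), whereas in a raw string the backslash sequences reach re.sub intact and are expanded there.

```python
import re

text = "Great product, fast shipping! 22 days ago anon"
pattern = re.compile(r"(\d+)(?=\s+days\s+ago)")

# Regular string: Python turns "\1" into chr(1) before re.sub ever sees it,
# so the number is replaced by a tab plus a control character.
wrong = pattern.sub("\t\1", text)

# Raw string: re.sub receives the two-character sequences \t and \1 and
# expands them to a tab and the text of group 1.
right = pattern.sub(r"\t\1", text)

print(repr(wrong))
print(repr(right))
```

Writing the replacement as "\t\\1" in a regular string would behave the same as the raw-string version.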
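You can use a backreference in the replacement string. Put parentheses around \d+ days ago to make it a capturing group, and use \1 in the replacement (written "\\1" in a regular string literal) to refer to that group's text: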
>>> import re
>>> text = 'Great product, fast shipping! 22 days ago anon'
>>> new_text = re.sub(r"(\d+ days ago)", "\t\\1", text)
>>> print(new_text)
Great product, fast shipping! 22 days ago anon
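As a possible variant of the same idea, the capturing group can be dropped altogether, since \g<0> in a replacement string refers to the entire match. This is a minimal sketch on the two sample lines from the question, not part of the original answer:

```python
import re

text = ("Great product, fast shipping! 22 days ago anon\n"
        "Fast shipping. Got an extra free! Thanks! 42 days ago anon")

# \g<0> stands for the whole match, so no parentheses are needed in the pattern.
new_text = re.sub(r"\d+ days ago", r"\t\g<0>", text)

# repr() makes the inserted tab characters visible as \t.
print(repr(new_text))
```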
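You can also find the match position with re.search and splice the tab in by slicing (this handles only the first match and assumes there is one):
import re
tab_index = re.search(r"\d+ days ago", text).start()  # index where the number starts
text = text[:tab_index] + '\t' + text[tab_index:]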
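The slicing idea can be extended to every match, and to text with no match at all, by iterating with re.finditer. The helper name `insert_tabs` and its default pattern are assumptions made for this sketch, not something from the answer:

```python
import re

def insert_tabs(text, pattern=r"\d+ days ago"):
    """Return text with a tab inserted before every match of pattern.

    Hypothetical helper: text without any match is returned unchanged.
    """
    pieces = []
    last = 0
    for match in re.finditer(pattern, text):
        # Copy everything up to the match, then add the tab.
        pieces.append(text[last:match.start()])
        pieces.append("\t")
        last = match.start()
    # Copy the remainder, starting at the last match (or the whole text).
    pieces.append(text[last:])
    return "".join(pieces)

print(repr(insert_tabs("Great product, fast shipping! 22 days ago anon")))
```

In practice re.sub does the same job in one call; the loop is mainly useful when the match positions are needed for something else as well.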
You can use a lookahead for a zero-width insertion, and use ' ' to match the leading literal space:
>>> import re
>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 days ago anon'''
>>> re.sub(r' (?=\d+)', ' \t', txt)
'Great product, fast shipping! \t22 days ago anon\nFast shipping. Got an extra free! Thanks! \t42 days ago anon'
Note that every occurrence of ' \d+' becomes ' \t\d+', which is what I think you want.
If you want to restrict it to ' \d+ days ago', just add that to the lookahead:
>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 weeks ago anon'''
>>> re.sub(r' (?=\d+ days ago)', ' \t', txt)
'Great product, fast shipping! \t22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 weeks ago anon'
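If the eventual goal is to split each line into fields rather than only mark the boundary, the same lookahead can be handed to re.split. The sketch below is one way to do it under stated assumptions: the helper name `split_review` is made up for illustration, and \d{1,3} is used to reflect the 0-999 range mentioned in the question.

```python
import re

# Split on the whitespace that precedes "<1-3 digits> days ago".
# The lookahead is zero-width, so the date text stays in the second field.
SPLITTER = re.compile(r"\s+(?=\b\d{1,3} days ago)")

def split_review(line):
    """Split a line into [review, rest]; hypothetical helper for illustration."""
    return SPLITTER.split(line, maxsplit=1)

lines = [
    "Great product, fast shipping! 22 days ago anon",
    "Fast shipping. Got an extra free! Thanks! 42 days ago anon",
]
for line in lines:
    # Joining the fields with a tab gives a tab-separated line.
    print(repr("\t".join(split_review(line))))
```

A line without a matching "days ago" phrase comes back as a single-element list, so callers may want to check the length before unpacking.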