Split string before regex

I'm trying to insert a tab (\t) into a string just before a regex match, namely before "x days ago", where x is a number from 0 to 999.

My text looks like this:

Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon

Desired output:

Great product, fast shipping! \t 22 days ago anon
Fast shipping. Got an extra free! Thanks! \t 42 days ago anon

I'm new to this and struggling. I've searched for answers and found some that come close, but none that are exactly the same.

Here is what I have so far:

text = 'Great product, fast shipping! 22 days ago anon'
new_text = re.sub(r"\d+ days ago", "\t \d+", text)
print new_text

Output:

Great product, fast shipping!    \d+ anon

Again, what I need is (note the \t):

Great product, fast shipping!    22 days ago anon

You are replacing with a regex pattern, when all you need is a \1 backreference.

To insert a tab before "n days ago", you can use a lookahead and replace the captured number with \t\1:

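import re
# Capture the number only when it is followed by " days ago" (lookahead)
p = re.compile(r'(\d+)(?=\s+days\s+ago)')
test_str = "Great product, fast shipping! 22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 days ago anon"
# Raw string: the re module expands \t to a tab and \1 to the captured number
subst = r"\t\1"
print(re.sub(p, subst, test_str))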

Result of the demo:

Great product, fast shipping!   22 days ago anon
Fast shipping. Got an extra free! Thanks!   42 days ago anon

There is also a sample program.

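You can use a backreference in the replacement string. Put parentheses around \d+ days ago to make it a capturing group, and use \1 in the replacement to refer to that group's text: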

>>> import re
>>> text = 'Great product, fast shipping! 22 days ago anon'
>>> # raw replacement string, so \1 is a backreference and not chr(1)
>>> new_text = re.sub(r"(\d+ days ago)", r"\t\1", text)
>>> print(new_text)
Great product, fast shipping!    22 days ago anon
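One pitfall with the backreference approach is how the replacement string is quoted. The sketch below (variable names are illustrative only) compares a non-raw replacement, a raw one, and the \g<0> form that refers to the whole match without needing a capturing group:

```python
import re

text = 'Great product, fast shipping! 22 days ago anon'

# Non-raw literal: Python turns "\1" into the control character chr(1)
# before re.sub ever sees it, so no backreference takes place.
broken = re.sub(r"(\d+ days ago)", "\t\1", text)

# Raw literal: re.sub itself expands \t to a tab and \1 to group 1.
fixed = re.sub(r"(\d+ days ago)", r"\t\1", text)

# \g<0> stands for the entire match, so the parentheses can be dropped.
whole = re.sub(r"\d+ days ago", r"\t\g<0>", text)

print(repr(broken))
print(repr(fixed))
print(repr(whole))
```

Writing the replacement as "\t\\1" would behave like the raw form; the raw string is simply easier to read.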

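You can use re.search to find where the match starts, then slice the string at that index:

# \d+ (not \d) so that a multi-digit number is matched from its first digit
tab_index = re.search(r"\d+ days ago", text).start()
text = text[:tab_index] + '\t' + text[tab_index:]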
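The slicing approach raises AttributeError when the text contains no match, because re.search then returns None. A minimal sketch of a guarded version follows; the names PATTERN and insert_tab are hypothetical and only the first match is handled:

```python
import re

# Hypothetical helper names; the pattern is the one from the question.
PATTERN = re.compile(r"\d+ days ago")

def insert_tab(text):
    """Insert a tab before the first 'N days ago', if there is one."""
    match = PATTERN.search(text)
    if match is None:
        # No match: return the text unchanged instead of failing.
        return text
    i = match.start()
    return text[:i] + "\t" + text[i:]

print(repr(insert_tab('Great product, fast shipping! 22 days ago anon')))
print(repr(insert_tab('No date on this line')))
```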

You can use a lookahead for a zero-width insertion, matching the leading literal space with ' ':

>>> import re
>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 days ago anon'''
>>> re.sub(r' (?=\d+)', ' \t', txt)
'Great product, fast shipping! \t22 days ago anon\nFast shipping. Got an extra free! Thanks! \t42 days ago anon'

Note that every match of ' \d+' becomes ' \t\d+', which is what I think you want.

If you want to restrict this to ' \d+ days ago', just add that to the lookahead:

>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 weeks ago anon'''
>>> re.sub(r' (?=\d+ days ago)', ' \t', txt)
'Great product, fast shipping! \t22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 weeks ago anon'
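The question limits x to 0-999, which the patterns above do not enforce. As a rough sketch, assuming a stricter match is wanted, the lookahead can use \d{1,3} with word boundaries and match nothing else, so the substitution is a pure zero-width insertion. The names AGE and tab_before_age are hypothetical:

```python
import re

# Zero-width pattern: matches the empty position just before
# a standalone 1 to 3 digit number followed by " days ago".
AGE = re.compile(r"(?=\b\d{1,3} days ago\b)")

def tab_before_age(text):
    """Insert a tab at every position where AGE matches."""
    return AGE.sub("\t", text)

line = 'Great product, fast shipping! 22 days ago anon'
tabbed = tab_before_age(line)
# Once the tab is in place, the line splits into two fields.
review, age = tabbed.split("\t", 1)

print(repr(tabbed))
print(repr(review), repr(age))
```

Because the pattern consumes no characters, the existing space before the number is left untouched, and a four-digit number such as 1234 is not matched at all.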