Split string before regex

Question

I'm trying to insert a tab (\t) into a string before a regex match, namely before "x days ago", where x is a number between 0 and 999.

My text looks like this:

Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon

Desired output:

Great product, fast shipping! \t 22 days ago anon
Fast shipping. Got an extra free! Thanks! \t 42 days ago anon

I'm still new to this and struggling. I've looked around for answers and found some that come close, but none that are exactly the same.

Here's what I have so far:

import re
text = 'Great product, fast shipping! 22 days ago anon'
new_text = re.sub(r"\d+ days ago", "\t \d+", text)
print(new_text)

Output:

Great product, fast shipping!    \d+ anon

Again, what I need is (note the \t):

Great product, fast shipping!    22 days ago anon

Answer 1

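You are replacing the match with the regex pattern itself; what you need is a \1 backreference. To insert a tab before "n days ago", you can use a lookahead and replace the captured number with \t\1: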

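import re
# Capture the number only when it is followed by "days ago" (lookahead, not consumed)
p = re.compile(r'(\d+)(?=\s+days\s+ago)')
test_str = u"Great product, fast shipping! 22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 days ago anon"
# Raw string: re.sub turns \t into a tab and \1 into the captured number
subst = r"\t\1"
print(re.sub(p, subst, test_str))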

Result of the demo:

Great product, fast shipping!   22 days ago anon
Fast shipping. Got an extra free! Thanks!   42 days ago anon

There is also a sample program.
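A further variation, sketched here as an assumption rather than as part of the answer above: since the question limits x to 0-999, the pattern can match only the empty position in front of a one- to three-digit number. Because nothing is consumed, no capture group or backreference is needed. The names DAYS_AGO_POS and insert_tab are hypothetical.

```python
import re

# Hypothetical name. Zero-width pattern: it matches the empty position just
# before a 1-3 digit number (0-999) followed by " days ago". The \b keeps it
# from matching inside a longer number such as 1234.
DAYS_AGO_POS = re.compile(r"\b(?=\d{1,3} days ago)")


def insert_tab(text):
    """Insert a tab before every '<0-999> days ago' in text (hypothetical helper)."""
    return DAYS_AGO_POS.sub("\t", text)


sample = ("Great product, fast shipping! 22 days ago anon\n"
          "Fast shipping. Got an extra free! Thanks! 42 days ago anon")
print(insert_tab(sample))
```

A number with four or more digits, such as "1234 days ago", is left unchanged.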

Answer 2

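You can use a backreference in the replacement string. Put parentheses around \d+ days ago to make it a capturing group, and use \1 in the replacement to refer to that group's text. Write the replacement as a raw string (r"\t\1") so that Python does not turn \1 into the control character \x01: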

>>> import re
>>> text = 'Great product, fast shipping! 22 days ago anon'
>>> new_text = re.sub(r"(\d+ days ago)", r"\t\1", text)
>>> print(new_text)
Great product, fast shipping!    22 days ago anon
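If you would rather not add a group at all, Python's re.sub also accepts \g<0> in the replacement, which stands for the entire match. A minimal sketch on the same text:

```python
import re

text = 'Great product, fast shipping! 22 days ago anon'
# \g<0> refers to the whole match, so no capturing group is needed;
# the raw string leaves the backslashes for re.sub to interpret.
new_text = re.sub(r"\d+ days ago", r"\t\g<0>", text)
print(new_text)
```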

Answer 3

You can use the following (it inserts a tab before the first match only):

import re
# \d+ (not \d) so a multi-digit number such as "22" is matched from its first digit
tab_index = re.search(r"\d+ days ago", text).start()
text = text[:tab_index] + '\t' + text[tab_index:]
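As written, the slicing approach raises AttributeError when the text contains no "N days ago" (re.search then returns None). A minimal sketch that guards against that case; the function name insert_tab_before_first and its default pattern are illustrative assumptions:

```python
import re


def insert_tab_before_first(text, pattern=r"\d+ days ago"):
    """Return text with a tab inserted before the first match of pattern."""
    match = re.search(pattern, text)
    if match is None:
        # Nothing like "22 days ago" in this text: leave it unchanged
        return text
    i = match.start()
    # Slice around the start of the match and splice the tab in
    return text[:i] + "\t" + text[i:]


print(repr(insert_tab_before_first('Great product, fast shipping! 22 days ago anon')))
print(repr(insert_tab_before_first('no date here')))
```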

Answer 4

You can use a lookahead to make a zero-width insertion, matching the leading literal space with ' ':

>>> import re
>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 days ago anon'''
>>> re.sub(r' (?=\d+)', ' \t', txt)
'Great product, fast shipping! \t22 days ago anon\nFast shipping. Got an extra free! Thanks! \t42 days ago anon'

Note that every occurrence of ' \d+' has become ' \t\d+', which I believe is what you want.

If you want to restrict this to ' \d+ days ago', just add that to the lookahead:

>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 weeks ago anon'''
>>> re.sub(r' (?=\d+ days ago)', ' \t', txt)
'Great product, fast shipping! \t22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 weeks ago anon'

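Since the title asks about splitting, the same space-plus-lookahead pattern can also be passed to re.split. This is a sketch, not part of the answers above: the split consumes the space, so joining the two pieces with '\t' yields a tab without a space in front of it.

```python
import re

line = 'Great product, fast shipping! 22 days ago anon'
# Split on the space that precedes "<number> days ago"; the lookahead is not
# consumed, so the number stays at the start of the second piece.
parts = re.split(r' (?=\d+ days ago)', line, maxsplit=1)
print(parts)
print('\t'.join(parts))
```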
Tags: regex, split, python-2.7