Substitute text with a running number

This should be very simple and short, but I can't think of a good, concise way to do it.
For example, I have a string:

'How many roads must a man walk down Before you call him a man? How many seas must a white dove sail Before she sleeps in the sand? Yes, and how many times must the cannon balls fly Before they're forever banned?'

I want to substitute each occurrence of the word "how" with a running number, so that I get:

'[1] many roads must a man walk down Before you call him a man? [2] many seas must a white dove sail Before she sleeps in the sand? Yes, and [3] many times must the cannon balls fly Before they're forever banned?'

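You can use re.sub with a replacement function. The function looks up in a dictionary how often the word has occurred so far and returns the corresponding number.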

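import collections
import re

# Per-word counters; they persist across calls to re.sub.
counts = collections.defaultdict(int)
def subst_count(match):
    # Lowercase the match so "How" and "how" share one counter.
    word = match.group().lower()
    counts[word] += 1
    return "[%d]" % counts[word]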

Example:

>>> text = "How many ...? How many ...? Yes, and how many ...?"
>>> re.sub(r"\bhow\b", subst_count, text, flags=re.I)
'[1] many ...? [2] many ...? Yes, and [3] many ...?'

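Note: this keeps a separate count for each distinct word being replaced (in case you use a regex that matches more than one word), but it does not reset the counts between calls to re.sub.
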
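If the counts should start over on every call, one option is to build a fresh replacement function each time. The following is only a sketch of that idea; the factory name make_counter and the sample text are made up for illustration and are not part of the answer above.

```python
import collections
import re

def make_counter():
    """Return a replacement function with its own private counters (hypothetical helper)."""
    counts = collections.defaultdict(int)

    def subst_count(match):
        # Lowercase the match so "How" and "how" share one counter.
        word = match.group().lower()
        counts[word] += 1
        return "[%d]" % counts[word]

    return subst_count

text = "How many ...? Why ...? how many ...? why ...?"
pattern = r"\b(?:how|why)\b"

# Each call gets a new counter, so numbering restarts at 1 both times.
first = re.sub(pattern, make_counter(), text, flags=re.I)
second = re.sub(pattern, make_counter(), text, flags=re.I)
print(first)   # [1] many ...? [1] ...? [2] many ...? [2] ...?
print(second)  # same as first
```

Because the dictionary lives inside the closure, nothing is shared between the two re.sub calls.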
Test = 'How many roads must a man walk down Before you call him a man? How many seas must a white dove sail Before she sleeps in the sand? Yes, and how many times must the cannon balls fly Before theyre forever banned?'

# Replace one "How" per pass, numbering from 1.
# str.replace is case-sensitive, so the lowercase "how" is left untouched.
i = 1
while "How" in Test:
    Test = Test.replace("How", "[%d]" % i, 1)
    i += 1

print(Test)

Output:

[1] many roads must a man walk down Before you call him a man? [2] many seas must a white dove sail Before she sleeps in the sand? Yes, and how many times must the cannon balls fly Before theyre forever banned?

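Here is another way to do it with re.sub and a replacement function. Instead of using a global object to keep track of the count, this code uses a function attribute.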

import re

def count_replace():
    def replace(m):
        replace.count += 1
        return '[%d]' % replace.count
    replace.count = 0
    return replace

src = '''How many roads must a man walk down Before you call him a man? How many seas must a white dove sail Before she sleeps in the sand? Yes, and how many times must the cannon balls fly Before they're forever banned?'''

pat = re.compile('how', re.I)

print(pat.sub(count_replace(), src))

Output:

[1] many roads must a man walk down Before you call him a man? [2] many seas must a white dove sail Before she sleeps in the sand? Yes, and [3] many times must the cannon balls fly Before they're forever banned?

If you need to replace only whole words rather than parts of words, you need a smarter regex, such as r"\bhow\b".
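To see the difference the word boundaries make, here is a small sketch. The helper number_matches and the sample sentence are invented for this illustration.

```python
import re
from itertools import count

def number_matches(pattern, text):
    """Replace every case-insensitive match of pattern with [1], [2], ... (hypothetical helper)."""
    counter = count(1)
    return re.sub(pattern, lambda m: "[%d]" % next(counter), text, flags=re.I)

sample = "How to show how? However you like."

# Plain 'how' also matches inside "show" and "However".
loose = number_matches(r"how", sample)
# With \b on both sides, only the standalone word is replaced.
strict = number_matches(r"\bhow\b", sample)

print(loose)   # [1] to s[2] [3]? [4]ever you like.
print(strict)  # [1] to show [2]? However you like.
```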

You can make use of itertools.count and pass a function as the replacement argument, for example:

import re
from itertools import count

text = '''How many roads must a man walk down Before you call him a man? How many seas must a white dove sail Before she sleeps in the sand? Yes, and how many times must the cannon balls fly Before they're forever banned?'''
result = re.sub(r'(?i)\bhow\b', lambda m, c=count(1): '[{}]'.format(next(c)), text)
# [1] many roads must a man walk down Before you call him a man? [2] many seas must a white dove sail Before she sleeps in the sand? Yes, and [3] many times must the cannon balls fly Before they're forever banned?
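One thing to keep in mind with the default-argument trick: Python evaluates c=count(1) once, when the lambda is created. Inlining the lambda in the call, as above, gives a fresh counter every time, but a lambda stored in a variable and reused keeps counting. A short sketch, where the name repl and the sample strings are only illustrative:

```python
import re
from itertools import count

# The counter is created once, together with the lambda.
repl = lambda m, c=count(1): '[{}]'.format(next(c))

a = re.sub(r'(?i)\bhow\b', repl, "How and how")
b = re.sub(r'(?i)\bhow\b', repl, "How again")

print(a)  # [1] and [2]
print(b)  # [3] again
```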

Just for fun, I wanted to see whether I could solve this using recursion, and this is what I came up with:

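def count_replace(s, to_replace, leng=0, count=1, replaced=None):
    # Use None instead of a mutable default so the list is not shared between calls.
    if replaced is None:
        replaced = []
    if s.find(' ') == -1:
        # Last word: nothing left to split, so join the collected pieces.
        replaced.append(s)
        return ' '.join(replaced)
    else:
        if s[0:s.find(' ')].lower() == to_replace.lower():
            replaced.append('[%d]' % count)
            count += 1
            leng = len(to_replace)
        else:
            replaced.append(s[0:s.find(' ')])
            leng = s.find(' ')
        # Recurse on the rest of the string after the first word and its space.
        return count_replace(s[leng + 1:], to_replace, leng, count, replaced)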

Needless to say, I wouldn't recommend it, because it is ridiculously inefficient and overly complicated, but I thought I'd share it anyway!