Replacing string with sequential name?
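What I want to do is easier to show than to explain. Suppose I have a string like this:
The ^APPLE is a ^FRUIT
Using the regular expression function re.sub(), I want to get this:
The ^V1 is a ^V2
Note how the names increment. But now it gets harder:
The ^X is ^Y but ^X is not ^Z
should be translated to this:
The ^V1 is ^V2 but ^V1 is not ^V3
That is, if a term repeats, it keeps the same replacement, as in the ^X => ^V1 case.
I have heard that the replacement can be a function, but I can't get it right.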
https://www.hackerrank.com/challenges/re-sub-regex-substitution/problem
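We can iterate over the input string word by word and, for each new ^TERM, do a global re.sub replacement, using a counter to keep track of how many distinct terms we have seen: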
import re

inp = "The ^X is ^Y but ^X is not ^Z"
seen = set()   # terms that have already been replaced
counter = 0    # number of distinct terms seen so far
for term in inp.split():
    m = re.match(r'\^([^^]+)', term)
    if m and term not in seen:
        counter += 1
        seen.add(term)
        label = "V" + str(counter)
        # Replace every occurrence of this term in the whole string.
        inp = re.sub(r'\^' + re.escape(m.group(1)), '^' + label, inp)
print(inp)
This prints:
The ^V1 is ^V2 but ^V1 is not ^V3
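One caveat with replacing each term globally, as in the approach above: a term that is a prefix of another term (say ^X and ^XY) is also matched inside the longer one. The following is only a sketch of one possible guard, assuming terms consist of word characters; the sample sentence is made up for illustration.

```python
import re

inp = "The ^X is not ^XY"  # hypothetical input with a prefix collision

# Unguarded: the pattern for ^X also matches the start of ^XY.
unguarded = re.sub(re.escape("^X"), "^V1", inp)

# Guarded: (?!\w) requires that no word character follows the term.
guarded = re.sub(re.escape("^X") + r"(?!\w)", "^V1", inp)

print(unguarded)  # The ^V1 is not ^V1Y
print(guarded)    # The ^V1 is not ^XY
```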
IIUC, you don't need re. String operations will do the job:
def sequential(str_):
    d = {}  # maps each original ^TERM to its sequential label
    tokens = str_.split()
    for i in tokens:
        # Assign the next label the first time a ^TERM is seen.
        if i.startswith('^') and i not in d:
            d[i] = '^V%s' % str(len(d) + 1)
    # Tokens without a label are kept unchanged.
    return ' '.join(d.get(i, i) for i in tokens)
Output:
sequential('The ^APPLE is a ^FRUIT')
# 'The ^V1 is a ^V2'
sequential('The ^X is ^Y but ^X is not ^Z')
# 'The ^V1 is ^V2 but ^V1 is not ^V3'
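Because this approach splits on whitespace, a term with punctuation attached (such as "^X," or "^X.") would be treated as a different token from "^X". Below is a rough sketch of one way to handle that with string operations only; the function name sequential_punct and the sample sentence are made up for illustration, and it assumes punctuation only trails a term.

```python
import string

def sequential_punct(text):
    labels = {}  # maps each original ^TERM to its sequential label
    out = []
    for token in text.split():
        # Separate trailing punctuation from the token itself.
        core = token.rstrip(string.punctuation)
        tail = token[len(core):]
        if core.startswith('^') and len(core) > 1:
            # Reuse the label for a known term, or assign the next one.
            label = labels.setdefault(core, '^V%d' % (len(labels) + 1))
            out.append(label + tail)
        else:
            out.append(token)
    return ' '.join(out)

print(sequential_punct('Is it ^X, or ^Y? Not ^X.'))
# Is it ^V1, or ^V2? Not ^V1.
```

Like the original, this joins tokens with single spaces, so runs of whitespace in the input are collapsed.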
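After some searching I found a solution that does the multiple replacements with the re module and dict.setdefault. Terms that contain digits need no special handling: \w already matches digits, so '\^\w+' matches the same strings as the longer pattern '\^\w[\w\d]*':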
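import re

string = 'The ^X is ^Y but ^X is not ^Z'
terms = {}  # maps each original term to its sequential label
# setdefault returns the stored label for a known term, or stores and returns a new one.
print(re.sub(r'\^\w+', lambda match: terms.setdefault(match.group(0), '^V{}'.format(len(terms) + 1)), string))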
Output:
The ^V1 is ^V2 but ^V1 is not ^V3
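sub checks the type of the replacement argument: if it is a str, it uses it as the replacement text; if it is a function, it calls that function with the match object as its argument and replaces the match with the returned value.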
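To make the two cases concrete, here is a small sketch contrasting a string replacement with a function replacement. The sample text and the helper name lower_term are made up for illustration.

```python
import re

text = "The ^X is ^Y"

# String replacement: every match is replaced by the same text.
fixed = re.sub(r"\^\w+", "^V", text)

# Function replacement: called once per match with the match object;
# the string it returns replaces that match.
def lower_term(match):
    return match.group(0).lower()

lowered = re.sub(r"\^\w+", lower_term, text)

print(fixed)    # The ^V is ^V
print(lowered)  # The ^x is ^y
```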
You can create a simple object to handle the increment:
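import re

class inc:
    """Assigns 1, 2, 3, ... to keys in the order they are first looked up."""
    def __init__(self):
        self.a, self.c = {}, 0
    def __getitem__(self, _v):
        if _v not in self.a:
            self.c += 1
            self.a[_v] = self.c
        return self.a[_v]

n = inc()
# The lookbehind (?<=\^) keeps the caret out of the match, so only the name is replaced.
r = re.sub(r'(?<=\^)\w+', lambda x: f'V{n[x.group()]}', 'The ^APPLE is a ^FRUIT')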
Output:
'The ^V1 is a ^V2'
n = inc()
r = re.sub(r'(?<=\^)\w+', lambda x: f'V{n[x.group()]}', 'The ^X is ^Y but ^X is not ^Z')
Output:
'The ^V1 is ^V2 but ^V1 is not ^V3'
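As a possible variation on the same idea, the counting object can be built from standard library pieces instead of a custom class. This is only a sketch; the function name number_terms is made up for illustration, and the state is created inside the function so each call starts numbering from 1.

```python
import re
from collections import defaultdict
from itertools import count

def number_terms(text):
    counter = count(1)
    # A missing key triggers the factory, which hands out the next number.
    ids = defaultdict(lambda: next(counter))
    return re.sub(r'(?<=\^)\w+', lambda m: f'V{ids[m.group()]}', text)

print(number_terms('The ^APPLE is a ^FRUIT'))
# The ^V1 is a ^V2
print(number_terms('The ^X is ^Y but ^X is not ^Z'))
# The ^V1 is ^V2 but ^V1 is not ^V3
```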