Using a RegEx to find groups of numbers, replace with only the last member of the group

Question

I have a CSV file in the following format (only the relevant rows are shown):

Global equity - 45%/45.1%
Private Investments - 25%/21%
Hedge Funds - 17.5%/18.1%
Bonds & cash - 12.5%/15.3%

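I wrote a regular expression to find each occurrence of the numbers (i.e. 45%/45.1%, etc.), and I am trying to make it keep only the number after the slash. Here is what I wrote: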

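import csv
import re

with open('sheet.csv', 'rU') as f:
    rdr = csv.DictReader(f, delimiter=',')
    row1 = next(rdr)
    assets = str(row1['Asset Allocation '])
    # The second argument is a placeholder: this is where I want
    # to substitute just the number after the slash
    finnum = re.sub(r'(\/[0-9]+.)', '#This is where I want to replace with just the numbers after the slash', assets)
    print(finnum)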

Desired output:

Global equity - 45.1%
Private Investments - 21%
etc...

Is this even possible if I don't know the index of the numbers I want?
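A minimal sketch, assuming every pair has the form `number%/number%` with an optional decimal part: capture the number after the slash and replace the whole pair with a backreference to that capture. `re.sub` rewrites every match wherever it occurs in the string, so no index is needed. The names `PAIR` and `keep_after_slash` are illustrative, not from the question.

```python
import re

# Matches a pair like "45%/45.1%": the first number and the slash are consumed,
# the number after the slash (with its "%") is captured in group 1.
PAIR = re.compile(r'\d+(?:\.\d+)?%/(\d+(?:\.\d+)?%)')

def keep_after_slash(text):
    # Replace every "x%/y%" pair with just "y%"
    return PAIR.sub(r'\1', text)

rows = [
    'Global equity - 45%/45.1%',
    'Private Investments - 25%/21%',
    'Hedge Funds - 17.5%/18.1%',
    'Bonds & cash - 12.5%/15.3%',
]
for row in rows:
    print(keep_after_slash(row))
```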

Answer 1

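You can use a regex that matches the first number together with the slash and replace it with an empty string to delete the unwanted data:

import re

string = 'Global equity - 45%/45.1%'
# (?:\.\d+)? also covers a decimal number before the slash, e.g. "17.5%/"
re.sub(r'\d+(?:\.\d+)?%/', '', string)  # 'Global equity - 45.1%'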
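One caveat worth checking: `\d+` matches digits only, so a digits-only pattern such as `\d+%/` removes just `5%/` from `17.5%/18.1%` and leaves `17.` behind. The comparison below is a small sketch; making the fractional part optional with `(?:\.\d+)?` is one way to handle both integer and decimal values.

```python
import re

s = 'Hedge Funds - 17.5%/18.1%'

# Digits only: the match is just "5%/", so "17." is left behind
print(re.sub(r'\d+%/', '', s))            # Hedge Funds - 17.18.1%

# Optional decimal part: the match is the whole "17.5%/"
print(re.sub(r'\d+(?:\.\d+)?%/', '', s))  # Hedge Funds - 18.1%
```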

Answer 2

If you are looking specifically for that pattern, you can use a group-based replacement and concatenation:

import re
# Keep group 1 (the label) and group 3 (the number after the slash)
replace = lambda s: s.group(1) + ' ' + s.group(3)
re.sub(r'(.*) (\d+(?:\.\d+)?%/)(\d+(?:\.\d+)?%)', replace, 'Hedge Funds - 17.5%/18.1%')
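The replacement argument of `re.sub` can be a function: it is called once per match with the match object, and its return value replaces that match. That means the same idea works on the whole block of rows at once. A sketch using named groups (the names `before`, `after`, `NUM` and `keep_after` are illustrative) and a decimal-aware number pattern:

```python
import re

text = """Global equity - 45%/45.1%
Private Investments - 25%/21%
Hedge Funds - 17.5%/18.1%
Bonds & cash - 12.5%/15.3%"""

# A percentage with an optional decimal part, e.g. "45%" or "17.5%"
NUM = r'\d+(?:\.\d+)?%'
pattern = re.compile(r'(?P<before>{0})/(?P<after>{0})'.format(NUM))

def keep_after(match):
    # Called for every "x%/y%" pair; the returned string replaces the whole match
    return match.group('after')

print(pattern.sub(keep_after, text))
```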

Alternatively, simply delete the unwanted part:

import re
val = 'Hedge Funds - 17.5%/18.1%'
re.sub(r'\d+(?:\.\d+)?%/', '', val)  # 'Hedge Funds - 18.1%'

Or, if you don't want to use regular expressions:

val = 'Hedge Funds - 17.5%/18.1%'
# Label up to " - ", then everything after "%/"
replaced = val[0:val.find(' - ')] + ' - ' + val[val.find('%/') + 2:]
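Another regex-free sketch: `str.rpartition` splits the row on the last `' - '`, and `split('/')[-1]` takes the last member of the pair, so the value is kept without knowing its position. This assumes every row contains `' - '` followed by a `/`-separated pair; `keep_last_value` is a hypothetical helper name.

```python
def keep_last_value(row):
    # e.g. ("Hedge Funds", " - ", "17.5%/18.1%")
    name, sep, values = row.rpartition(' - ')
    # The last item after splitting on "/" is the value to keep
    return name + sep + values.split('/')[-1]

print(keep_last_value('Hedge Funds - 17.5%/18.1%'))  # Hedge Funds - 18.1%
```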

Answer 3

You can also capture the text before the first number and the text after the / as groups:

import re

s = 'Hedge Funds - 17.5%/18.1%'
# Group 1: the label up to the hyphen; group 2: everything after the slash
print(re.sub(r'(.*-) .*/(.*)', r'\g<1> \g<2>', s))

Output:

Hedge Funds - 18.1%
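The same pattern can be compiled once and applied to every row of the sample. Because `.*` is greedy, `(.*-)` ends at the last `- ` that is still followed by a slash, and `.*/` consumes everything up to the last slash, so only the final value of each pair survives. This sketch relies on the label and the numbers being separated by `' - '`, as in the sample data.

```python
import re

rows = [
    'Global equity - 45%/45.1%',
    'Private Investments - 25%/21%',
    'Hedge Funds - 17.5%/18.1%',
    'Bonds & cash - 12.5%/15.3%',
]

# Compile once and reuse for every row
pattern = re.compile(r'(.*-) .*/(.*)')
cleaned = [pattern.sub(r'\g<1> \g<2>', row) for row in rows]

for row in cleaned:
    print(row)
```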

Answer 4

If you don't want to do a replacement and instead need to use the values in other parts of your code, you can do this:

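from __future__ import print_function
import re

# Group 1: the asset name; group 2: the value after the slash
cleanup = re.compile(r"(^.+?)-\s.+?\/(.+?)$", re.MULTILINE)
with open(file_name, 'r') as f:  # file_name: path to your CSV file
    text = f.read()
for match in cleanup.finditer(text):
    print(match.group(1).strip(), match.group(2))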
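If the goal is to use the numbers later, a sketch that collects them into a dictionary might look like this. To stay self-contained it reads from an in-memory string rather than the file (an assumption), and it converts the value after the slash to a `float`. The names `row_re` and `allocations` are illustrative.

```python
import re

text = """Global equity - 45%/45.1%
Private Investments - 25%/21%
Hedge Funds - 17.5%/18.1%
Bonds & cash - 12.5%/15.3%"""

# Group 1: asset name; group 2: the number after the slash, without "%"
row_re = re.compile(r'^(.+?)\s+-\s+[\d.]+%/([\d.]+)%$', re.MULTILINE)

allocations = {}
for match in row_re.finditer(text):
    allocations[match.group(1)] = float(match.group(2))

print(allocations['Hedge Funds'])  # 18.1
```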

Tags: python, regex, python-2.x, python-2.7