Changing column value by finding substring in string values

Question

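I'm trying to change the values in a single column using pandas apply(). My function partially works, but I'm stuck on how to fix the other half.

Data column:

County Name
Riverside County
San Diego County
SanFrancisco County/city

I'm trying to strip "County" so that only the county name is left. My function successfully gets rid of " County", but I can't get it to remove " County/city" from San Francisco.

Code: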

def modify_county(countyname):
  if "/city" in countyname:
    return countyname.replace(" County/city","")
  return countyname.replace(" County","")

lfd["CountyName"] = lfd["CountyName"].apply(modify_county)

Output:

CountyName
Riverside
San Diego
San Francisco County/city

Is there something wrong with the conditional statement in the function?

Answer 1

Here is an alternative approach. It works for the data you provided.

import pandas as pd

s = pd.Series(['Riverside County', 'San Diego County', 'SanFrancisco County/city'])

res = s.apply(lambda x: ' '.join([w for w in x.split() if 'County' not in w]))

print(res)

# 0       Riverside
# 1       San Diego
# 2    SanFrancisco
# dtype: object
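
Spelled out as a named function, the lambda above splits each value on whitespace, drops every token that contains "County" (which also catches "County/city"), and joins what is left. A minimal sketch in plain Python; the name `drop_county_tokens` is only for illustration:

```python
def drop_county_tokens(name):
    """Remove every whitespace-separated token that contains 'County'."""
    tokens = name.split()                            # split on any run of whitespace
    kept = [t for t in tokens if 'County' not in t]  # drops 'County' and 'County/city'
    return ' '.join(kept)                            # rejoin with single spaces

names = ['Riverside County', 'San Diego County', 'SanFrancisco County/city']
cleaned = [drop_county_tokens(n) for n in names]
print(cleaned)
```

One caveat: the check is substring-based, so any token that merely contains "County" is dropped as well, and runs of whitespace inside a name are collapsed to single spaces.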

Answer 2

@jpp's answer does literally what you asked for. In this case, though, I would use pandas.Series.replace with a regular expression to replace the whole suffix in one go:

import pandas as pd

s = pd.Series(['Riverside County', 'San Diego County', 'SanFrancisco County/city'])

res = s.replace(' County(/city)?', '', regex=True)
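
The optional group `(/city)?` is what lets a single pattern cover both suffixes. To show what it matches, the sketch below runs the same pattern through the standard-library `re.sub` on plain strings. The second pattern, with `\s+`, the `$` anchor, and `re.IGNORECASE`, is my own variation, assuming the real data might contain variants such as "county/City" or extra spaces:

```python
import re

names = ['Riverside County', 'San Diego County', 'SanFrancisco County/city']

# Same pattern as in the answer: ' County', optionally followed by '/city'
pattern = re.compile(r' County(/city)?')
cleaned = [pattern.sub('', n) for n in names]
print(cleaned)

# Assumed variation: strip the suffix only at the end of the string,
# tolerate extra whitespace, and ignore case
strict = re.compile(r'\s+County(/city)?$', re.IGNORECASE)
variant = strict.sub('', 'SanFrancisco  county/City')
print(variant)
```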

Answer 3

@jpp, I applied your suggestion to the whole column. Not sure if this is the best way, but it works.

lfd["CountyName"] = pd.Series(lfd["CountyName"])

lfd["CountyName"] = lfd["CountyName"].apply(lambda x: ' '.join([w for w in x.split() if 'County' not in w]))
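
If the goal is just to update the column in place, the vectorized string accessor might be a simpler option than `apply`. A sketch, assuming `lfd` is a DataFrame with a `CountyName` column; the frame below is made-up sample data, not the original dataset:

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame
lfd = pd.DataFrame({
    'CountyName': ['Riverside County', 'San Diego County', 'SanFrancisco County/city']
})

# Strip a trailing ' County' or ' County/city' from every value in the column
lfd['CountyName'] = lfd['CountyName'].str.replace(r'\s+County(/city)?$', '', regex=True)

result = lfd['CountyName'].tolist()
print(result)
```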

python

substring

pandas

pandas-apply