pandas: apply a function to a Series that checks whether the first n characters match a predefined string and, if so, updates the existing value

Question

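I am trying to apply a function to a pandas Series that checks the first 3 and the first 2 characters of each value.

If either one matches, the first 3 or 2 characters (depending on which one matched) need to be replaced with '0', and the remaining characters stay the same.

The original dtype is 'O' (object). I have tried converting it to 'string', but it still doesn't work.

The sample data looks like this: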

012xxxxxxx
+27xxxxxxxx
011xxxxxxx
27xxxxxxxx
etc...

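The condition I am evaluating is: if the first 3 characters == '+27', replace '+27' with '0'; or if the first 2 characters == '27', replace '27' with '0'.

I have the following function that I apply, but the values are not being updated.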

def normalize_number(num):
       
    if num[:3] == '+27':
        # num.str.replace(num[:3], '0') ## First Method
        return '0' + num[4:] ## Second Method
    else:
        return num
        
    if num[:2] == '27':
        # num.str.replace(num[:2], '0') 
        return '0' + num[3:] 
    else:
        return num

df['number'].apply(normalize_number)

What am I missing here?

Answer 1

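It looks like you should use a regular expression here. If the string starts with 27, optionally preceded by a +, replace that prefix with 0:

df['number2'] = df['number'].str.replace(r'^\+?27', '0', regex=True)

Output: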

        number     number2
0   012xxxxxxx  012xxxxxxx
1  +27xxxxxxxx   0xxxxxxxx
2   011xxxxxxx  011xxxxxxx
3   27xxxxxxxx   0xxxxxxxx
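
For a self-contained check, here is a minimal sketch that rebuilds data shaped like the question's sample and applies the same replacement. The digits are made up for illustration (the question only shows placeholders); the column name `number` comes from the question.

```python
import pandas as pd

# Sample numbers are invented for illustration; only the prefixes matter.
df = pd.DataFrame(
    {"number": ["0123456789", "+27123456789", "0113456789", "27823456789"]}
)

# Replace a leading "27" or "+27" with "0"; all other values are left unchanged.
df["number2"] = df["number"].str.replace(r"^\+?27", "0", regex=True)

print(df)
```

Because the pattern is anchored with `^`, a "27" that appears later in a number is not touched.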

Why your approach failed

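Your approach failed because the first else returns too early: every value that does not start with '+27' is returned unchanged, so the '27' check is never reached. The slices are also off by one: '+27' is 3 characters long, so the remainder is num[3:], and for '27' it is num[2:]. Finally, apply returns a new Series rather than modifying the column in place, so assign the result back, for example df['number'] = df['number'].apply(normalize_number). You should have used:

def normalize_number(num):
    if num[:3] == '+27':
        return '0' + num[3:]  # drop the 3-character prefix '+27'
    elif num[:2] == '27':
        return '0' + num[2:]  # drop the 2-character prefix '27'
    else:
        return num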
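
As a rough sketch of how a function like this might be used end to end: `Series.apply` returns a new Series, so the result is assigned back to the column. The sample values are made up, and the slice offsets here are chosen to equal the prefix lengths (3 for '+27', 2 for '27'), which is an assumption about the intended behaviour based on the regex output above.

```python
import pandas as pd


def normalize_number(num: str) -> str:
    """Replace a leading '+27' or '27' with '0'; return other values unchanged."""
    if num.startswith("+27"):
        return "0" + num[3:]  # skip the 3-character prefix '+27'
    elif num.startswith("27"):
        return "0" + num[2:]  # skip the 2-character prefix '27'
    return num


# Invented sample values; only the prefixes matter.
df = pd.DataFrame({"number": ["0123456789", "+27123456789", "27823456789"]})

# apply() does not modify the column in place, so store the result.
df["number"] = df["number"].apply(normalize_number)

print(df)
```

This assumes every value is a string; missing values (NaN) would need to be handled separately before calling `startswith`.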

NB. The regex approach above will be more efficient.

The regex

^      # match start of string
\+     # match literal +
?      # make previous match (the "+") optional
27     # match literal 27
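
To see the pattern's behaviour outside pandas, here is a small sketch using Python's built-in `re` module with the same pattern. The test strings are made up for illustration.

```python
import re

# Same pattern as above: start of string, optional "+", then "27".
pattern = re.compile(r"^\+?27")

samples = ["+27123456789", "27823456789", "0123456789", "0127456789"]

for value in samples:
    # sub() replaces the matched prefix with "0"; non-matching strings pass through.
    print(value, "->", pattern.sub("0", value))
```

The last sample contains "27" in the middle of the string; since the pattern is anchored at the start, it is left unchanged.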

