Pandas apply does nothing with regex function

Question

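Thanks for reading and (hopefully) helping! I'm stumped by pandas apply. I'm using it with a regex function that works fine on plain strings, but when I use it on a DataFrame it just outputs the same cell value. Here is the function: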

import re

def match_pattern(df_cell):
    if type(df_cell) == str:
        result = re.search(r'(?:[0-9]{1,4}\s)(.*)(?=\nName)', df_cell)
        if result:
            print('result.group(1)', result.group(1))
            return result.group(1)
        else:
            print('no result')
            return df_cell
    else:
        return df_cell

This works fine on strings. For example:

string = '3971 Small Arms Survey\nName'
string2 = 'nothing here'
match_pattern(string) # outputs 'Small Arms Survey' which is what i want
match_pattern(string2) # outputs 'nothing here'

But when I use it on a DataFrame with apply, it doesn't seem to work:

frame = pd.DataFrame(['3971 Small Arms Survey\nName'])
frame2 = frame.apply(lambda x: match_pattern(str(x)))
frame2 # outputs '3971 Small Arms Survey\nName'

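I would try other things such as iterrows or itertuples, but ultimately this regex function has to run on every cell of a large DataFrame, and anything slower than apply isn't feasible.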

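The print statements in match_pattern() are only there for debugging. In case you're wondering, the print('result.group(1)', result.group(1)) line fires in both cases: when the function is applied to 'string' and when it is applied to the DataFrame. The printouts differ, though. In both cases the printout is what the function returns. For the DataFrame, that is just the string the DataFrame started with, whereas for the plain string it is the text I want to extract (i.e. group(1) of the regex inside the function).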

Many thanks to Wiktor Stribiżew, whose comment answered my question! It turned out to be a simple, silly mistake. Using apply on a column of the DataFrame works:

frame = frame[0].apply(match_pattern) # outputs 'Small Arms Survey' for the cell, which is what i want

Answer 1

You can run apply on column 0:

import re
import pandas as pd

def match_pattern(df_cell):
    if isinstance(df_cell, str):
        result = re.search(r'[0-9]{1,4}\s(.*)\nName', df_cell)
        if result:
            print('result.group(1)', result.group(1))
            return result.group(1)
        else:
            print('no result')
            return df_cell
    else:
        return df_cell

frame = pd.DataFrame(['3971 Small Arms Survey\nName'])
frame[0] = frame[0].apply(match_pattern)
# => frame
#                    0
# 0  Small Arms Survey
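
As a rough illustration of why the column-level call behaves differently from calling apply on the whole frame, the sketch below records what each variant hands to the function. The two set names are made up for this example. DataFrame.apply passes whole columns as Series objects, so wrapping the argument in str() produces the printed representation of the column rather than the cell text.

```python
import pandas as pd

frame = pd.DataFrame(['3971 Small Arms Survey\nName'])

# Collect the type names each apply variant passes to the function.
frame_apply_types = set()
series_apply_types = set()

# DataFrame.apply passes each column to the function as a Series.
frame.apply(lambda col: frame_apply_types.add(type(col).__name__))

# Series.apply passes each individual cell value to the function.
frame[0].apply(lambda cell: series_apply_types.add(type(cell).__name__))

print(frame_apply_types)   # {'Series'}
print(series_apply_types)  # {'str'}
```
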

Note that I reduced the regex to [0-9]{1,4}\s(.*)\nName, since you only need to capture the text into group 1.
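
A quick check, using only the sample string from the question, suggests the two patterns capture the same group 1. They differ only in the overall match (group 0), which the function never uses.

```python
import re

text = '3971 Small Arms Survey\nName'

# Pattern from the question: non-capturing prefix and a lookahead suffix.
original = re.search(r'(?:[0-9]{1,4}\s)(.*)(?=\nName)', text)
# Reduced pattern: same capture group, with the suffix consumed instead of looked ahead.
simplified = re.search(r'[0-9]{1,4}\s(.*)\nName', text)

print(original.group(1))          # Small Arms Survey
print(simplified.group(1))        # Small Arms Survey
print(repr(original.group(0)))    # '3971 Small Arms Survey'
print(repr(simplified.group(0)))  # '3971 Small Arms Survey\nName'
```
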

Also, if isinstance(df_cell, str): looks like a neater way to check the type of df_cell, IMHO.
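
Since the question mentions running the function on every cell of a larger DataFrame, here is a minimal sketch of one way to do that, assuming a made-up two-column frame (the column names 'a' and 'b' and their values are hypothetical). It combines DataFrame.apply, which walks the columns, with Series.map, which walks the cells of each column. The debugging prints are dropped and the pattern is precompiled, which is a stylistic choice rather than a requirement.

```python
import re
import pandas as pd

PATTERN = re.compile(r'[0-9]{1,4}\s(.*)\nName')

def match_pattern(df_cell):
    # Return the captured text for matching strings; leave everything else untouched.
    if isinstance(df_cell, str):
        result = PATTERN.search(df_cell)
        if result:
            return result.group(1)
    return df_cell

# Hypothetical sample data, made up for illustration.
frame = pd.DataFrame({
    'a': ['3971 Small Arms Survey\nName', 'nothing here'],
    'b': ['12 Another Title\nName', 42],
})

# DataFrame.apply hands over each column; Series.map then visits each cell.
cleaned = frame.apply(lambda col: col.map(match_pattern))
print(cleaned)
```

Recent pandas versions also offer an element-wise DataFrame method (DataFrame.map, called applymap in older releases), which may be worth checking against the installed version.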
