Using a lambda function to search for specific text then modify that text in the dataframe

Question

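Junior dev here.

I'm trying to create "onshore" and "offshore" labels for a report. I'm trying to use a lambda function to search my dataframe for specific text and then overwrite the cell in the dataframe when the condition is met.

Dataset:

Desired output:

Current output:

Current code: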

df['Office_Address'] = df.apply(lambda x: "onshore" if ((x['Employment_Type'] == 'Consultant') and (x['Office_Address'] == str.contains('United States of America', regex = True))) else x['Office_Address'] == "offshore", axis = 1)

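I think my problem is the use of str.contains(), which returns a boolean, hence the current output. I'm not sure how to get the if statement to trigger only when it sees the partial text "United States of America". The position of the United States of America label varies across my dataset, but I know it is always spelled that way.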

Answer 1

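The reason you are seeing True or False is that the final part of the expression, i.e.:

else x['Office_Address'] == "offshore"

is a comparison that checks whether the row's value equals 'offshore', so it evaluates to a boolean. This can be updated to return

"offshore"

So the full example would be: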

df['Office_Address'] = df.apply(lambda x: "onshore" if ((x['Employment_Type'] == 'Consultant') and ('United States of America' in x['Office_Address'])) else "offshore", axis = 1)
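For illustration, here is a minimal, self-contained sketch of that fix run against a small made-up dataframe. The sample rows and the helper name label_office are assumptions, since the original dataset is not included in the post; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not shown in the question.
df = pd.DataFrame({
    "Employment_Type": ["Consultant", "Consultant", "Employee"],
    "Office_Address": [
        "1 Main St, Austin, United States of America",
        "United Kingdom, London, 5 High St",
        "United States of America, 9 Oak Ave, Boston",
    ],
})


def label_office(row):
    """Return 'onshore' for US-based consultants, otherwise 'offshore'."""
    is_consultant = row["Employment_Type"] == "Consultant"
    # `in` is a plain substring test, so the position of the text does not matter.
    in_us = "United States of America" in row["Office_Address"]
    return "onshore" if (is_consultant and in_us) else "offshore"


# The comparison from the original else-branch yields a boolean, not a label.
buggy_value = df.loc[0, "Office_Address"] == "offshore"

# Overwrite the column with the computed labels, row by row.
df["Office_Address"] = df.apply(label_office, axis=1)
print(df)
```

A named function is used here instead of a lambda only to leave room for comments; the logic is the same as the one-liner above.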

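To improve your code further, you can take advantage of pandas' ability to operate on large amounts of data using column-level operations. These are typically more efficient than something like df.apply with axis = 1, which calls the function once per row. (See https://realpython.com/fast-flexible-pandas/)

With that in mind, this example modifies the column directly: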

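# Label consultants whose address contains the literal text as "onshore".
df.loc[(df['Employment_Type'] == 'Consultant') & (df['Office_Address'].str.contains('United States of America', regex = False)), 'Office_Address'] = "onshore"
# Every row that was not just labelled "onshore" becomes "offshore".
df.loc[df['Office_Address'] != 'onshore', 'Office_Address'] = "offshore"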

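The statements above use .loc[rows, columns]. This can be thought of as setting values "at the location of" these rows and these columns. In this case, the rows to update are those where Employment_Type equals 'Consultant' and Office_Address contains 'United States of America', and the Office_Address value of those rows is set to 'onshore'. Once that is done, the remaining rows are set to 'offshore' by elimination.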
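The same labelling can also be sketched in a single step by building the boolean mask once and passing it to numpy.where. This is an alternative to the two .loc assignments, not something from the original answer. The sample rows are made up, and na=False reflects an assumption that rows with a missing address should count as not matching.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing address.
df = pd.DataFrame({
    "Employment_Type": ["Consultant", "Consultant", "Employee", "Consultant"],
    "Office_Address": [
        "1 Main St, Austin, United States of America",
        "United Kingdom, London, 5 High St",
        "United States of America, 9 Oak Ave, Boston",
        None,
    ],
})

is_consultant = df["Employment_Type"] == "Consultant"
# regex=False treats the pattern as literal text; na=False maps missing
# addresses to False so the mask stays purely boolean.
in_us = df["Office_Address"].str.contains(
    "United States of America", regex=False, na=False
)

# Combine the element-wise conditions with &, not the Python keyword `and`.
onshore_mask = is_consultant & in_us
df["Office_Address"] = np.where(onshore_mask, "onshore", "offshore")
print(df)
```

Keeping the mask in its own variable also makes it easy to inspect which rows matched before the column is overwritten.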


python

lambda

dataframe