Not able to apply lambda function to a dataset

Question

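The dataset contains a Pclass column, whose values are (1, 2, 3), and an Age column. The Age column has some null values. I want to replace those nulls with the median age of the people in each class. The median age is 37 for the first class, 29 for the second class, and 24 for the third class.

Here is the code for what I am trying to do: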

def fill_age(x):
    if pd.isna(x['Age']) and x['Pclass'] == 1:
        return 37
    elif pd.isna(x['Age']) and x['Pclass'] == 2:
        return 29
    elif pd.isna(x['Age']) and x['Pclass'] == 3:
        return 24
    else:
        return x['Age']


df['Age'] = df.apply(fill_age)

But this is the error I get:

KeyError                                  Traceback (most recent call last)
<ipython-input-126-7375a6b3c119> in <module>
----> 1 df['Age'] = df.apply(fill_age)

KeyError: 'Age'

Please let me know what I am doing wrong. Thanks in advance.

Answer 1

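Use DataFrame.apply with axis=1 so that fill_age receives one row at a time. With the default axis=0 it receives each column as a Series indexed by row labels, so the lookup x['Age'] fails with KeyError: 'Age':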

df['Age'] = df.apply(fill_age, axis=1)
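
To see the difference between the two axes, here is a minimal sketch on a made-up three-row frame. The data is hypothetical and only stands in for the real dataset; fill_age is the function from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's dataset.
df = pd.DataFrame({"Pclass": [1, 2, 3], "Age": [np.nan, 40.0, np.nan]})


def fill_age(x):
    # x is expected to be one row, indexed by column name.
    if pd.isna(x["Age"]) and x["Pclass"] == 1:
        return 37
    elif pd.isna(x["Age"]) and x["Pclass"] == 2:
        return 29
    elif pd.isna(x["Age"]) and x["Pclass"] == 3:
        return 24
    else:
        return x["Age"]


# Default axis=0: fill_age is called with each column, whose index holds
# the row labels 0, 1, 2, so x["Age"] cannot be found.
default_axis_failed = False
try:
    df.apply(fill_age)
except KeyError:
    default_axis_failed = True

# axis=1: fill_age is called with each row, whose index holds the
# column names "Pclass" and "Age".
filled = df.apply(fill_age, axis=1)
```

Here default_axis_failed ends up True, and filled holds 37, 40 and 24 for the three rows.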

For a vectorized (faster) alternative, use Series.fillna with replacement values built by Series.map from a dictionary:

df['Age'] = df['Age'].fillna(df['Pclass'].map({1:37,2:29,3:24}))
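
A self-contained sketch of the vectorized approach on hypothetical toy data follows. The second half is an assumption that goes beyond the question: if the medians should come from the data instead of being hard-coded, groupby with transform('median') may be used to build the replacement values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; every class has at least one known age.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Age": [np.nan, 50.0, 30.0, np.nan, 20.0, np.nan],
})

# Hard-coded medians from the question, aligned to each row by Pclass.
class_medians = {1: 37, 2: 29, 3: 24}
fallback = df["Pclass"].map(class_medians)

# fillna only uses the fallback where Age is missing.
filled_fixed = df["Age"].fillna(fallback)

# Assumption: compute the per-class median from the data itself.
# transform returns a Series aligned with df, NaN ages are skipped.
computed = df.groupby("Pclass")["Age"].transform("median")
filled_computed = df["Age"].fillna(computed)
```

With this toy data, filled_fixed is 37, 50, 30, 29, 20, 24 and filled_computed is 50, 50, 30, 30, 20, 20. Existing ages are left untouched in both cases.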

python

lambda

if-statement

dataframe

pandas