Fill NaN with the mean value of each group (group by)
My dataset looks like this:
Month DayOfWeek Class A1 A2 ... A999
July Monday Bata 7 9 ... 5
July Tuesay Bata 3 1 ... 2
July Sunday Bata 4 5 ... 6
July Monday Adid 9 8 ... 5
July Sunday Adid 4 0 ... 4
Sept Monday Nike 7 5 ... 7
Sept Sunday Nike 8 3 ... 7
Sept Satday Adid 2 7 ... 7
Sept Monday Bata 8 9 ... 4
Oct Monday Nike 4 2 ... 5
Oct Sunday Bata 8 6 ... 3
July Monday Nike NaN NaN NaN
Sept Sunday Nike NaN NaN NaN
Oct Satday Nike NaN NaN NaN
Sept Monday Bata NaN NaN NaN
I want to fill the NaNs with the mean of the earlier records.
I know I can use
df['A1'] = df['A1'].fillna((df['A1'].mean()))
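But that is a poor approach, because I have more than 1000 columns and the number may grow later.
In addition, I want the mean to be computed per Month and DayOfWeek.
For example, for this record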
July Monday Nike NaN NaN NaN
the mean should be taken only over the records with Month = July & DayOfWeek = Monday.
How can I do this?
Here you go:
df['A1'] = df.groupby(['Month','DayOfWeek'])['A1'].transform(lambda x: x.fillna(x.mean()))
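The line above will still leave one null value, because the group "Month = Oct & DayOfWeek = Satday" has no non-null values to average.
In that case you may need a second pass that falls back to the mean of the month or the mean of the DayOfWeek.
The snippet below fills the remaining nulls with the mean of the month of each null record: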
df['A1'] = df.groupby('Month')['A1'].transform(lambda x: x.fillna(x.mean()))
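Since the question mentions more than 1000 columns, the same idea can be applied to every value column at once instead of one column at a time. The following is only a sketch under stated assumptions: it rebuilds the sample with just A1 and A2, and the names key_cols, value_cols, group_means and month_means are made up for illustration. It relies on fillna accepting a DataFrame of per-row group means that is aligned on index and columns.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, reduced to two value columns (assumption:
# the real frame has A1..A999 laid out the same way).
rows = [
    ("July", "Monday", "Bata", 7, 9),
    ("July", "Tuesay", "Bata", 3, 1),
    ("July", "Sunday", "Bata", 4, 5),
    ("July", "Monday", "Adid", 9, 8),
    ("July", "Sunday", "Adid", 4, 0),
    ("Sept", "Monday", "Nike", 7, 5),
    ("Sept", "Sunday", "Nike", 8, 3),
    ("Sept", "Satday", "Adid", 2, 7),
    ("Sept", "Monday", "Bata", 8, 9),
    ("Oct", "Monday", "Nike", 4, 2),
    ("Oct", "Sunday", "Bata", 8, 6),
    ("July", "Monday", "Nike", np.nan, np.nan),
    ("Sept", "Sunday", "Nike", np.nan, np.nan),
    ("Oct", "Satday", "Nike", np.nan, np.nan),
    ("Sept", "Monday", "Bata", np.nan, np.nan),
]
df = pd.DataFrame(rows, columns=["Month", "DayOfWeek", "Class", "A1", "A2"])

# Hypothetical helper names: every column that is not a key is a value column.
key_cols = ["Month", "DayOfWeek", "Class"]
value_cols = [c for c in df.columns if c not in key_cols]

# Per-row means of each value column, broadcast back to the original shape.
group_means = df.groupby(["Month", "DayOfWeek"])[value_cols].transform("mean")
month_means = df.groupby("Month")[value_cols].transform("mean")

# First try the (Month, DayOfWeek) mean, then fall back to the Month mean.
df[value_cols] = df[value_cols].fillna(group_means).fillna(month_means)

print(df.tail(4))
```

With this sample, the Oct/Satday row has no (Month, DayOfWeek) mean to draw on, so it takes the Oct mean instead (6.0 for A1). If a whole month could be empty, a further fallback such as the overall column mean would be needed.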