Python - Pandas - groupby and "agg" - set aggregate to nan when group contains a nan
I have the following example:
import numpy as np
import pandas as pd

index_ = pd.date_range('2001-01-01', '2010-12-31', freq='MS')
df_ = pd.DataFrame(np.random.randn(len(index_), 4), columns=list('ABCD'), index=index_)
df_.loc['2009-01-01', 'A'] = np.nan
df_.loc['2007-08-01', 'B'] = np.nan
# Note: pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq='A') replaces it.
df_.groupby(pd.TimeGrouper('A')).agg({'A': np.sum, 'B': np.mean})
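I would like column 'B' to be NaN for 2007 and column 'A' to be NaN for 2009. How can I achieve this? I tried the np.sum function because on a numpy array it returns NaN when the array contains a nan value. Can this be carried over to the "agg" call I want to use here?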
Use the parameter skipna=False in sum and mean:
np.random.seed(100)
index_ = pd.date_range('2001-01-01', '2010-12-31', freq = 'MS')
df_ = pd.DataFrame(np.random.randn(len(index_), 4), columns=list('ABCD'), index = index_ )
df_.loc['2009-01-01','A'] = np.nan
df_.loc['2007-08-01','B'] = np.nan
df = df_.groupby(pd.TimeGrouper('A')).agg({'A': lambda x: x.sum(skipna=False),
                                           'B': lambda x: x.mean(skipna=False)})
print(df)
                   B         A
2001-12-31  0.184784  0.593025
2002-12-31 -0.251913 -1.720891
2003-12-31 -0.085896 -3.060836
2004-12-31 -0.327153  6.561670
2005-12-31  0.214115  3.400988
2006-12-31  0.270536  2.972164
2007-12-31       NaN  4.175623
2008-12-31  0.429060 -2.917714
2009-12-31  0.222544       NaN
2010-12-31 -0.339483  2.021474
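On newer pandas releases, where pd.TimeGrouper is no longer available, the same idea can be sketched as follows. This is a minimal sketch, not the code from the answer above: it groups by the calendar year of the index (df_.index.year) instead of a time grouper, a substitution made here to avoid version-specific frequency aliases, so the result is indexed by integer years rather than year-end dates. The helper names sum_keep_nan and mean_keep_nan are hypothetical.

```python
import numpy as np
import pandas as pd

# Series reductions skip NaN by default; skipna=False makes them propagate it.
s = pd.Series([1.0, np.nan])
default_sum = s.sum()              # NaN is ignored
strict_sum = s.sum(skipna=False)   # NaN propagates

# Same data setup as in the answer above.
np.random.seed(100)
index_ = pd.date_range('2001-01-01', '2010-12-31', freq='MS')
df_ = pd.DataFrame(np.random.randn(len(index_), 4),
                   columns=list('ABCD'), index=index_)
df_.loc['2009-01-01', 'A'] = np.nan
df_.loc['2007-08-01', 'B'] = np.nan


def sum_keep_nan(x):
    # Hypothetical helper: group sum that is NaN if any value is NaN.
    return x.sum(skipna=False)


def mean_keep_nan(x):
    # Hypothetical helper: group mean that is NaN if any value is NaN.
    return x.mean(skipna=False)


# Assumption: grouping by the integer year stands in for the yearly time grouper.
result = df_.groupby(df_.index.year).agg({'A': sum_keep_nan, 'B': mean_keep_nan})
print(result)
```

pd.Grouper(freq=...) should also work in place of df_.index.year if year-end dates are wanted as the index; the yearly alias is 'A' in older releases and, as far as I know, 'YE' from pandas 2.2 on.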