pandas: groupby('date_x')['outcome'].mean()

https://www.kaggle.com/anokas/time-travel-eda

What exactly does this code mean? I can't find groupby('date_x')['outcome'].mean() in the sklearn documentation.

date_x['Class probability'] = df_train.groupby('date_x')['outcome'].mean()
date_x['Frequency'] = df_train.groupby('date_x')['outcome'].size()
date_x.plot( secondary_y='Frequency',figsize=(22, 10))

Thanks!

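I think a better approach is to use DataFrameGroupBy.agg and aggregate twice in one call: size gives the number of rows in each group and mean gives the average of outcome in each group, with the groups defined by the column date_x: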

d = {'mean':'Class probability','size':'Frequency'}
df = df_train.groupby('date_x')['outcome'].agg(['mean','size']).rename(columns=d)

df.plot( secondary_y='Frequency',figsize=(22, 10))

For details, see "Applying multiple functions at once" in the pandas documentation.

Sample:

d = {'date_x':pd.to_datetime(['2015-01-01','2015-01-01','2015-01-01',
                              '2015-01-02','2015-01-02']),
     'outcome':[20,30,40,50,60]}
df_train = pd.DataFrame(d)
print (df_train)
      date_x  outcome
0 2015-01-01       20 ->1.group
1 2015-01-01       30 ->1.group
2 2015-01-01       40 ->1.group
3 2015-01-02       50 ->2.group
4 2015-01-02       60 ->2.group

d = {'mean':'Class probability','size':'Frequency'}
df = df_train.groupby('date_x')['outcome'].agg(['mean','size']).rename(columns=d)
print (df)
            Class probability  Frequency
date_x                                  
2015-01-01                 30          3
2015-01-02                 55          2
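
To tie this back to the two separate calls in the question, here is a minimal sketch that reuses the sample data above. It computes the mean and size per date_x separately, recomputes one group's mean by hand to show what a "group" is, and compares both with the combined agg result. The variable names (mean_per_day, size_per_day, first_day, manual_mean, combined) are only illustrative and do not come from the original kernel.

```python
import pandas as pd

# Same sample data as above.
df_train = pd.DataFrame({
    'date_x': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01',
                              '2015-01-02', '2015-01-02']),
    'outcome': [20, 30, 40, 50, 60],
})

# The two separate calls from the question: one value per distinct date_x.
mean_per_day = df_train.groupby('date_x')['outcome'].mean()
size_per_day = df_train.groupby('date_x')['outcome'].size()

# The same mean computed by hand for a single date:
# select the rows of that date, then average their outcome values.
first_day = df_train.loc[df_train['date_x'] == pd.Timestamp('2015-01-01'), 'outcome']
manual_mean = first_day.sum() / len(first_day)

# The combined form from the answer: both aggregations in one pass.
d = {'mean': 'Class probability', 'size': 'Frequency'}
combined = df_train.groupby('date_x')['outcome'].agg(['mean', 'size']).rename(columns=d)

# Both approaches should agree column by column.
print(combined['Class probability'].equals(mean_per_day.rename('Class probability')))
print(combined['Frequency'].equals(size_per_day.rename('Frequency')))
print(manual_mean == mean_per_day.iloc[0])
```

So groupby is a pandas method, not part of sklearn: it splits the rows by date_x, and mean or size is then applied to the outcome values of each split.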