Pandas groupby on multiple columns: take the average of another column based on a condition

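I'm stuck on this problem, and the similar posts I've found have been a bit of a black hole for me. I'm still learning.

I want to take the average of a column over only those rows in each group that meet a condition. My data looks like this: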

   user        date    Flag  Value
0   ron  12/23/2016  'flag'     10
1   ron  12/21/2016   'n/a'     25
2   ron  12/23/2016  'flag'     10
3   ron  12/21/2016   'n/a'      3
4  andy  12/22/2016  'flag'      5
5  andy  12/22/2016  'flag'      1

I want to group by user + Flag and create a new column 'Avg' that takes the average of only the 'flag' rows. So the data would look like this:

   user        date    Flag  Value  Avg
0   ron  12/23/2016  'flag'     10   10
1   ron  12/21/2016   'n/a'     25   10
2   ron  12/23/2016  'flag'     10   10
3   ron  12/21/2016   'n/a'      3   10
4  andy  12/22/2016  'flag'      5    3
5  andy  12/22/2016  'flag'      1    3

I have something like this, but I've tried many different variations:

groups = sample.groupby(['user','Flag'])
flag = sample.groupby(['user','Flag'])['Value'].transform('mean')
sample.loc[:,'Avg'] = np.select([flag.eq('flag'), groups.transform('mean')])

Thanks for any guidance.

Here is a solution using groupby and map:

# Compare against "flag" instead if the data does not contain the literal ' characters
df['Avg'] = df['user'].map(df[df['Flag'] == "'flag'"]
                             .groupby('user')['Value'].mean())

Output:

   user        date    Flag  Value   Avg
0   ron  12/23/2016  'flag'     10  10.0
1   ron  12/21/2016   'n/a'     25  10.0
2   ron  12/23/2016  'flag'     10  10.0
3   ron  12/21/2016   'n/a'      3  10.0
4  andy  12/22/2016  'flag'      5   3.0
5  andy  12/22/2016  'flag'      1   3.0
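
For anyone who wants to reproduce this, below is a minimal self-contained sketch. It rebuilds the sample data from the question, assuming the Flag values are plain strings without the literal ' characters (so the comparison is against "flag"). It applies the groupby + map approach, then shows one possible alternative that masks the non-flag values with `where` and broadcasts the per-user mean with `transform`. The column name `Avg_alt` and the variable `flag_mean` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question. Assumption: Flag holds plain
# strings ("flag", "n/a") without the literal quote characters.
df = pd.DataFrame({
    "user": ["ron", "ron", "ron", "ron", "andy", "andy"],
    "date": ["12/23/2016", "12/21/2016", "12/23/2016",
             "12/21/2016", "12/22/2016", "12/22/2016"],
    "Flag": ["flag", "n/a", "flag", "n/a", "flag", "flag"],
    "Value": [10, 25, 10, 3, 5, 1],
})

# groupby + map: compute the mean of the flagged rows per user,
# then look that mean up for every row by its user.
flag_mean = df[df["Flag"] == "flag"].groupby("user")["Value"].mean()
df["Avg"] = df["user"].map(flag_mean)

# Alternative sketch: replace non-flag values with NaN, then let
# transform("mean") broadcast the per-user mean (NaN is skipped).
df["Avg_alt"] = (
    df["Value"].where(df["Flag"] == "flag")
    .groupby(df["user"])
    .transform("mean")
)

print(df)
```

With either approach, a user who has no flagged rows at all ends up with NaN in the new column, which may or may not be what you want.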