Finding the mean of a specific column and keeping all rows whose group has a specific mean value

I have this DataFrame.

import pandas as pd

df = pd.DataFrame({'name': ['A','D','M','T','B','C','D','E','A','L'],
                   'id': [1,1,1,2,2,3,3,3,3,5],  
                   'rate': [3.5,4.5,2.0,5.0,4.0,1.5,2.0,2.0,1.0,5.0]})
>> df
  name  id  rate
0    A   1     3.5
1    D   1     4.5
2    M   1     2.0
3    T   2     5.0
4    B   2     4.0
5    C   3     1.5
6    D   3     2.0
7    E   3     2.0
8    A   3     1.0
9    L   5     5.0
df = df.groupby('id')['rate'].mean()

What I want is:
1) Find the mean of 'rate' for each 'id'.
2) Give the number of ids (length) whose mean is >= 3.
3) Return all rows of the DataFrame whose id has a mean >= 3.

Expected output:
Number of ids (length) where mean >= 3: 3

>> dataframe where (mean(id) >=3)

>>df
  name  id  rate
0    A   1     3.5
1    D   1     4.5
2    M   1     2.0
3    T   2     5.0
4    B   2     4.0
5    L   5     5.0

Use GroupBy.transform to compute the mean of each group as a Series with the same size as the original DataFrame, so it is possible to filter by boolean indexing:

df1 = df[df.groupby('id')['rate'].transform('mean') >= 3]
print (df1)
  name  id  rate
0    A   1   3.5
1    D   1   4.5
2    M   1   2.0
3    T   2   5.0
4    B   2   4.0
9    L   5   5.0

Details:

print (df.groupby('id')['rate'].transform('mean'))
0    3.333333
1    3.333333
2    3.333333
3    4.500000
4    4.500000
5    1.625000
6    1.625000
7    1.625000
8    1.625000
9    5.000000
Name: rate, dtype: float64
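
The difference from a plain aggregation is the shape of the result. As a minimal sketch on the same sample data (the names group_means, row_means and mask are only illustrative), mean() returns one value per id, while transform('mean') repeats that value for every row of its group, which is what lets it act as a boolean mask on the original rows:

```python
import pandas as pd

df = pd.DataFrame({'name': ['A','D','M','T','B','C','D','E','A','L'],
                   'id': [1,1,1,2,2,3,3,3,3,5],
                   'rate': [3.5,4.5,2.0,5.0,4.0,1.5,2.0,2.0,1.0,5.0]})

# Aggregation: one value per group, indexed by id (4 entries)
group_means = df.groupby('id')['rate'].mean()

# Transform: one value per original row, same index as df (10 entries)
row_means = df.groupby('id')['rate'].transform('mean')

# Because row_means is aligned with df, the comparison is a valid row mask
mask = row_means >= 3

print(len(group_means), len(row_means))
print(df[mask])
```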

Alternative solution with DataFrameGroupBy.filter:

df = df.groupby('id').filter(lambda x: x['rate'].mean() >=3)
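
The question also asks for the number of ids whose mean is >= 3. One possible way to get it, sketched here on the original unfiltered sample data (means, good_ids and result are illustrative names, not part of the solutions above), is to aggregate once and reuse the result for both the count and the row filter via isin:

```python
import pandas as pd

df = pd.DataFrame({'name': ['A','D','M','T','B','C','D','E','A','L'],
                   'id': [1,1,1,2,2,3,3,3,3,5],
                   'rate': [3.5,4.5,2.0,5.0,4.0,1.5,2.0,2.0,1.0,5.0]})

# 1) Mean rate per id (a Series indexed by id)
means = df.groupby('id')['rate'].mean()

# 2) The ids whose mean is >= 3, and how many there are
good_ids = means.index[means >= 3]
print('Number of ids (length) where mean >= 3:', len(good_ids))

# 3) All original rows belonging to those ids
result = df[df['id'].isin(good_ids)]
print(result)
```

This keeps the original index labels; chaining .reset_index(drop=True) onto result would renumber the rows from 0 if that is preferred.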