Find top n elements in pandas dataframe column by keeping the grouping

Question

I'm trying to find the top 5 elements of the column total_petitions while keeping the ordered grouping I already did.

df = df[['fy', 'EmployerState', 'total_petitions']]
table = df.groupby(['fy','EmployerState']).mean()
table.nlargest(5, 'total_petitions')

Sample output:

        
fy  EmployerState   total_petitions
2020    WA           7039.333333
2016    MD           2647.400000
2017    MD           2313.142857
...     TX           2305.541667
2020    TX           2081.952381

Desired output:


fy  EmployerState total_petitions   
2016    AL  3.875000
        AR  225.333333
        AZ  26.666667
        CA  326.056604
        CO  21.333333
... ... ...
2020    VA  36.714286
        WA  7039.333333
        WI  43.750000
        WV  8986086.08
        WY  1.000000

where the rows shown for each year are the 5 states with the highest mean total_petitions in that year.

Answer 1

You are looking for a pivot table:

df = df.pivot_table(values='total_petitions', index=['fy','EmployerState'])
df = df.groupby(level='fy')['total_petitions'].nlargest(5).reset_index(level=0, drop=True).reset_index()
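
As a rough, self-contained sketch of the same idea: the toy data below is made up purely for illustration (only the column names come from the question), and it takes the top 2 per year instead of the top 5 so the result stays small. `pivot_table` aggregates with the mean by default, and the final `sort_index()` is an extra step, not in the snippet above, that puts the rows back in year/state order like the desired output.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
raw = pd.DataFrame({
    "fy": [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    "EmployerState": ["AL", "CA", "CA", "TX", "AL", "CA", "TX", "TX"],
    "total_petitions": [4.0, 300.0, 350.0, 100.0, 10.0, 50.0, 200.0, 400.0],
})

# Mean per (fy, EmployerState); pivot_table uses the mean as its default aggregation.
table = raw.pivot_table(values="total_petitions", index=["fy", "EmployerState"])

# Largest 2 means within each year. The groupby prepends the group key,
# so the result carries 'fy' twice in its index; drop the outer copy.
top = (
    table.groupby(level="fy")["total_petitions"]
    .nlargest(2)
    .reset_index(level=0, drop=True)
    .sort_index()  # assumption: restore year/state ordering as in the desired output
)

print(top)
```

Change `nlargest(2)` to `nlargest(5)` for the original problem, and append `.reset_index()` if you want `fy` and `EmployerState` back as ordinary columns.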
