How to save the value_counts result in a DataFrame and pull the related data from the original DataFrame

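I want to find the frequently repeated elements in one column, save the result as a DataFrame, and then pull the information related to those elements from the original DataFrame.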

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randint(1000, 1005, 10),
                   'B': pd.Categorical(['company0', 'company1', 'company1', 'company2', 'company5', 'company5', 'company0', 'company5', 'company2', 'company2']),
                   'C': 'foo',
                   'D': pd.Categorical(['test', 'train', 'train', 'cup', 'bib', 'bib', 'test', 'bib', 'cup', 'cup']),
                   })


# generate the 'company' DataFrame: one row per distinct B value with its count
company = pd.DataFrame(df.B.value_counts().reset_index())
company.columns = ['B', 'count']
print(company)

# merge 'df' and 'company' on B (this keeps every row of df)
merged = pd.merge(df, company, on='B')
print(merged)

The code above gives me

     A         B    C      D    count
0  1003  company0  foo   test      2
1  1002  company0  foo   test      2
2  1004  company1  foo  train      2
3  1004  company1  foo  train      2
4  1001  company2  foo    cup      3
5  1000  company2  foo    cup      3
6  1003  company2  foo    cup      3
7  1000  company5  foo    bib      3
8  1004  company5  foo    bib      3
9  1001  company5  foo    bib      3

But what I want is

          B  count    D
0  company5      3    bib
1  company2      3    cup
2  company1      2    train
3  company0      2    test

How can I get the result I want? Thanks.

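From the looks of it, each B value has exactly one D value. If that is the case, you can group on both columns and count the rows in each group: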

(df.groupby(['B','D'], observed=True).size()
   .reset_index(name='count')
)

Output:

          B      D  count
0  company0   test      2
1  company1  train      2
2  company2    cup      3
3  company5    bib      3
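
The output above has the right counts, but its row and column order differ from the layout asked for in the question. Here is a minimal sketch of one way to close that gap. It first checks the "one D per B" assumption, then sorts by count (highest first) and reorders the columns. The names d_per_b, one_d_per_b and result are just illustrative, and the frame is rebuilt without the random column A so the example is reproducible. Breaking ties by B in descending order is an assumption made only to match the order shown in the question.

```python
import pandas as pd

# Deterministic copy of columns B and D from the question
# (column A is random and plays no role in the counts).
df = pd.DataFrame({
    'B': pd.Categorical(['company0', 'company1', 'company1', 'company2', 'company5',
                         'company5', 'company0', 'company5', 'company2', 'company2']),
    'D': pd.Categorical(['test', 'train', 'train', 'cup', 'bib',
                         'bib', 'test', 'bib', 'cup', 'cup']),
})

# Check the assumption: every B value is paired with exactly one distinct D value.
d_per_b = df.groupby('B', observed=True)['D'].nunique()
one_d_per_b = bool((d_per_b == 1).all())
print('one D per B:', one_d_per_b)

# Count rows per (B, D) pair, sort by count descending (ties broken by B descending),
# then put the columns in the order B, count, D.
result = (
    df.groupby(['B', 'D'], observed=True).size()
      .reset_index(name='count')
      .sort_values(['count', 'B'], ascending=[False, False])
      .reset_index(drop=True)
      .loc[:, ['B', 'count', 'D']]
)
print(result)
```

With this data the rows come out as company5, company2, company1, company0. If the check prints False for your real data, a B value with several D values will appear on several rows, one per (B, D) pair, so you would need to decide which D to keep.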