Pandas

Question

I have a DataFrame with two columns: part of speech and word.

df:
      Parts of speech  word
    0 Noun             cat
    1 Noun             water
    2 Noun             cat
    3 verb             draw
    4 verb             draw
    5 adj              slow

I want to group the most frequent words by part of speech. The result I expect is:

Parts of speech     top 
Noun             {'cat':2,'water':1}
verb             {'draw':2}
adj              {'slow':1}

I tried groupby with apply, but the result is not what I need:

df2=df.groupby('Parts of speech')['word'].apply(lambda x : x.value_counts())

How can I create a tuple for each part of speech?

Answer 1

One approach is to aggregate with .agg and collections.Counter:

from collections import Counter
df2=df.groupby('Parts of speech')['word'].agg(Counter)
print(df2)

Output

Parts of speech
Noun    {'cat': 2, 'water': 1}
adj                {'slow': 1}
verb               {'draw': 2}
Name: word, dtype: object
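
For anyone who wants to run the Counter approach end to end, here is a minimal sketch that rebuilds the sample data from the question. The variable names counts and top_noun are only illustrative. Since the question asks for tuples, the sketch also uses Counter.most_common, which returns (word, count) tuples sorted by frequency.

```python
from collections import Counter

import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Parts of speech": ["Noun", "Noun", "Noun", "verb", "verb", "adj"],
    "word": ["cat", "water", "cat", "draw", "draw", "slow"],
})

# Counter consumes the words of each group and returns a dict subclass
# that maps each word to its number of occurrences.
counts = df.groupby("Parts of speech")["word"].agg(Counter)

# most_common(n) returns the n most frequent words as (word, count) tuples.
top_noun = counts["Noun"].most_common(1)
print(top_noun)  # [('cat', 2)]
```

Because each cell holds a Counter, methods such as most_common remain available after the aggregation, which a plain dict would not offer.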

An alternative using value_counts (note the to_dict call at the end):

df2 = df.groupby('Parts of speech')['word'].agg(lambda x: x.value_counts().to_dict())
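
To see why the to_dict call matters, the following sketch compares the attempt from the question with the version above, again on the question's sample data. The names grouped, nested, and as_dicts are illustrative. Without to_dict, apply combines the per-group counts into a single Series with a two-level index instead of one dict per group.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Parts of speech": ["Noun", "Noun", "Noun", "verb", "verb", "adj"],
    "word": ["cat", "water", "cat", "draw", "draw", "slow"],
})

grouped = df.groupby("Parts of speech")["word"]

# The attempt from the question: each group yields a Series of counts, and
# apply stacks them into one Series indexed by (part of speech, word).
nested = grouped.apply(lambda x: x.value_counts())

# With to_dict, each group collapses to a single dict, so the result has
# exactly one row per part of speech.
as_dicts = grouped.agg(lambda x: x.value_counts().to_dict())

print(nested.index.nlevels)  # 2
print(as_dicts["Noun"])      # {'cat': 2, 'water': 1}
```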

Pandas - create a tuple of the most frequent words using groupby

Tags: python, group-by, apply, dataframe