Python Pandas: calculate value_counts of two columns and use groupby
I have a DataFrame:
import pandas as pd

data = {'label': ['cat', 'dog', 'dog', 'cat', 'cat'],
        'breeds': ['bengal', 'shar pei', 'pug', 'maine coon', 'maine coon'],
        'nicknames': [['Loki', 'Loki'], ['Max'], ['Toby', 'Zeus ', 'Toby'], ['Marty'], ['Erin ', 'Erin']],
        'eye color': [['blue', 'green'], ['green'], ['brown', 'brown', 'brown'], ['blue'], ['green', 'brown']]}
frame = pd.DataFrame(data)  # the name `frame` matches the groupby call further down
Output:
   label      breeds         nicknames              eye color
0    cat      bengal       [Loki,Loki]          [blue, green]
1    dog    shar pei             [Max]                [green]
2    dog         pug  [Toby,Zeus,Toby]  [brown, brown, brown]
3    cat  maine coon           [Marty]                 [blue]
4    cat  maine coon       [Erin,Erin]         [green, brown]
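I want to apply groupby on frame['label', 'breeds'] and compute the value_counts (unique values) of the nicknames and of the eye colors, but output them in separate columns: 'nickname_count' and 'eye_count'.
This code outputs only one column. How can I get the two counts separately?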
frame2=frame.groupby(['label','breeds'])['nicknames','eye color'].apply(lambda x: x.astype('str').value_counts().to_dict())
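First, group the DataFrame (called df here, frame in the question) and aggregate the list columns with sum, because sum concatenates the lists: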
>>> df_grouped = df.groupby(['label', 'breeds']).agg({'nicknames': sum, 'eye color': sum}).reset_index()
>>> df_grouped
   label      breeds             nicknames              eye color
0    cat      bengal          [Loki, Loki]          [blue, green]
1    cat  maine coon  [Marty, Erin , Erin]   [blue, green, brown]
2    dog         pug   [Toby, Zeus , Toby]  [brown, brown, brown]
3    dog    shar pei                 [Max]                [green]
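The concatenation step relies on the fact that + on two lists joins them, which is what sum does element by element. As a rough sketch of the same grouping step, the version below swaps sum for a hypothetical helper, concat_lists, built on itertools.chain; the helper name is an assumption made for illustration and is not part of the original answer.

```python
from itertools import chain

import pandas as pd

data = {
    'label': ['cat', 'dog', 'dog', 'cat', 'cat'],
    'breeds': ['bengal', 'shar pei', 'pug', 'maine coon', 'maine coon'],
    'nicknames': [['Loki', 'Loki'], ['Max'], ['Toby', 'Zeus ', 'Toby'], ['Marty'], ['Erin ', 'Erin']],
    'eye color': [['blue', 'green'], ['green'], ['brown', 'brown', 'brown'], ['blue'], ['green', 'brown']],
}
df = pd.DataFrame(data)


def concat_lists(lists):
    """Hypothetical helper: flatten an iterable of lists into a single list."""
    return list(chain.from_iterable(lists))


# Plain Python: sum joins lists when given an empty list as the start value.
assert sum([['Marty'], ['Erin ', 'Erin']], []) == ['Marty', 'Erin ', 'Erin']

# Same grouping as in the answer, with the helper in place of sum.
df_grouped = (
    df.groupby(['label', 'breeds'])
      .agg({'nicknames': concat_lists, 'eye color': concat_lists})
      .reset_index()
)
print(df_grouped)
```

Chaining avoids building a new intermediate list for every addition, which may matter when groups hold many long lists; for a frame this small either approach should behave the same.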
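Then we can count the unique values in each list by converting the list to a set and taking its length, and save the result in two new columns to get the expected output (trailing spaces matter: 'Erin ' and 'Erin' in the sample data count as two distinct values):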
>>> df_grouped['nickname_count'] = df_grouped['nicknames'].apply(lambda x: list(set(x))).str.len()
>>> df_grouped['eye_count'] = df_grouped['eye color'].apply(lambda x: list(set(x))).str.len()
>>> df_grouped
   label      breeds             nicknames              eye color  nickname_count  eye_count
0    cat      bengal          [Loki, Loki]          [blue, green]               1          2
1    cat  maine coon  [Marty, Erin , Erin]   [blue, green, brown]               3          3
2    dog         pug   [Toby, Zeus , Toby]  [brown, brown, brown]               2          1
3    dog    shar pei                 [Max]                [green]               1          1
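The same counts can probably be reached without the intermediate concatenated lists. The sketch below assumes pandas 0.25 or later (for DataFrame.explode) and uses a hypothetical helper, count_unique, that explodes one list column and applies nunique per group. Its optional strip flag is also an assumption: it shows how the result changes if the trailing spaces in 'Erin ' and 'Zeus ' are treated as typos.

```python
import pandas as pd

data = {
    'label': ['cat', 'dog', 'dog', 'cat', 'cat'],
    'breeds': ['bengal', 'shar pei', 'pug', 'maine coon', 'maine coon'],
    'nicknames': [['Loki', 'Loki'], ['Max'], ['Toby', 'Zeus ', 'Toby'], ['Marty'], ['Erin ', 'Erin']],
    'eye color': [['blue', 'green'], ['green'], ['brown', 'brown', 'brown'], ['blue'], ['green', 'brown']],
}
df = pd.DataFrame(data)


def count_unique(col, strip=False):
    """Hypothetical helper: distinct list elements per (label, breeds) group."""
    # One row per list element, keeping the grouping keys alongside it.
    exploded = df[['label', 'breeds', col]].explode(col)
    if strip:
        # Optional: ignore leading/trailing whitespace when comparing values.
        exploded[col] = exploded[col].str.strip()
    return exploded.groupby(['label', 'breeds'])[col].nunique()


# Each helper call returns a Series indexed by (label, breeds); they align on that index.
counts = pd.DataFrame({
    'nickname_count': count_unique('nicknames'),
    'eye_count': count_unique('eye color'),
    'nickname_count_stripped': count_unique('nicknames', strip=True),
}).reset_index()
print(counts)
```

With the sample data, nickname_count and eye_count match the columns in the answer above, while the stripped variant gives 2 instead of 3 for the maine coon group because 'Erin ' and 'Erin' collapse into one value.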