Using pandas groupby to find the average length of the text within each group
I am working with a Shakespeare corpus.
   act  literature_type  scene  scene_text                                         scene_title                     speaker   title
0  1    Comedy           1      In delivering my son from me, I bury a second ...  Rousillon. The COUNT's palace.  COUNTESS  All's Well That Ends Well
1  1    Comedy           1      And I in going, madam, weep o'er my father's d...  Rousillon. The COUNT's palace.  BERTRAM   All's Well That Ends Well
2  1    Comedy           1      You shall find of the king a husband, madam; y...  Rousillon. The COUNT's palace.  LAFEU     All's Well That Ends Well
3  1    Comedy           1      What hope is there of his majesty's amendment?     Rousillon. The COUNT's palace.  COUNTESS  All's Well That Ends Well
4  1    Comedy           1      He hath abandoned his physicians, madam; under...  Rousillon. The COUNT's palace.  LAFEU     All's Well That Ends Well
I want to find the average length of scene_text for each title.
I tried the following:
all_works_by_speaker_df.groupby('title').apply(lambda x: np.mean(len(x)))
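This just returns the number of rows for each title, because len(x) is applied to each group's DataFrame rather than to the strings in scene_text.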
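To see why the attempt counts rows, here is a minimal sketch on made-up data. The toy frame and its values are assumptions for illustration, not rows from the corpus.

```python
import pandas as pd

# Hypothetical toy data standing in for the corpus (not the real rows).
toy = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["to be", "or not to be", "hello"],
})

# Each group is a sub-DataFrame, and len() of a DataFrame is its row count.
row_counts = {title: len(group) for title, group in toy.groupby("title")}

# groupby(...).size() reports the same numbers: the text was never measured.
sizes = toy.groupby("title").size().to_dict()

print(row_counts)  # {'A': 2, 'B': 1}
print(sizes)
```

Taking np.mean of a single integer returns that integer, so the original expression is effectively a row count per title.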
If you need the length in characters:
import numpy as np

# Select the text column first so the lambda receives a Series of strings
df = (all_works_by_speaker_df.groupby('title')['scene_text']
      .apply(lambda x: np.mean(x.str.len()))
      .reset_index(name='mean_len_text'))
print(df)
title mean_len_text
0 All's Well That Ends Well 48.4
If you need the length in words, use split, len and mean (here df is the original DataFrame, not the aggregated result):
df.groupby('title').scene_text.apply(lambda x: x.str.split().str.len().mean())
title
All's Well That Ends Well 9.2
Alternatively, take the string lengths from the column first, then group the resulting Series by the play title and take the mean.
mean_len = df.scene_text.str.len().groupby(df.title).mean()
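As a quick check that the approaches in this thread agree, here is a small self-contained sketch. The toy frame is hypothetical and only mimics the two relevant columns.

```python
import pandas as pd

# Hypothetical toy data; the real corpus has more columns and rows.
toy = pd.DataFrame({
    "title": ["A", "A", "B"],
    "scene_text": ["to be", "or not to be", "hello there friend"],
})

# Average characters per line, written two equivalent ways.
chars_apply = toy.groupby("title")["scene_text"].apply(lambda s: s.str.len().mean())
chars_direct = toy["scene_text"].str.len().groupby(toy["title"]).mean()

# Average whitespace-separated words per line.
words = toy["scene_text"].str.split().str.len().groupby(toy["title"]).mean()

print(chars_direct)
print(words)
```

The version without apply stays in vectorized string and groupby operations, which is usually the faster option on a large frame, though that is worth measuring on your own data.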