Python3 pandas: group a data frame by a column (such as name), then extract a number of rows for each group

I have a data frame named df, as follows:

name   id    age             text 
a      1     1    very good, and I like him
b      2     2    I play basketball with his brother
c      3     3    I hope to get a offer
d      4     4    everything goes well, I think
a      1     1    I will visit china
b      2     2    no one can understand me, I will solve it
c      3     3    I like followers
d      4     4    maybe I will be good
a      1     1    I should work hard to finish my research
b      2     2    water is the source of earth, I agree it
c      3     3    I hope you can keep in touch with me
d      4     4    My baby is very cute, I like him

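I group the data frame by name, and then I want to extract a number of rows from each group by row index (e.g. 2 rows) into a new data frame, df_new: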

name   id    age             text 
a      1     1    very good, and I like him
a      1     1    I will visit china
b      2     2    I play basketball with his brother
b      2     2    no one can understand me, I will solve it
c      3     3    I hope to get a offer
c      3     3    I like followers
d      4     4    everything goes well, I think
d      4     4    maybe I will be good

I tried:

  df_new = df.groupby('name')[0:2]

But I get an error:

   hash(key)
  TypeError: unhashable type: 'slice'
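The error is consistent with how `[]` works on a GroupBy object: it selects columns (for example `df.groupby('name')['text']`), not rows, so the slice `0:2` is interpreted as a column key. Because slice objects became hashable in Python 3.12, the exact exception may differ on newer interpreters. Row-wise selection is done with methods such as `head()`. A minimal sketch, using a small made-up frame with the same `name`/`text` layout as the question:

```python
import pandas as pd

# Made-up data with the same name/text layout as the question
df = pd.DataFrame({
    "name": ["a", "b", "a", "b", "a", "b"],
    "text": ["t0", "t1", "t2", "t3", "t4", "t5"],
})

grouped = df.groupby("name")

# [] on a GroupBy selects a column, giving a per-column GroupBy object
text_by_name = grouped["text"]

# Row selection uses methods such as head(); the original index is kept
first_two = text_by_name.head(2)
print(first_two.tolist())  # ['t0', 't1', 't2', 't3']
```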

Try using head() instead.

import pandas as pd
from io import StringIO

buff = StringIO('''
name,id,age,text
a,1,1,"very good, and I like him"
b,2,2,I play basketball with his brother
c,3,3,I hope to get a offer
d,4,4,"everything goes well, I think"
a,1,1,I will visit china
b,2,2,"no one can understand me, I will solve it"
c,3,3,I like followers
d,4,4,maybe I will be good
a,1,1,I should work hard to finish my research
b,2,2,"water is the source of earth, I agree it"
c,3,3,I hope you can keep in touch with me
d,4,4,"My baby is very cute, I like him"
''')
df = pd.read_csv(buff)

Use head() instead of [:2], then sort by name:

df_new = df.groupby('name').head(2).sort_values('name', kind='stable')  # stable sort keeps the original row order within each name
print(df_new)
  name  id  age                                       text
0    a   1    1                  very good, and I like him
4    a   1    1                         I will visit china
1    b   2    2         I play basketball with his brother
5    b   2    2  no one can understand me, I will solve it
2    c   3    3                      I hope to get a offer
6    c   3    3                           I like followers
3    d   4    4              everything goes well, I think
7    d   4    4                       maybe I will be good
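`head(n)` always takes the first n rows of each group. If other positions within each group are wanted (the question mentions selecting by row index), one option is `GroupBy.cumcount()`, which numbers the rows of every group 0, 1, 2, … in their original order. A sketch under that assumption; `rows_at_positions` is a hypothetical helper name, and the stable sort keeps the original order inside each group:

```python
import pandas as pd

def rows_at_positions(frame, key, positions):
    """Hypothetical helper: keep rows whose 0-based position within
    their group (by `key`) is in `positions`, then order by group."""
    pos = frame.groupby(key).cumcount()  # 0, 1, 2, ... inside each group
    picked = frame[pos.isin(positions)]
    # kind="stable" keeps the original row order within each group
    return picked.sort_values(key, kind="stable")

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"] * 3,
    "id": [1, 2, 3, 4] * 3,
})

# Same rows as groupby('name').head(2) followed by a stable sort by name
print(rows_at_positions(df, "name", [0, 1]).index.tolist())  # [0, 4, 1, 5, 2, 6, 3, 7]
# Only the second and third row of every group
print(rows_at_positions(df, "name", [1, 2]).index.tolist())  # [4, 8, 5, 9, 6, 10, 7, 11]
```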

Another solution, with iloc:

df_new = df.groupby('name').apply(lambda x: x.iloc[:2]).reset_index(drop=True)
print(df_new)
  name  id  age                                       text
0    a   1    1                  very good, and I like him
1    a   1    1                         I will visit china
2    b   2    2         I play basketball with his brother
3    b   2    2  no one can understand me, I will solve it
4    c   3    3                      I hope to get a offer
5    c   3    3                           I like followers
6    d   4    4              everything goes well, I think
7    d   4    4                       maybe I will be good
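The `apply` version rebuilds the frame group by group, and on recent pandas releases `apply` with a function that touches the grouping column may also emit a deprecation warning. An apply-free sketch that should yield the same rows with a fresh 0..n-1 index sorts first (stably, so each group keeps its original order) and then takes `head(2)`; the data below is made up to mirror the question's layout:

```python
import pandas as pd

# Made-up data mirroring the question: names a-d repeated three times
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"] * 3,
    "text": [f"row {i}" for i in range(12)],
})

df_new = (
    df.sort_values("name", kind="stable")  # group rows together, original order kept
      .groupby("name")
      .head(2)                             # first two rows of every name
      .reset_index(drop=True)              # renumber 0..n-1 like the iloc version
)
print(df_new["text"].tolist())
# ['row 0', 'row 4', 'row 1', 'row 5', 'row 2', 'row 6', 'row 3', 'row 7']
```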