Reform pandas dataframe

I have a dataframe:

import pandas

df1 = pandas.DataFrame({
    "text": ["Alice is in ", "Alice is in wonderland.", "Mallory has done the task.", "Mallory has", "Bob is final.", "Mallory has done"],
    "label": ["Seattle", "Portlang", "Gotland", "california", "california", "Portland"],
    "title": ["SA", "SA", "sometitle", "sometitle", "some different title", "sometitle"],
    "version": [1, 2, 4, 1, 2, 3]})

df1
                         text       label                 title  version
0                Alice is in      Seattle                    SA        1
1     Alice is in wonderland.    Portlang                    SA        2
2  Mallory has done the task.     Gotland             sometitle        4
3                 Mallory has  california             sometitle        1
4               Bob is final.  california  some different title        2
5            Mallory has done    Portland             sometitle        3

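For each title, I want to keep the text of the row with the latest (highest) version number, and collect all of that title's labels into a list.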

Many thanks,

df.mergeGroupby.agg 一起使用:

In [508]: x = df1.groupby(['title']).agg({'version':'max', 'label':list})

In [516]: df1[['title', 'version', 'text']].merge(x, on=['title', 'version'])
Out[516]: 
                  title  version                        text                            label
0                    SA        2     Alice is in wonderland.              [Seattle, Portlang]
1             sometitle        4  Mallory has done the task.  [Gotland, california, Portland]
2  some different title        2               Bob is final.                     [california]
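An alternative that avoids the merge is sketched below, as a suggestion rather than part of the original answer. It uses GroupBy.idxmax to find the row label of the highest version per title, selects those rows directly, and attaches the per-title label lists with DataFrame.join. One behavioural difference worth knowing: if two rows of the same title tie on the maximum version, idxmax keeps only the first, whereas the merge above would return both.

```python
import pandas as pd

df1 = pd.DataFrame({
    "text": ["Alice is in ", "Alice is in wonderland.", "Mallory has done the task.",
             "Mallory has", "Bob is final.", "Mallory has done"],
    "label": ["Seattle", "Portlang", "Gotland", "california", "california", "Portland"],
    "title": ["SA", "SA", "sometitle", "sometitle", "some different title", "sometitle"],
    "version": [1, 2, 4, 1, 2, 3],
})

# Row label of the highest version within each title (first row wins on ties).
latest_idx = df1.groupby("title")["version"].idxmax()

# Select those rows in their original order, keeping only the columns we need.
latest = df1.loc[latest_idx.sort_values(), ["title", "version", "text"]]

# All labels of each title collected into a list, indexed by title.
labels = df1.groupby("title")["label"].agg(list)

# Look up each row's title in the label Series and add it as a "label" column.
result = latest.join(labels, on="title").reset_index(drop=True)
print(result)
```

Sorting the idxmax result before selecting keeps the rows in the same order as in df1, which matches the row order produced by the merge-based answer.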