Pandas: Merge contents of a dataframe into a single column (as a list of dict / json)

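I want to bring the contents of one df into another, but as a list keyed on ID. I know how to merge on ID, but I don't want repeated rows per ID in the new dataframe. How can I do that?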

import numpy as np
import pandas as pd

data1 = {'ID': ['AB01', 'AB02'],
         'Name': ["toyota", "honda"],
         'Age': [21, 22]}
df1 = pd.DataFrame.from_dict(data1)

data2 = {'ID': ['AB01', 'AB01', 'AB03', 'AB03'],
         'Type': ["C", np.nan, "X", "S"],
         'Score': [87, 98, 45, 82]}
df2 = pd.DataFrame.from_dict(data2)

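The result should look like this: one row per ID from df1, with the matching df2 rows collected into a single column as a list of dicts.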

Try merge followed by groupby:

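# Needs Python 3.8+ for the := (walrus) operator.
out = (
    df1.merge(df2, on='ID', how='left')
       .groupby(['ID', 'Name', 'Age'])
       # Keep ID plus the df2 columns; emit [] when every df2 value is missing.
       .apply(lambda x: a.to_dict('records')
              if (a := x[['ID']].join(x.iloc[:, 3:])).dropna().any().any()
              else [])
       .reset_index(name='Info')
)
print(out)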

Output:

     ID    Name  Age                                               Info
0  AB01  toyota   21  [{'ID': 'AB01', 'Type': 'C', 'Score': 87.0}, {...
1  AB02   honda   22                                                 []

You can build a dict from each row of df2 with .apply(), then group by ID and aggregate the dicts that share an ID into a list with .groupby() + .agg().

Then merge the result into df1 with df1.merge(), as a left join with ID as the matching key:

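df2_info = (df2.apply(dict, axis=1)    # one dict per row of df2
               .groupby(df2['ID'])     # group those dicts by ID
               .agg(list)              # collect each group's dicts into a list
               .reset_index(name='Info')
           )

# Left join keeps every row of df1, including IDs that are absent from df2.
df_out = df1.merge(df2_info, on='ID', how='left')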

Result:

print(df_out)

     ID    Name  Age                                                                                  Info
0  AB01  toyota   21  [{'ID': 'AB01', 'Type': 'C', 'Score': 87}, {'ID': 'AB01', 'Type': nan, 'Score': 98}]
1  AB02   honda   22                                                                                   NaN
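
Note that AB02 has no match in df2, so its Info is NaN rather than an empty list. If an empty list is preferred (as in the merge + groupby output above), one possible follow-up is sketched below. It rebuilds the same frames so it runs on its own; the isinstance check is just one way to detect the unmatched rows, not the only one.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID': ['AB01', 'AB02'],
                    'Name': ['toyota', 'honda'],
                    'Age': [21, 22]})
df2 = pd.DataFrame({'ID': ['AB01', 'AB01', 'AB03', 'AB03'],
                    'Type': ['C', np.nan, 'X', 'S'],
                    'Score': [87, 98, 45, 82]})

# Same steps as above: one dict per df2 row, grouped into lists by ID.
df2_info = (df2.apply(dict, axis=1)
               .groupby(df2['ID'])
               .agg(list)
               .reset_index(name='Info'))

df_out = df1.merge(df2_info, on='ID', how='left')

# Unmatched IDs come back as NaN (a float), so anything that is not a list
# is replaced with a fresh empty list.
df_out['Info'] = df_out['Info'].apply(lambda v: v if isinstance(v, list) else [])

print(df_out)
```

fillna is avoided here on purpose, since it is meant for scalar fill values rather than list objects.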

For reference, the intermediate result df2_info:

     ID                                                                                  Info
0  AB01  [{'ID': 'AB01', 'Type': 'C', 'Score': 87}, {'ID': 'AB01', 'Type': nan, 'Score': 98}]
1  AB03  [{'ID': 'AB03', 'Type': 'X', 'Score': 45}, {'ID': 'AB03', 'Type': 'S', 'Score': 82}]
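
The title also mentions JSON. If the Info column should hold a JSON string instead of a Python list of dicts, a minimal sketch under that assumption is to serialise each ID group with DataFrame.to_json(orient='records'), which writes missing values as null. The names json_by_id and df2_json are made up for this illustration.

```python
import json

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID': ['AB01', 'AB02'],
                    'Name': ['toyota', 'honda'],
                    'Age': [21, 22]})
df2 = pd.DataFrame({'ID': ['AB01', 'AB01', 'AB03', 'AB03'],
                    'Type': ['C', np.nan, 'X', 'S'],
                    'Score': [87, 98, 45, 82]})

# One JSON string per ID; to_json turns NaN into null.
json_by_id = {key: grp.to_json(orient='records')
              for key, grp in df2.groupby('ID')}

# Turn the mapping into a two-column frame (ID, Info) so it can be merged.
df2_json = pd.Series(json_by_id, name='Info').rename_axis('ID').reset_index()

df_out = df1.merge(df2_json, on='ID', how='left')

# Each non-missing Info value parses back into a list of dicts.
print(json.loads(df_out.loc[0, 'Info']))
```

As with the list version, IDs that appear only in df1 (AB02 here) end up with NaN in Info after the left join.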