How to remove duplicates when using pandas concat to combine two DataFrames

Question

I have two DataFrames.

df1 columns: id, x1, x2, x3, x4, ..., xn

df2 columns: id, y

df3 = pd.concat([df1, df2], axis=1)

When I combine them with pandas concat, the result has these columns:

id,y,id,x1,x2,x3...xn.

There are two id columns here. How can I get rid of one of them?

I tried:

df3 = pd.concat([df1, df2], axis=1).drop_duplicates().reset_index(drop=True)

But it doesn't work.

Answer 1

DataFrames are concatenated on their index. Make sure id is the index before concatenating:

df3 = pd.concat([df1.set_index('id'), 
                 df2.set_index('id')], axis=1).reset_index()

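Or, better yet, use join. It matches df1's id column against df2's index, so df2 needs id as its index:

df3 = df1.join(df2.set_index('id'), on='id')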
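
A minimal sketch of both approaches from this answer, using small made-up frames (the values and the shuffled row order in df2 are assumptions for illustration). Once id is the index, concat lines rows up by id rather than by position. join is called with df2 indexed by id, because join matches against the other frame's index:

```python
import pandas as pd

# Hypothetical sample data; df2's rows are deliberately in a different order.
df1 = pd.DataFrame({"id": [1, 2, 3], "x1": [10, 20, 30], "x2": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({"id": [3, 1, 2], "y": ["c", "a", "b"]})

# concat aligns on the index, so make id the index on both frames first.
via_concat = pd.concat(
    [df1.set_index("id"), df2.set_index("id")], axis=1
).reset_index()

# join looks up df1's id column in df2's index.
via_join = df1.join(df2.set_index("id"), on="id")

print(via_concat.columns.tolist())
print(via_join.columns.tolist())
```

Both results should contain a single id column, with each y value attached to its matching id even though df2 was stored in a different row order.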

Answer 2

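drop_duplicates() only removes rows that are exact duplicates of each other; it does not touch duplicate columns.

What you're looking for is pd.merge().

pd.merge(df1, df2, on='id')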
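
To illustrate both points with made-up data (an assumption, not the asker's actual frames): drop_duplicates() compares rows, so the duplicated id column survives it, whereas pd.merge() joins on id and returns a single id column.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df1 = pd.DataFrame({"id": [1, 2, 3], "x1": [10, 20, 30]})
df2 = pd.DataFrame({"id": [1, 2, 3], "y": ["a", "b", "c"]})

# concat + drop_duplicates: rows are compared, columns are left alone,
# so both id columns are still present afterwards.
stacked = pd.concat([df1, df2], axis=1).drop_duplicates().reset_index(drop=True)
print(stacked.columns.tolist())  # ['id', 'x1', 'id', 'y']

# merge joins on the shared key and keeps one id column.
merged = pd.merge(df1, df2, on="id")
print(merged.columns.tolist())  # ['id', 'x1', 'y']
```

By default pd.merge() performs an inner join, so ids that appear in only one of the frames are dropped; passing how='left' or how='outer' keeps them.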
