Labeling a row in a dataframe according to a label in another dataframe with minimum distance - Python

Question

Suppose I have the following two dataframes.

df1 =

    A           B           C        Label
    1.5        2            1.5        1
    2.5        3            2.5        2
    3.5        4            3.5        3

and df2 =

    A           B           C     
    2          2            2       
    3          3            3        
    4          4            4       
    3          3            3

I want to add to each row of df2 the label of the df1 row that is at the smallest distance from it. That would be:

    A           B           C   Label  
    3          3            3      2
    2          2            2      1 
    4          4            4      3
    3          3            3      2

What I tried:

final_label = []
final_label.append(min(distance.euclidean(df2.iloc[i,:],
df1.iloc[j,:]) for j in len(df1)  for i in len(df2))

Note: it is important not to lose the order of df2.

Answer 1

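You can use numpy broadcasting to compute the squared Euclidean distance between every pair of rows, then use argmin to find, for each row of df2, the position of the nearest row of df1. Squaring does not change which row is nearest, so the square root can be skipped.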

v = ((df1.iloc[:, :-1].values[:, None] - df2.values) ** 2).sum(-1).argmin(0)
df2.assign(Label=df1.Label.iloc[v].values)

   A  B  C  Label
0  2  2  2      1
1  3  3  3      2
2  4  4  4      3
3  3  3  3      2
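The broadcasting approach above can be sketched as a self-contained script using the sample data from the question. The helper name `nearest_label` and its `feature_cols` parameter are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({"A": [1.5, 2.5, 3.5], "B": [2, 3, 4],
                    "C": [1.5, 2.5, 3.5], "Label": [1, 2, 3]})
df2 = pd.DataFrame({"A": [2, 3, 4, 3], "B": [2, 3, 4, 3], "C": [2, 3, 4, 3]})

def nearest_label(df1, df2, feature_cols=("A", "B", "C")):
    """Hypothetical helper: copy to each df2 row the Label of its nearest df1 row."""
    cols = list(feature_cols)
    ref = df1[cols].to_numpy(dtype=float)   # shape (n1, k)
    qry = df2[cols].to_numpy(dtype=float)   # shape (n2, k)
    # Broadcasting (n1, 1, k) - (1, n2, k) -> (n1, n2, k); sum over k gives
    # the squared distance between every df1 row and every df2 row.
    sq_dist = ((ref[:, None, :] - qry[None, :, :]) ** 2).sum(axis=-1)
    # For each df2 row (column), the position of the closest df1 row.
    nearest = sq_dist.argmin(axis=0)
    # assign() returns a new frame, so df2's row order and index are untouched.
    return df2.assign(Label=df1["Label"].to_numpy()[nearest])

labeled = nearest_label(df1, df2)
print(labeled)
```

Note that the intermediate array has n1 × n2 × k elements, so for very large frames this may use a lot of memory.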

Answer 2

scipy.spatial.distance.cdist + np.where

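import numpy as np
import scipy.spatial.distance
# df2 goes first: row i of ary holds df2 row i's distances to every df1 row (assumes no ties)
ary = scipy.spatial.distance.cdist(df2[['A', 'B', 'C']], df1[['A', 'B', 'C']], metric='euclidean')
order = np.where(ary == ary.min(1)[:, None])
df2['New'] = df1.reindex(order[1]).Label.values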

df2
Out[612]: 
   A  B  C  Label  New
0  3  3  3      2    2
1  2  2  2      1    1
2  4  4  4      3    3

Edit: using argmin(), as in cold's answer

ary = scipy.spatial.distance.cdist(df2[['A', 'B', 'C']], df1[['A', 'B', 'C']], metric='euclidean')

df2['New']=df1.reindex(ary.argmin(1)).Label.values


df2
Out[659]: 
   A  B  C  Label  New
0  3  3  3      2    2
1  2  2  2      1    1
2  4  4  4      3    3
3  3  3  3      3    2
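For reference, a minimal self-contained sketch of the cdist + argmin variant, run on the sample data from the question. The variable names `features`, `nearest_pos`, and `df2_labeled` are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Sample data from the question.
df1 = pd.DataFrame({"A": [1.5, 2.5, 3.5], "B": [2, 3, 4],
                    "C": [1.5, 2.5, 3.5], "Label": [1, 2, 3]})
df2 = pd.DataFrame({"A": [2, 3, 4, 3], "B": [2, 3, 4, 3], "C": [2, 3, 4, 3]})

features = ["A", "B", "C"]

# dist[i, j] is the Euclidean distance from df2 row i to df1 row j.
dist = cdist(df2[features].to_numpy(), df1[features].to_numpy(), metric="euclidean")

# argmin along axis 1 picks one df1 position per df2 row; on ties it
# returns the first minimum, so the result always has len(df2) entries.
nearest_pos = dist.argmin(axis=1)

# Index by position rather than by label, so this also works when df1
# does not have a default 0..n-1 index.
df2_labeled = df2.copy()
df2_labeled["New"] = df1["Label"].to_numpy()[nearest_pos]
print(df2_labeled)
```

Unlike the np.where version, argmin yields exactly one match per df2 row even when two df1 rows are equally close.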

python

distance

dataframe

pandas