How to calculate Levenshtein distance for every unique value using a for loop on a DataFrame in pandas

Question

I am trying to calculate the Levenshtein distance in a DataFrame using a for loop.

df2_2=df2_1[['Concat','Count','ffour']].copy()
for a in df2_2['Concat'].unique():
    dw2_2=df2_2[df2_2['Concat']==a]
    vv = dw2_2.iloc[:, 1::2].values
    iRow, iCol = np.unravel_index(vv.argmax(), vv.shape)
    iCol = iCol * 2 + 1
    result = dw2_2.iloc[iRow, [0, iCol, iCol + 1]]
    b=result.copy()
    b=b.drop(labels=['Concat','Count'])
    print (b)
    b=b.astype(str)
    for a1 in df2_2['ffour'].unique():
        dw2_1=df2_2[df2_2['ffour']==a1]
        c= dw2_1['ffour'].copy()
        print (c)
        c=c.astype(str)
        for i in range (len(b)):
            distance=lev.distance(b,c)
            print (distance)
            ratio=lev.ratio(b,c)
            print (ratio)

This raises the following error:

  File "<ipython-input-129-15900bf3d493>", line 17, in <module>
    distance=lev.distance(b,c)

TypeError: distance expected two Strings or two Unicodes

I need help with this.

Answer 1

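I would suggest checking the values of b and c: the error says lev.distance expects two strings. You can always wrap them in str(b) and str(c), which may work. Note that in the question's code both b and c are pandas Series rather than single strings, so str() on them yields the printed form of the whole Series; comparing individual values may be closer to what is intended.
Like this: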

distance=lev.distance(str(b),str(c))

Or you can apply str() to all values in the Concat column to make sure you only have strings:

df2_2['Concat'] = df2_2['Concat'].map(lambda x: str(x))
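To illustrate comparing values one pair at a time instead of passing whole columns, here is a minimal sketch. The names `levenshtein` and `pairwise_distances` are made up for this example and are not part of pandas or the Levenshtein package. The pure-Python `levenshtein` function is only a stand-in for lev.distance so the snippet runs without that package, and the sample values are invented.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Plain-Python edit distance; a stand-in for lev.distance (assumption)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # drop ch_a
                               current[j - 1] + 1,       # drop ch_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def pairwise_distances(left_values, right_values):
    """Return {(left, right): distance} for every pair of unique values."""
    # Convert each value to str first, then drop duplicates in first-seen order.
    left = list(dict.fromkeys(str(v) for v in left_values))
    right = list(dict.fromkeys(str(v) for v in right_values))
    return {(lv, rv): levenshtein(lv, rv) for lv, rv in product(left, right)}


# Hypothetical sample values standing in for df2_2['Concat'] and df2_2['ffour'].
concat_values = ["kitten", "kitten", "flaw"]
ffour_values = ["sitting", "lawn", 1234]

distances = pairwise_distances(concat_values, ffour_values)
for (left_value, right_value), d in distances.items():
    print(left_value, right_value, d)
```

Because iterating over a pandas Series yields its values, the two columns could presumably be passed in directly, for example `pairwise_distances(df2_2['Concat'], df2_2['ffour'])`, and `levenshtein` could be swapped for lev.distance once every value is a plain string.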

python

levenshtein-distance

pandas