检查列中的匹配值 - python/pandas

Check for matching values in columns- python/pandas

我有一个 CSV 文件,其中包含两个模型(模型 1 和模型 2)的经纬度值,如下所示。我正在努力实现以下目标

最小工作示例:

import pandas as pd

df3= pd.read_csv("compare.csv")
Output=df3.loc[:, df3.columns.isin(list('LatLong'))]

已搜索现有答案(), ()但找不到解决方案。

sample_data.csv

Model 1                         Model 2     
Lat            Lon        Name        X        Y
-33.348652  138.751659  Kastan  -41.735983  145.532112
-41.735983  145.532112  Ldon    -37.222005  145.921452
-37.222005  145.921452  Papa    -33.348652  138.751659
-37.222005  145.921452  tine    -34.779284  138.522352
-37.222005  145.921452  Farm    -31.543177  118.4565685
-27.112811  150.904878  Loy     -38.2536    146.574569

使用pandas.merge的一种方式:

df["Output"] = df["Model 1"].merge(df["Model 2"], 
                                   how="left",
                                   left_on=["Lat", "Lon"], 
                                   right_on=["X", "Y"],
                                  )["Name"].fillna("NOT MATCHED")

输出:

     Model 1              Model 2                              Output
         Lat         Lon     Name          X           Y             
0 -33.348652  138.751659  Bastyan -41.735983  145.532112         Papa
1 -41.735983  145.532112   Eildon -37.222005  145.921452      Bastyan
2 -37.222005  145.921452     Papa -33.348652  138.751659       Eildon
3 -37.222005  145.921452     Quar -34.779284  138.522352       Eildon
4 -37.222005  145.921452     Coll -31.543177  118.456569       Eildon
5 -27.112811  150.904878      Loy -38.253600  146.574569  NOT MATCHED

使用的示例数据:

from io import StringIO

data = """Model 1,Model 1,Model 2,Model 2,Model 2
Lat,Lon,Name,X,Y
-33.348652,138.751659,Bastyan,-41.735983,145.532112
-41.735983,145.532112,Eildon,-37.222005,145.921452
-37.222005,145.921452,Papa,-33.348652,138.751659
-37.222005,145.921452,Quar,-34.779284,138.522352
-37.222005,145.921452,Coll,-31.543177,118.4565685
-27.112811,150.904878,Loy,-38.2536,146.574569"""

df = pd.read_csv(StringIO(data), sep=",", header=[0,1])

另一种方式是

#删除多级列并交叉合并

 s=df.droplevel(level=0, axis=1).merge(df.droplevel(level=0, axis=1), how='cross', suffixes=('','_y'))

#过滤匹配项row_wise并重命名列

#s[((s['Y_y']==s['Lon'])|(s['X_y']==s['Lat']))].filter(regex='_y$', axis=1).rename(columns=lambda x: x.split('_')[0])
s[((s['Y_y']==s['Lon'])|(s['X_y']==s['Lat']))].filter(regex='_y$|Name', axis=1).rename(columns={'Name_y':'Output'}).rename(columns=lambda x: x.split('_')[0])

结果

      Name        Lat         Lon   Output          X           Y
2   Bastyan -37.222005  145.921452     Papa -33.348652  138.751659
6    Eildon -33.348652  138.751659  Bastyan -41.735983  145.532112
13     Papa -41.735983  145.532112   Eildon -37.222005  145.921452
19     Quar -41.735983  145.532112   Eildon -37.222005  145.921452
25     Coll -41.735983  145.532112   Eildon -37.222005  145.921452

试试这个

import pandas as pd

df3 = pd.read_csv("compare.csv", header=[1])

df3['Output'] = df3[['Lat', 'Lon']].merge(df3[['Name', 'X','Y']], 
                                          how="left",
                                          left_on=["Lat", "Lon"], 
                                          right_on=["X", "Y"])["Name"].fillna("NOT MATCHED")

cols = pd.MultiIndex.from_product([["Model 1"], df3.columns[:2]])
cols = cols.append(pd.MultiIndex.from_product([["Model 2"], df3.columns[2:]]))

df3.columns = cols

print(df3)

输出

    Model 1                 Model 2
    Lat         Lon         Name    X           Y           Output
0   -33.348652  138.751659  Bastyan -41.735983  145.532112  Papa
1   -41.735983  145.532112  Eildon  -37.222005  145.921452  Bastyan
2   -37.222005  145.921452  Papa    -33.348652  138.751659  Eildon
3   -37.222005  145.921452  Quar    -34.779284  138.522352  Eildon
4   -37.222005  145.921452  Coll    -31.543177  118.456569  Eildon
5   -27.112811  150.904878  Loy     -38.253600  146.574569  NOT MATCHED