Check for matching values in columns - python/pandas
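I have a CSV file containing latitude and longitude values for two models (Model 1 and Model 2), as shown below. I am trying to achieve the following:
- Search for the Lat/Long values from the first row of Model 1 in every row of the Model 2 Lat/Long columns. If the Model 1 Lat/Long values are found in Model 2, print the corresponding area name, e.g. 'Papa', in a new column. Repeat the process for the remaining rows of Model 1, and then repeat it for Model 2.
- If the Model 1 Lat/Long values do not match anything in Model 2, print NOT matched in the output.
Minimal working example: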
import pandas as pd
df3= pd.read_csv("compare.csv")
Output=df3.loc[:, df3.columns.isin(list('LatLong'))]
I have searched the existing answers but could not find a solution.
sample_data.csv
Model 1 Model 2
Lat Lon Name X Y
-33.348652 138.751659 Kastan -41.735983 145.532112
-41.735983 145.532112 Ldon -37.222005 145.921452
-37.222005 145.921452 Papa -33.348652 138.751659
-37.222005 145.921452 tine -34.779284 138.522352
-37.222005 145.921452 Farm -31.543177 118.4565685
-27.112811 150.904878 Loy -38.2536 146.574569
One way using pandas.merge:
df["Output"] = df["Model 1"].merge(df["Model 2"],
how="left",
left_on=["Lat", "Lon"],
right_on=["X", "Y"],
)["Name"].fillna("NOT MATCHED")
Output:
Model 1 Model 2 Output
Lat Lon Name X Y
0 -33.348652 138.751659 Bastyan -41.735983 145.532112 Papa
1 -41.735983 145.532112 Eildon -37.222005 145.921452 Bastyan
2 -37.222005 145.921452 Papa -33.348652 138.751659 Eildon
3 -37.222005 145.921452 Quar -34.779284 138.522352 Eildon
4 -37.222005 145.921452 Coll -31.543177 118.456569 Eildon
5 -27.112811 150.904878 Loy -38.253600 146.574569 NOT MATCHED
Sample data used:
from io import StringIO
import pandas as pd
data = """Model 1,Model 1,Model 2,Model 2,Model 2
Lat,Lon,Name,X,Y
-33.348652,138.751659,Bastyan,-41.735983,145.532112
-41.735983,145.532112,Eildon,-37.222005,145.921452
-37.222005,145.921452,Papa,-33.348652,138.751659
-37.222005,145.921452,Quar,-34.779284,138.522352
-37.222005,145.921452,Coll,-31.543177,118.4565685
-27.112811,150.904878,Loy,-38.2536,146.574569"""
df = pd.read_csv(StringIO(data), sep=",", header=[0,1])
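One caveat with the merge above: if Model 2 ever contains the same (X, Y) pair twice, the left merge returns more rows than Model 1 has, and assigning the result back by index no longer lines up. A minimal sketch of a guard, assuming it is acceptable to keep the first name for a repeated pair (the `lookup`, `merged` and `output` names are only for illustration):

```python
from io import StringIO

import pandas as pd

data = """Model 1,Model 1,Model 2,Model 2,Model 2
Lat,Lon,Name,X,Y
-33.348652,138.751659,Bastyan,-41.735983,145.532112
-41.735983,145.532112,Eildon,-37.222005,145.921452
-37.222005,145.921452,Papa,-33.348652,138.751659
-37.222005,145.921452,Quar,-34.779284,138.522352
-37.222005,145.921452,Coll,-31.543177,118.4565685
-27.112811,150.904878,Loy,-38.2536,146.574569"""
df = pd.read_csv(StringIO(data), header=[0, 1])

# Assumption: the first name wins when an (X, Y) pair occurs more than once.
lookup = df["Model 2"].drop_duplicates(subset=["X", "Y"])

merged = df["Model 1"].merge(
    lookup,
    how="left",
    left_on=["Lat", "Lon"],
    right_on=["X", "Y"],
    validate="many_to_one",  # raises MergeError if (X, Y) is not unique
)

# Assign by position so the index of the merge result does not matter.
output = merged["Name"].fillna("NOT MATCHED").to_numpy()
df["Output"] = output
print(df)
```

With the sample data there are no duplicate (X, Y) pairs, so this gives the same Output column as the plain merge.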
Another way:
# Drop the top column level and cross-merge the frame with itself
s=df.droplevel(level=0, axis=1).merge(df.droplevel(level=0, axis=1), how='cross', suffixes=('','_y'))
# Filter the matching rows and rename the columns
# Use & rather than | so that both coordinates must match, not just one of them
#s[((s['Y_y']==s['Lon'])&(s['X_y']==s['Lat']))].filter(regex='_y$', axis=1).rename(columns=lambda x: x.split('_')[0])
s[((s['Y_y']==s['Lon'])&(s['X_y']==s['Lat']))].filter(regex='_y$|Name', axis=1).rename(columns={'Name_y':'Output'}).rename(columns=lambda x: x.split('_')[0])
Result
Name Lat Lon Output X Y
2 Bastyan -37.222005 145.921452 Papa -33.348652 138.751659
6 Eildon -33.348652 138.751659 Bastyan -41.735983 145.532112
13 Papa -41.735983 145.532112 Eildon -37.222005 145.921452
19 Quar -41.735983 145.532112 Eildon -37.222005 145.921452
25 Coll -41.735983 145.532112 Eildon -37.222005 145.921452
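Both the merge and the cross-merge compare the coordinates with exact float equality. That works here because the values come from the same text, but it may miss pairs that differ only by rounding. A rough sketch of tolerance-based matching with NumPy broadcasting follows; the tolerance `tol` is a made-up value that you would need to choose for your data, and the frame uses flat column names for brevity:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Lat": [-33.348652, -41.735983, -37.222005, -37.222005, -37.222005, -27.112811],
    "Lon": [138.751659, 145.532112, 145.921452, 145.921452, 145.921452, 150.904878],
    "Name": ["Bastyan", "Eildon", "Papa", "Quar", "Coll", "Loy"],
    "X": [-41.735983, -37.222005, -33.348652, -34.779284, -31.543177, -38.2536],
    "Y": [145.532112, 145.921452, 138.751659, 138.522352, 118.4565685, 146.574569],
})

tol = 1e-6  # hypothetical tolerance in degrees, not from the question

lat, lon = df["Lat"].to_numpy(), df["Lon"].to_numpy()
x, y = df["X"].to_numpy(), df["Y"].to_numpy()

# close[i, j] is True when row i's (Lat, Lon) is within tol of row j's (X, Y)
close = (
    np.isclose(lat[:, None], x[None, :], rtol=0.0, atol=tol)
    & np.isclose(lon[:, None], y[None, :], rtol=0.0, atol=tol)
)

has_match = close.any(axis=1)
first_match = close.argmax(axis=1)  # index of the first True per row (0 if none)
names = df["Name"].to_numpy()
df["Output"] = np.where(has_match, names[first_match], "NOT MATCHED")
print(df)
```

Like the cross merge, this builds an n-by-n comparison, so it is only practical for modest row counts.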
Try this:
import pandas as pd
df3 = pd.read_csv("compare.csv", header=[1])
df3['Output'] = df3[['Lat', 'Lon']].merge(df3[['Name', 'X','Y']],
how="left",
left_on=["Lat", "Lon"],
right_on=["X", "Y"])["Name"].fillna("NOT MATCHED")
cols = pd.MultiIndex.from_product([["Model 1"], df3.columns[:2]])
cols = cols.append(pd.MultiIndex.from_product([["Model 2"], df3.columns[2:]]))
df3.columns = cols
print(df3)
Output
Model 1 Model 2
Lat Lon Name X Y Output
0 -33.348652 138.751659 Bastyan -41.735983 145.532112 Papa
1 -41.735983 145.532112 Eildon -37.222005 145.921452 Bastyan
2 -37.222005 145.921452 Papa -33.348652 138.751659 Eildon
3 -37.222005 145.921452 Quar -34.779284 138.522352 Eildon
4 -37.222005 145.921452 Coll -31.543177 118.456569 Eildon
5 -27.112811 150.904878 Loy -38.253600 146.574569 NOT MATCHED
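The question also asks to repeat the check for Model 2. Model 1 has no name column, so one possible reading is to flag whether each Model 2 (X, Y) pair appears anywhere in Model 1. A sketch under that assumption, using the `indicator` option of `merge` on the flat columns (the `In Model 1` column name is made up for this example):

```python
import pandas as pd

df3 = pd.DataFrame({
    "Lat": [-33.348652, -41.735983, -37.222005, -37.222005, -37.222005, -27.112811],
    "Lon": [138.751659, 145.532112, 145.921452, 145.921452, 145.921452, 150.904878],
    "Name": ["Bastyan", "Eildon", "Papa", "Quar", "Coll", "Loy"],
    "X": [-41.735983, -37.222005, -33.348652, -34.779284, -31.543177, -38.2536],
    "Y": [145.532112, 145.921452, 138.751659, 138.522352, 118.4565685, 146.574569],
})

# Model 1 repeats some (Lat, Lon) pairs; drop them so the merge keeps one row
# per Model 2 row.
model1 = df3[["Lat", "Lon"]].drop_duplicates()

checked = df3[["Name", "X", "Y"]].merge(
    model1,
    how="left",
    left_on=["X", "Y"],
    right_on=["Lat", "Lon"],
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# True where the Model 2 coordinates were found in Model 1
df3["In Model 1"] = (checked["_merge"] == "both").to_numpy()
print(df3)
```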