Compare two columns, one of floats and one of strings, to get the matching rows
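I have two DataFrames, each with two important columns: one holds float64 values and the other holds strings. The DataFrames differ in size.
I want to match on the Number column and the Item column at the same time, and then flag which rows of df1 have a matching pair in df2.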
import pandas as pd

df1 = pd.DataFrame({'Number': [1.0, 3.0, 4.0, 5.0, 8.0, 12.0, 32.0, 58.0], 'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil', 'toolkit', 'box', 'fork']})
df2 = pd.DataFrame({'Number': [3.0, 4.0, 8.0, 12.0, 15.0, 32.0, 54.0, 58.0, 72.0], 'Item': ['Watch', 'Pen', 'Pencil', 'Eraser', 'bottle', 'box', 'toolkit', 'fork', 'Phone']})
df1
Number Item
0 1.0 Phone
1 3.0 Watch
2 4.0 Pen
3 5.0 Pencil
4 8.0 Pencil
5 12.0 toolkit
6 32.0 box
7 58.0 fork
df2
Number Item
0 3.0 Watch
1 4.0 Pen
2 8.0 Pencil
3 12.0 Eraser
4 15.0 bottle
5 32.0 box
6 54.0 toolkit
7 58.0 fork
8 72.0 Phone
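I tried using a for loop, but the loop gets very long and it seems like a poor way to achieve this. I also tried a mask operation, but I am not sure how to implement it. I would appreciate help doing this in the shortest way possible.
The desired result should look like this: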
Item Matching Number
0 Phone No Match 1.0
1 Watch Matched 3.0
2 Pen Matched 4.0
3 Pencil No Match 5.0
4 Pencil Matched 8.0
5 toolkit No Match 12.0
6 box Matched 32.0
7 fork Matched 58.0
You are looking for a left merge with indicator=True:
res = pd.merge(df1, df2, how='left', indicator=True)
print(res)
Item Number _merge
0 Phone 1.0 left_only
1 Watch 3.0 both
2 Pen 4.0 both
3 Pencil 5.0 left_only
4 Pencil 8.0 both
5 toolkit 12.0 left_only
6 box 32.0 both
7 fork 58.0 both
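In general, avoid explicit for loops when a purpose-built method is available, since these methods are usually optimized for performance. If you like, you can replace the indicator strings with a dictionary mapping:
d = {'left_only': 'No Match', 'both': 'Matched'}
res['_merge'] = res['_merge'].map(d)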
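The left merge and the label mapping above can be put together as a minimal, self-contained sketch. The explicit on= list and the drop_duplicates call are assumptions added here, not part of the answer: they make the join keys visible and keep a repeated (Number, Item) pair in df2 from duplicating rows of df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Number': [1.0, 3.0, 4.0, 5.0, 8.0, 12.0, 32.0, 58.0],
    'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil', 'toolkit', 'box', 'fork'],
})
df2 = pd.DataFrame({
    'Number': [3.0, 4.0, 8.0, 12.0, 15.0, 32.0, 54.0, 58.0, 72.0],
    'Item': ['Watch', 'Pen', 'Pencil', 'Eraser', 'bottle', 'box', 'toolkit', 'fork', 'Phone'],
})

keys = ['Number', 'Item']

# Assumption: keep only the key columns of df2 and drop repeated pairs,
# so the left merge returns exactly one row per row of df1.
right = df2[keys].drop_duplicates()

res = pd.merge(df1, right, how='left', on=keys, indicator=True)

# _merge is categorical; convert to plain strings before mapping to labels.
labels = {'left_only': 'No Match', 'both': 'Matched'}
res['Matching'] = res['_merge'].astype(str).map(labels)
res = res.drop(columns='_merge')

print(res)
```

Because the merge compares the two key columns together, a row counts as matched only when the same Number and Item appear in the same row of df2.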
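If the problem is merging on float values, you can multiply by 1000 and convert to integers, then merge with a left join. Matching on the raw floats can fail because the floating-point precision of the two columns may differ: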
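# Round before casting: astype(int) alone truncates, so 2999.9999 would become 2999.
df1['Number1'] = df1['Number'].mul(1000).round().astype(int)
df2['Number1'] = df2['Number'].mul(1000).round().astype(int)
# Drop df2's float column by keyword; pandas 2.x no longer accepts a positional axis here.
df = pd.merge(df1, df2.drop(columns='Number'), how='left', on=['Item', 'Number1'], indicator=True)
df['Matching'] = df['_merge'].map({'left_only': 'No Match', 'both': 'Match'})
df = df.drop(['Number1', '_merge'], axis=1)
print(df)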
Number Item Matching
0 1.0 Phone No Match
1 3.0 Watch Match
2 4.0 Pen Match
3 5.0 Pencil No Match
4 8.0 Pencil Match
5 12.0 toolkit No Match
6 32.0 box Match
7 58.0 fork Match
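To see why the integer key helps, here is a small sketch on hypothetical data (the left and right frames and the int_key helper are invented for illustration and do not come from the question). It compares an exact float merge with a merge on a scaled, rounded integer key.

```python
import pandas as pd

# Hypothetical data: 0.1 + 0.2 is not exactly 0.3 in binary floating point.
left = pd.DataFrame({'Number': [0.1 + 0.2, 1.5], 'Item': ['Pen', 'box']})
right = pd.DataFrame({'Number': [0.3, 1.5], 'Item': ['Pen', 'fork']})

# Exact float comparison: the first pair differs in the last bits, so it is not matched.
exact = pd.merge(left, right, how='left', on=['Number', 'Item'], indicator=True)


def int_key(s, scale=1000):
    """Scale a float Series and round to the nearest integer (hypothetical helper)."""
    return s.mul(scale).round().astype('int64')


# Build the integer key on copies so the original frames stay untouched.
left_k = left.assign(Number1=int_key(left['Number']))
right_k = right.assign(Number1=int_key(right['Number']))[['Number1', 'Item']]

scaled = pd.merge(left_k, right_k, how='left', on=['Number1', 'Item'], indicator=True)

print(exact['_merge'].tolist())
print(scaled['_merge'].tolist())
```

The scale factor fixes the tolerance: 1000 treats values that agree to three decimal places as equal, so it should be chosen to suit the precision of the data.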
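You can build the DataFrame you need with a simple loc and isin, as shown below. Note that this tests each column independently rather than as a pair: a row is marked Matched whenever its Number and its Item each appear somewhere in df2, which is why 12.0 toolkit is marked Matched in the output, unlike in the desired result.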
df = df1.copy()
# Start with the default label, then overwrite the rows selected by the mask.
df['Matching'] = 'No Match'
df.loc[df['Number'].isin(df2['Number']) & df['Item'].isin(df2['Item']), 'Matching'] = 'Matched'
Number Item Matching
1.0 Phone No Match
3.0 Watch Matched
4.0 Pen Matched
5.0 Pencil No Match
8.0 Pencil Matched
12.0 toolkit Matched
32.0 box Matched
58.0 fork Matched
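The mask above checks Number and Item separately. If the mask approach is preferred, one possible way to compare the two columns as a pair is to turn each (Number, Item) row into a MultiIndex entry and call isin on that. This is a sketch of an alternative, not the answer's method.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Number': [1.0, 3.0, 4.0, 5.0, 8.0, 12.0, 32.0, 58.0],
    'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil', 'toolkit', 'box', 'fork'],
})
df2 = pd.DataFrame({
    'Number': [3.0, 4.0, 8.0, 12.0, 15.0, 32.0, 54.0, 58.0, 72.0],
    'Item': ['Watch', 'Pen', 'Pencil', 'Eraser', 'bottle', 'box', 'toolkit', 'fork', 'Phone'],
})

keys = ['Number', 'Item']

# Each row becomes one (Number, Item) tuple, so membership is tested per pair.
pairs1 = pd.MultiIndex.from_frame(df1[keys])
pairs2 = pd.MultiIndex.from_frame(df2[keys])
mask = pairs1.isin(pairs2)  # boolean array aligned with the rows of df1

out = df1.copy()
out['Matching'] = np.where(mask, 'Matched', 'No Match')
print(out)
```

With pairwise membership, 12.0 toolkit is labelled No Match, because df2 pairs 12.0 with Eraser and toolkit with 54.0.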