Compare 2 datasets on multiple column criteria and find missing rows
I would like to compare two datasets:
The first:
Partner | Type | Power | Price |
---|---|---|---|
Partner1 | Buy | 1 | 15.975 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Sell | 1000 | 43.5 |
The second:
Partner | Type | Power | Price |
---|---|---|---|
Partner1 | Buy | 1 | 15.975 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Buy | 1 | 18.025 |
Partner1 | Buy | 2 | 18.025 |
Partner1 | Sell | 5 | 19.05 |
Partner1 | Sell | 5 | 19.06 |
Partner1 | Sell | 5 | 19.125 |
Partner1 | Buy | 2 | 19.2 |
My goal is to check which rows of the second table do not exist in the first table, based on the columns 'Type', 'Price' and 'Power'.
compcol = ['Type', 'Power', 'Price']
missing = second[~second[compcol].isin(
    first[compcol].to_dict(orient='list')).all(axis=1)]
The code above returns the rows that are missing from the first table:
Partner | Type | Power | Price |
---|---|---|---|
Partner1 | Buy | 2 | 18.025 |
Partner1 | Sell | 5 | 19.05 |
Partner1 | Sell | 5 | 19.06 |
Partner1 | Sell | 5 | 19.125 |
Partner1 | Buy | 2 | 19.2 |
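What I want to achieve is to also include one more row, "Partner1 - BUY - 1 - 18.025", which is missing from the first table as well (the second table contains 5 trades with identical data, while the first table contains only 4!).
How can I do this?
Thank you for your answers.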
You can use a boolean mask, the isin() method, the all() method and the bitwise NOT operator (~):
# df1 is the first table, df2 is the second
col = ['Type', 'Power', 'Price']
result = df2[~df2[col].isin(df1[col]).all(axis=1)]
Now if you print result, you will get your desired output:
Partner Type Power Price
5 Partner1 Buy 1 18.025
6 Partner1 Buy 2 18.025
7 Partner1 Sell 5 19.050
8 Partner1 Sell 5 19.060
9 Partner1 Sell 5 19.125
10 Partner1 Buy 2 19.200
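One caveat worth keeping in mind: when `isin()` is given a DataFrame, it matches on index and column labels as well as on values, so the snippet above effectively compares the two tables row by row by position. If the rows may appear in a different order, a count-aware comparison is probably safer. The following is only a sketch of one possible approach, not the only way to do it: it numbers repeated key combinations with `groupby(...).cumcount()` and then does a left merge with `indicator=True`. The helper name `missing_rows` and the temporary column `_occ` are made up for this illustration, and the merge relies on exact equality of the float `Price` values.

```python
import pandas as pd

# The two tables from the question, rebuilt by hand.
first = pd.DataFrame({
    "Partner": ["Partner1"] * 6,
    "Type": ["Buy", "Buy", "Buy", "Buy", "Buy", "Sell"],
    "Power": [1, 1, 1, 1, 1, 1000],
    "Price": [15.975, 18.025, 18.025, 18.025, 18.025, 43.5],
})
second = pd.DataFrame({
    "Partner": ["Partner1"] * 11,
    "Type": ["Buy"] * 7 + ["Sell"] * 3 + ["Buy"],
    "Power": [1, 1, 1, 1, 1, 1, 2, 5, 5, 5, 2],
    "Price": [15.975, 18.025, 18.025, 18.025, 18.025, 18.025,
              18.025, 19.05, 19.06, 19.125, 19.2],
})

compcol = ["Type", "Power", "Price"]


def missing_rows(left, right, cols):
    """Hypothetical helper: rows of `left` that have no counterpart in
    `right` on `cols`, where each row of `right` can cancel only one
    identical row of `left` (a multiset difference)."""
    # Number repeated key combinations: 0 for the first occurrence,
    # 1 for the second, and so on.
    l = left.assign(_occ=left.groupby(cols).cumcount())
    r = right[cols].assign(_occ=right.groupby(cols).cumcount())
    # (cols + _occ) is unique in r, so the left merge keeps exactly one
    # output row per row of `left`, in the original order.
    merged = l.merge(r, on=cols + ["_occ"], how="left", indicator=True)
    keep = (merged["_merge"] == "left_only").to_numpy()
    # Index the original frame so its index labels are preserved.
    return left[keep]


missing = missing_rows(second, first, compcol)
print(missing)
```

With the data from the question this should select the rows with index 5 to 10 of the second table, including the fifth "Buy - 1 - 18.025" trade, regardless of how the rows of either table are ordered.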