Compare 2 datasets on multiple column criteria and find missing rows

Question

I would like to compare two datasets:

The first:

Partner	Type	Power	Price
Partner1	Buy	1	15.975
Partner1	Buy	1	18.025
Partner1	Buy	1	18.025
Partner1	Buy	1	18.025
Partner1	Buy	1	18.025
Partner1	Sell	1000	43.5

The second:

Partner	Type	Power	Price
Partner1	Buy	1	15.975
Partner1	Buy	1	18.025
Partner1	Buy	1	18.025
Partner1	Buy	1	18.025
Partner1	Buy	1	18.025
Partner1	Buy	1	18.025
Partner1	Buy	2	18.025
Partner1	Sell	5	19.05
Partner1	Sell	5	19.06
Partner1	Sell	5	19.125
Partner1	Buy	2	19.2

My goal is to check which rows of the second table do not exist in the first table, based on the columns 'Type', 'Power' and 'Price'.

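compcol = ['Type', 'Power', 'Price']
# True where each value of a row appears somewhere in the same column of `first`
in_first = second[compcol].isin(first[compcol].to_dict(orient='list')).all(axis=1)
missing = second[~in_first]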

The code above returns the following rows as missing from the first table:

Partner	Type	Power	Price
Partner1	Buy	2	18.025
Partner1	Sell	5	19.05
Partner1	Sell	5	19.06
Partner1	Sell	5	19.125
Partner1	Buy	2	19.2

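What I want to achieve is to also include one more row, "Partner1 - Buy - 1 - 18.025", which is missing from the first table as well (the second table contains 5 trades with this same data, while the first table contains only 4!).

How can I achieve this?

Thanks for your answers.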
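
For context, here is a minimal sketch of why the check in the question cannot see the extra duplicate. When `isin()` receives a dict of lists, it tests each column on its own, so it knows nothing about row combinations or how often a row occurs. The two small frames below are hypothetical, trimmed-down stand-ins for the real tables.

```python
import pandas as pd

cols = ["Type", "Power", "Price"]

# Hypothetical, shortened versions of the two tables
first = pd.DataFrame({
    "Type": ["Buy", "Buy", "Sell"],
    "Power": [1, 1, 1000],
    "Price": [15.975, 18.025, 43.5],
})
second = pd.DataFrame({
    "Type": ["Buy", "Buy", "Buy", "Sell"],
    "Power": [1, 1, 1, 1],
    "Price": [15.975, 18.025, 18.025, 43.5],
})

# Each column is tested independently against the values of the
# same-named column in `first`; rows are never compared as a whole.
per_column = second[cols].isin(first[cols].to_dict(orient="list"))
looks_present = per_column.all(axis=1)
print(looks_present.tolist())
```

Every row of `second` is reported as present here, including the repeated Buy/1/18.025 row and the Sell/1/43.5 row, even though `first` contains neither a second Buy/1/18.025 row nor any Sell/1/43.5 row.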

Answer 1

You can use a boolean mask, the isin() method, the all() method and the bitwise NOT operator (~):

# df1 is the first table, df2 the second
col = ['Type', 'Power', 'Price']
result = df2[~df2[col].isin(df1[col]).all(axis=1)]

Now if you print result, you get the desired output:

     Partner  Type  Power   Price
5   Partner1   Buy      1  18.025
6   Partner1   Buy      2  18.025
7   Partner1  Sell      5  19.050
8   Partner1  Sell      5  19.060
9   Partner1  Sell      5  19.125
10  Partner1   Buy      2  19.200
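
One caveat: when `isin()` is given a DataFrame, it compares rows that share the same index label, so the result above depends on both tables being in the same row order. As a possible alternative that does not rely on ordering, the sketch below numbers each repeat of a key combination with `groupby(...).cumcount()` and then does a left merge with `indicator=True`. The helper name `rows_missing_from` and the temporary column `_n` are made up for this illustration, and the frames are rebuilt from the tables in the question.

```python
import pandas as pd

cols = ["Type", "Power", "Price"]

first = pd.DataFrame({
    "Partner": ["Partner1"] * 6,
    "Type": ["Buy"] * 5 + ["Sell"],
    "Power": [1, 1, 1, 1, 1, 1000],
    "Price": [15.975, 18.025, 18.025, 18.025, 18.025, 43.5],
})
second = pd.DataFrame({
    "Partner": ["Partner1"] * 11,
    "Type": ["Buy"] * 7 + ["Sell"] * 3 + ["Buy"],
    "Power": [1, 1, 1, 1, 1, 1, 2, 5, 5, 5, 2],
    "Price": [15.975, 18.025, 18.025, 18.025, 18.025, 18.025,
              18.025, 19.05, 19.06, 19.125, 19.2],
})


def rows_missing_from(left, right, cols):
    """Rows of `left` that have no counterpart in `right`, counting duplicates.

    Hypothetical helper, not part of pandas.
    """
    # Number each repeat of the same key combination: 0, 1, 2, ...
    left_k = left.assign(_n=left.groupby(cols).cumcount())
    right_k = right[cols].assign(_n=right.groupby(cols).cumcount())
    # Match on the keys plus the occurrence number; `_merge` marks
    # rows of `left` that found no partner in `right`.
    merged = left_k.merge(right_k, on=cols + ["_n"], how="left", indicator=True)
    unmatched = merged["_merge"] == "left_only"
    return merged.loc[unmatched].drop(columns=["_n", "_merge"])


missing = rows_missing_from(second, first, cols)
print(missing)
```

Because the fifth Buy/1/18.025 row in `second` gets occurrence number 4 and `first` only has numbers 0 to 3 for that combination, it ends up unmatched along with the five rows whose combinations do not occur in `first` at all. Note that the merge compares the float prices for exact equality, which is fine for identical literals but may need rounding for computed values.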
