Compare 2 datasets on multiple column criteria and find missing rows

I would like to compare two datasets:

The first one:

Partner   Type  Power  Price
Partner1  Buy   1      15.975
Partner1  Buy   1      18.025
Partner1  Buy   1      18.025
Partner1  Buy   1      18.025
Partner1  Buy   1      18.025
Partner1  Sell  1000   43.5

The second one:

Partner   Type  Power  Price
Partner1  Buy   1      15.975
Partner1  Buy   1      18.025
Partner1  Buy   1      18.025
Partner1  Buy   1      18.025
Partner1  Buy   1      18.025
Partner1  Buy   1      18.025
Partner1  Buy   2      18.025
Partner1  Sell  5      19.05
Partner1  Sell  5      19.06
Partner1  Sell  5      19.125
Partner1  Buy   2      19.2

My goal is to check which rows of the second table do not exist in the first table, based on the columns 'Type', 'Price' and 'Power'.

compcol = ['Type','Power','Price']
missing = second[~second[compcol].isin(first[compcol].to_dict(
        orient='list')).all(axis=1)]

The code above returns these rows as missing from the first table:
Partner   Type  Power  Price
Partner1  Buy   2      18.025
Partner1  Sell  5      19.05
Partner1  Sell  5      19.06
Partner1  Sell  5      19.125
Partner1  Buy   2      19.2

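What I want to achieve is to also include one more row, "Partner1 - Buy - 1 - 18.025", which is missing from the first table as well (the second table contains 5 records of trades with the same data, while the first table contains only 4!).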

How can I do that?

Thank you for your answers.

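You can use a boolean mask built with the isin() method, the all() method and the bitwise NOT operator (~). Note that when isin() is given a DataFrame, it matches on index and column labels as well as values, so each row of df2 is compared with the row of df1 at the same index: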

col = ['Type', 'Power', 'Price']
result = df2[~df2[col].isin(df1[col]).all(axis=1)]

Now if you print result, you get the output you want:

     Partner  Type  Power   Price
5   Partner1   Buy      1  18.025
6   Partner1   Buy      2  18.025
7   Partner1  Sell      5  19.050
8   Partner1  Sell      5  19.060
9   Partner1  Sell      5  19.125
10  Partner1   Buy      2  19.200
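
If the two tables are not guaranteed to be in the same row order, a variant that counts duplicates explicitly may be safer. The following is only a minimal sketch, not the method used above: it numbers each repeated (Type, Power, Price) combination with groupby().cumcount() and then does a left merge with indicator=True, keeping the rows found only in the second table. The helper name with_occurrence and the column name occurrence are made up for this illustration, and the two frames are rebuilt here from the tables in the question so that the snippet runs on its own.

```python
import pandas as pd

# Data rebuilt from the tables in the question.
first = pd.DataFrame({
    "Partner": ["Partner1"] * 6,
    "Type": ["Buy"] * 5 + ["Sell"],
    "Power": [1, 1, 1, 1, 1, 1000],
    "Price": [15.975, 18.025, 18.025, 18.025, 18.025, 43.5],
})
second = pd.DataFrame({
    "Partner": ["Partner1"] * 11,
    "Type": ["Buy"] * 7 + ["Sell"] * 3 + ["Buy"],
    "Power": [1, 1, 1, 1, 1, 1, 2, 5, 5, 5, 2],
    "Price": [15.975, 18.025, 18.025, 18.025, 18.025, 18.025,
              18.025, 19.05, 19.06, 19.125, 19.2],
})

compcol = ["Type", "Power", "Price"]


def with_occurrence(df, cols):
    """Hypothetical helper: add a 0-based counter per duplicate key."""
    out = df.copy()
    # The n-th repeat of the same (Type, Power, Price) gets occurrence n.
    out["occurrence"] = out.groupby(cols).cumcount()
    return out


keys = compcol + ["occurrence"]

# Left merge on the key columns plus the counter; '_merge' records
# whether each row of 'second' found a partner in 'first'.
merged = with_occurrence(second, compcol).merge(
    with_occurrence(first, compcol)[keys],
    on=keys,
    how="left",
    indicator=True,
)

# Rows present only in 'second', including the surplus duplicate.
missing = merged.loc[merged["_merge"] == "left_only", ["Partner"] + compcol]
print(missing)
```

Because the fifth "Buy 1 18.025" row in the second table gets occurrence 4, which has no counterpart in the first table, it is reported as missing together with the five rows whose values never appear in the first table.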