Remove rows from dataframe df1 if their column values exist in other dataframe df2

Question

I tried:

res = df1[~(getattr(df1, 'A').isin(getattr(df2, 'A')) & getattr(df1, 'C').isin(getattr(df2, 'C')))]

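This works, but the list of columns is variable (in this example columns = ['A', 'C']). How can I loop over it so that the expression above is built dynamically from the values in the list 'columns'?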

Example: df1:

       A      B  C   D
0     oo    one  0   0
1    bar   one1  1   2
2    foo   two2  2   4
3    bar   one1  3   6
4    foo    two  4   8
5    bar    two  5  10
6    foo    one  6  12
7  fowwo  three  7  14

df2:

       A      B  C   D
0     oo    one  0   0
2    foo   two2  2   4
3    bar   one1  3   6
4    foo    two  4   8
5    bar    two  5  10
6    foo    one  6  12
7  fowwo  three  7  14

Result:

     A     B  C  D
1  bar  one1  1  2

Answer 1

Use:

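import pandas as pd
column_list = ["A", "C"]
# Keep a row if at least one of the listed columns has a value that does not appear in that column of df2
df1[(~pd.concat((df1[col].isin(df2[col]) for col in column_list), axis=1)).any(axis=1)]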

Output:

    A   B       C   D
1   bar one1    1   2
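If you want the dynamic version to read exactly like the hand-written expression in the question, a minimal sketch is to AND the per-column isin() masks together with functools.reduce and operator.and_, then negate once. By De Morgan's law ~(a & b) equals ~a | ~b, which is what `(~pd.concat(...)).any(axis=1)` computes, so both forms select the same rows. The sketch rebuilds the question's example frames; it assumes df2 is simply df1 without the row labelled 1, which matches the printed tables.

```python
from functools import reduce
import operator

import pandas as pd

# The question's df1, copied from the printed example
df1 = pd.DataFrame(
    {
        "A": ["oo", "bar", "foo", "bar", "foo", "bar", "foo", "fowwo"],
        "B": ["one", "one1", "two2", "one1", "two", "two", "one", "three"],
        "C": [0, 1, 2, 3, 4, 5, 6, 7],
        "D": [0, 2, 4, 6, 8, 10, 12, 14],
    }
)
# The question's df2 has the same rows minus label 1; drop() keeps labels 0, 2, ..., 7
df2 = df1.drop(index=1)

columns = ["A", "C"]

# One boolean mask per column, combined with & just like the fixed expression
present = reduce(operator.and_, (df1[col].isin(df2[col]) for col in columns))
res = df1[~present]
print(res)
```

Note that reduce() needs at least one column in the list; with an empty list it raises a TypeError.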

Edit

The new case you described in the comments can be solved with merge.

DataFrames:

import numpy as np
import pandas as pd
df3 = pd.DataFrame({'A': '1010994595 1017165396 1020896102 1028915753 1028915753 1030811227 1033837508 1047224448 1047559040 1053827106 1094815936 1113339076 1115345471 1121416375 1122392586 1122981502 1132224809 '.split(), 'B': '99203 99232 99233 99231 99291 99291 99232 99232 99242 99232 99244 G0425 99213 99203 99606 99243 99214'.split(), 'C': np.arange(17), 'D': np.arange(17) * 2})
df4 = pd.DataFrame({'A': '1115345471 1113339076 1020896102 1047224448 1053827106 1121416375 1122392586 1028915753 1132224809 1030811227 1094815936 1033837508 1047559040 1122981502 1028915753 1030811227 1017165396 '.split(), 'B': '99213 G0425 99291 99232 99291 99243 99606 99291 99214 99291 99244 99233 99242 99243 99291 99291 99232 '.split(), 'C': np.arange(17), 'D': np.arange(17) * 2})

Code to select the rows of df4 that are not in df3 (matching on the columns in list_col):

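list_col = ["A", "B"]
# Deduplicate df3 on the key columns only, so the left merge yields exactly one row per df4 row
merged = df4.merge(df3[list_col].drop_duplicates(), on=list_col, how="left", indicator=True)
df4[(merged["_merge"] == "left_only").to_numpy()]  # .to_numpy() avoids index misalignment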

Output:

    A           B       C   D
2   1020896102  99291   2   4
4   1053827106  99291   4   8
5   1121416375  99243   5   10
11  1033837508  99233   11  22

If you want to reset the index of the new table, append .reset_index(drop=True) at the end.
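A possible alternative to the merge, sketched under the assumption that only the presence of the whole (A, B) combination matters: build a MultiIndex from the key columns of each frame and test membership with MultiIndex.isin. This compares each row's keys as one tuple, and because isin returns a plain NumPy boolean array there is no index alignment to worry about. On the df3/df4 data above it selects the same four rows (labels 2, 4, 5 and 11).

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'A': '1010994595 1017165396 1020896102 1028915753 1028915753 1030811227 1033837508 1047224448 1047559040 1053827106 1094815936 1113339076 1115345471 1121416375 1122392586 1122981502 1132224809'.split(),
                    'B': '99203 99232 99233 99231 99291 99291 99232 99232 99242 99232 99244 G0425 99213 99203 99606 99243 99214'.split(),
                    'C': np.arange(17), 'D': np.arange(17) * 2})
df4 = pd.DataFrame({'A': '1115345471 1113339076 1020896102 1047224448 1053827106 1121416375 1122392586 1028915753 1132224809 1030811227 1094815936 1033837508 1047559040 1122981502 1028915753 1030811227 1017165396'.split(),
                    'B': '99213 G0425 99291 99232 99291 99243 99606 99291 99214 99291 99244 99233 99242 99243 99291 99291 99232'.split(),
                    'C': np.arange(17), 'D': np.arange(17) * 2})

list_col = ["A", "B"]

# Each row's (A, B) values become one tuple-like key
keys4 = pd.MultiIndex.from_frame(df4[list_col])
keys3 = pd.MultiIndex.from_frame(df3[list_col])

# Keep the df4 rows whose key combination never occurs in df3
res = df4[~keys4.isin(keys3)]
print(res)
```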

Answer 2

The answer is:

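import pandas as pd
columns = ['A', 'B']
# reset_index() keeps df1's original row labels in an "index" column; merge() would otherwise discard them
common_data_between_df1_and_df2_relative_to_columns = df1.reset_index().merge(df2[columns].drop_duplicates(), on=columns)
res = df1[~df1.index.isin(common_data_between_df1_and_df2_relative_to_columns["index"])].dropna()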
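The answers above do not test the same condition. Answer 1 (like the expression in the question) checks each column independently, so a row is dropped when every one of its values appears somewhere in the matching df2 column. The merge-based code only drops a row when the whole combination of values appears together in a single df2 row. The small frames below are hypothetical, chosen so that "bar" and 0 both occur in df2 but never in the same row.

```python
import pandas as pd

# Hypothetical toy data: ("bar", 0) is not a df2 row, though "bar" and 0 each occur in df2
df1 = pd.DataFrame({"A": ["bar", "foo"], "C": [0, 1]})
df2 = pd.DataFrame({"A": ["bar", "foo"], "C": [5, 0]})
columns = ["A", "C"]

# Column-wise test (Answer 1): keep the row if any single value is missing from df2
per_column = df1[(~pd.concat([df1[c].isin(df2[c]) for c in columns], axis=1)).any(axis=1)]

# Row-wise test (merge with indicator): keep the row if its full combination is missing
merged = df1.merge(df2[columns].drop_duplicates(), on=columns, how="left", indicator=True)
row_wise = df1[(merged["_merge"] == "left_only").to_numpy()]

print(per_column)  # only the ("foo", 1) row survives
print(row_wise)    # both rows survive
```

Pick the column-wise form if each column should be matched on its own, and the row-wise form if the values must match within the same df2 row.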


Tags: compare, duplicates, multiple-columns, dataframe, pandas