Remove rows from dataframe df1 if their column values exist in another dataframe df2
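I tried:
res = df1[~(getattr(df1, 'A').isin(getattr(df2, 'A')) & getattr(df1, 'C').isin(getattr(df2, 'C')))]
It works, but in this example the list of columns is variable, e.g. columns = ['A', 'C']. How can I loop over it so that the expression above is built dynamically from the values in the list 'columns'?
Example: df1: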
A B C D
0 oo one 0 0
1 bar one1 1 2
2 foo two2 2 4
3 bar one1 3 6
4 foo two 4 8
5 bar two 5 10
6 foo one 6 12
7 fowwo three 7 14
df2:
A B C D
0 oo one 0 0
2 foo two2 2 4
3 bar one1 3 6
4 foo two 4 8
5 bar two 5 10
6 foo one 6 12
7 fowwo three 7 14
Result:
A B C D
1 bar one1 1 2
Use:
column_list = ["A","C"]
df1[(~pd.concat((df1[col].isin(df2[col]) for col in column_list), axis=1)).any(axis=1)]
Output:
A B C D
1 bar one1 1 2
Edit
The new case you described in the comments can be solved with merge.
DataFrames:
df3= pd.DataFrame({'A': '1010994595 1017165396 1020896102 1028915753 1028915753 1030811227 1033837508 1047224448 1047559040 1053827106 1094815936 1113339076 1115345471 1121416375 1122392586 1122981502 1132224809 '.split(), 'B': '99203 99232 99233 99231 99291 99291 99232 99232 99242 99232 99244 G0425 99213 99203 99606 99243 99214'.split(), 'C': np.arange(17), 'D': np.arange(17) * 2})
df4= pd.DataFrame({'A': '1115345471 1113339076 1020896102 1047224448 1053827106 1121416375 1122392586 1028915753 1132224809 1030811227 1094815936 1033837508 1047559040 1122981502 1028915753 1030811227 1017165396 '.split(), 'B': '99213 G0425 99291 99232 99291 99243 99606 99291 99214 99291 99244 99233 99242 99243 99291 99291 99232 '.split(), 'C': np.arange(17), 'D': np.arange(17) * 2})
Code to select the rows of df4 that are not in df3 (comparing only the columns in list_col):
list_col = ["A","B"]
df4[df4.merge(df3[list_col].drop_duplicates(), on=list_col, how='left', indicator=True)["_merge"] == "left_only"]
Output:
A B C D
2 1020896102 99291 2 4
4 1053827106 99291 4 8
5 1121416375 99243 5 10
11 1033837508 99233 11 22
If you want to reset the index of the new table, add .reset_index(drop=True) at the end.
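The merge-based filter can also be wrapped in a reusable function. The following is a minimal sketch, not part of the original answer: the name anti_join and the sample frames are assumptions. It builds the mask by position, so it should also work when the left frame does not have the default 0..n-1 index.

```python
import pandas as pd

def anti_join(left, right, columns):
    # Hypothetical helper name. Deduplicate the key columns so the left merge
    # yields exactly one output row per row of left.
    keys = right[columns].drop_duplicates()
    merged = left.merge(keys, on=columns, how="left", indicator=True)
    # merged follows the row order of left; convert to a NumPy array so the
    # mask is applied by position instead of by index label.
    mask = (merged["_merge"] == "left_only").to_numpy()
    return left[mask]

# Toy data (assumed), with a non-default index on the left frame.
left = pd.DataFrame(
    {"A": ["1", "2", "2", "3"], "B": ["x", "y", "z", "x"]},
    index=[10, 11, 12, 13],
)
right = pd.DataFrame({"A": ["1", "2", "2"], "B": ["x", "y", "y"]})

res = anti_join(left, right, ["A", "B"])
print(res)
```

Unlike the per-column isin approach, this compares the listed columns as a combination: a row of left is removed only if some row of right has the same values in all of them.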
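The answer is:
columns = ['A', 'B']
# Keep df1's index as a column so it survives the merge (assumes the index is unnamed, so the column is called 'index')
common_data_between_df1_and_df2 = df1.reset_index().merge(df2[columns].drop_duplicates(), on=columns)
# Drop the df1 rows whose index appears among the matches; dropna() also discards rows containing NaN
res = df1[~df1.index.isin(common_data_between_df1_and_df2['index'])].dropna()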