select 数据框中的行不在具有不同形状数组且对象不可迭代的不同数据框中

select rows in dataframe not in different dataframe with different shape arrays and where object is not iterable

我正在尝试 select 数据帧中不与临时数据帧共享 uniqueID 的行,但是当我使用以下命令时 NotProcessed= FileListDF.loc[FileListDF['DocID'] != InterimDF.DocID.tolist()] 我明白

ValueError: Arrays were different lengths: 255 vs 246

当我期望一个包含 9 行的数据框在 FileListDF 中但不在 InterimDF 中时

这个 似乎遇到了类似的问题,但答案不明确,当我尝试时,我得到

'UUID' object is not iterable

我不确定它是否连接到我使用 uuid.uuid4() 在第一个实例中创建 DocID

示例数据

FileListDF..to_json

<bound method NDFrame.to_json of     Class                                 DocID
0      CC  c2571c68-0705-4c87-a42b-0971a73fedf4
1      CC  ab317908-8720-47ce-a8f6-453169259e48
2      CC  2b3d1aee-b4f6-4278-9db2-c7b6819bd14f
3      CC  9d4b9084-f623-445c-86d9-0458d6fb4135
4      CC  afe9dbb5-38a2-4c4e-9640-b1da1d37b66f
5      CC  56fa7edb-5020-44e7-98bc-5b0878fbf9ac
6      CC  7c27bcb8-b381-465e-a6e9-edc38cf24e5a
7      CC  a56e6f13-bf05-474a-9930-5f0ebad33eba
8      CC  81c31d76-cec3-4f1c-8ec1-107ecc1169d4
9      CC  1f8acf61-0a08-4b42-8ad1-38eeebc85ff2
10     CC  d6a6704c-21c6-4dc5-b9bd-de9e67ee7970
11     CC  d63dde24-9a2b-410d-aa7d-6a854b646a12
12     CC  a80367f9-36af-45bd-878e-d19461a3d3df
13     CC  31f74aff-e8df-43bb-a7d4-314caea6de20
14     CC  99052ee4-2481-4c81-92dd-81b446e76c15
15     CC  3827d0b4-5d0a-490a-a561-bf631bae6492
16     CC  fdbde364-776d-4532-8aca-874144a85df8
17     CC  4dfa630f-20b0-4a9e-bcf0-cdd4f18bb215
18     CC  f5b16b14-a058-4f2d-8461-25e68b02c68b
19     CC  a9f23eee-5b44-4d4e-baf8-19b78b4e3508
20     CC  344bfb49-3b22-4e2a-b815-a9eb46cbed1b
21     CC  ade5b2c3-5962-4d6d-9252-4f670b127682
22     CC  43fa1fb2-3681-4372-ab5a-f12059b3741c
23     CC  77f19056-fef8-4353-9740-01afa5557f47
24     CC  1c7cfc8a-57c6-4ad3-bce7-63b006352ed1
25     CC  e6db5e3c-4b3e-4b27-9558-e7334032c9c0
26     CC  68adb3b6-dfeb-4645-8ed2-610ae896e8c4
27     CC  c0163ee2-9e34-4636-a734-051151c50f29
28     CC  7b7d8549-8697-4b19-a033-e0068f5aa4d7
29     CC  7a976925-57c5-4972-a9e6-278dd322f6b4
..    ...                                   ...
225    JA  cd3a901e-a077-472e-a100-f082f011cc72
226    JA  d462729a-e1bc-4502-abc1-8dbd2b21103d
227    JA  f5338d8d-13dc-4a03-b720-1056419deb68
228    JA  37a41e22-3698-459d-b3cd-0d419a0f3f48
229    JA  e48c4463-565d-4f09-9ccc-03e0506ddf1e
230    JA  8531a863-4b0b-4bfa-9c4a-05056e7c1165
231    JA  01e72b43-730d-42a7-9405-6fd45b2141b5
232    JA  5ebbf1d4-48f1-4d1f-a5bc-1e865c92146f
233    JA  c3103348-2209-42de-8005-06a9af3ca43f
234    JA  b603b2cd-c4c9-4a2e-a52f-361a1e06b277
235    JA  32bf7eba-7425-4f9c-ad1f-95680c8ad7cc
236    JA  514f667e-5ba8-40fa-8c6f-bbf06bb6ff2f
237    JA  d08b06d1-b8fc-432b-94af-a25d9ef8b4fe
238    JA  d32419e8-e327-470b-95ba-1a2e20cff6d7
239    NA  abdd46ff-3b81-4b5e-85a1-95db783ff2fa
240    NA  3587088d-a592-4f21-93fe-760e9276fbd9
241    NA  8eb7ae93-ab3a-4e3c-ad42-129949dd0545
242    NA  8ff017c0-aae2-43e8-b87b-2889b9bb0df8
243    NA  1ad7c9d6-8da4-46e8-97ee-c0a6c3299d27
244    NA  fa5e928f-a906-480f-824c-4615a155cc2a
245    NA  1d1b709d-3472-4bd1-961b-453368b6845a
246    NA  e302d87f-30d7-46aa-9358-5705a31cf31b
247    NA  ad1043ba-5807-482b-a4a0-624bbfbd36af
248    NA  94c4453f-1309-4931-9a8f-5400a1b104f5
249    NA  37aca929-a2fd-4055-a416-865b92ebe58c
250    NA  ad64522f-054a-4fcc-b1c2-fbf70eb06a62
251    NA  c30a75f1-c144-4d62-a3bb-9f41615b9ca0
252    NA  559c0a5e-7e76-4e4f-a560-1c4d52b31384
253    NA  5a123bb7-9ffb-46b2-b315-6e2f0aca7aed
254    NA  595da36e-ec30-40e4-abc1-b98400ae9303

这是 DocID 的数据框,我想从上面的 FileListDF 中删除匹配值,以找到与 FileListDF 中但不在 InterimDF 中相同的行数

InterimDF.to_json

<bound method NDFrame.to_json of                                     DocID
0    c2571c68-0705-4c87-a42b-0971a73fedf4
1    ab317908-8720-47ce-a8f6-453169259e48
2    2b3d1aee-b4f6-4278-9db2-c7b6819bd14f
3    9d4b9084-f623-445c-86d9-0458d6fb4135
4    56fa7edb-5020-44e7-98bc-5b0878fbf9ac
5    7c27bcb8-b381-465e-a6e9-edc38cf24e5a
6    a56e6f13-bf05-474a-9930-5f0ebad33eba
7    81c31d76-cec3-4f1c-8ec1-107ecc1169d4
8    1f8acf61-0a08-4b42-8ad1-38eeebc85ff2
9    d6a6704c-21c6-4dc5-b9bd-de9e67ee7970
10   d63dde24-9a2b-410d-aa7d-6a854b646a12
11   a80367f9-36af-45bd-878e-d19461a3d3df
12   31f74aff-e8df-43bb-a7d4-314caea6de20
13   99052ee4-2481-4c81-92dd-81b446e76c15
14   3827d0b4-5d0a-490a-a561-bf631bae6492
15   fdbde364-776d-4532-8aca-874144a85df8
16   4dfa630f-20b0-4a9e-bcf0-cdd4f18bb215
17   f5b16b14-a058-4f2d-8461-25e68b02c68b
18   a9f23eee-5b44-4d4e-baf8-19b78b4e3508
19   344bfb49-3b22-4e2a-b815-a9eb46cbed1b
20   ade5b2c3-5962-4d6d-9252-4f670b127682
21   43fa1fb2-3681-4372-ab5a-f12059b3741c
22   77f19056-fef8-4353-9740-01afa5557f47
23   1c7cfc8a-57c6-4ad3-bce7-63b006352ed1
24   e6db5e3c-4b3e-4b27-9558-e7334032c9c0
25   68adb3b6-dfeb-4645-8ed2-610ae896e8c4
26   c0163ee2-9e34-4636-a734-051151c50f29
27   7b7d8549-8697-4b19-a033-e0068f5aa4d7
28   7a976925-57c5-4972-a9e6-278dd322f6b4
29   6533a84b-a81d-411a-b44e-3090cd01396d
..                                    ...
216  f7217b94-b939-4bcf-b8c8-d385e4473c63
217  74522cf3-7fd5-4b84-b175-23f84ede25ae
218  cd3a901e-a077-472e-a100-f082f011cc72
219  d462729a-e1bc-4502-abc1-8dbd2b21103d
220  f5338d8d-13dc-4a03-b720-1056419deb68
221  37a41e22-3698-459d-b3cd-0d419a0f3f48
222  e48c4463-565d-4f09-9ccc-03e0506ddf1e
223  8531a863-4b0b-4bfa-9c4a-05056e7c1165
224  01e72b43-730d-42a7-9405-6fd45b2141b5
225  5ebbf1d4-48f1-4d1f-a5bc-1e865c92146f
226  c3103348-2209-42de-8005-06a9af3ca43f
227  b603b2cd-c4c9-4a2e-a52f-361a1e06b277
228  d08b06d1-b8fc-432b-94af-a25d9ef8b4fe
229  d32419e8-e327-470b-95ba-1a2e20cff6d7
230  abdd46ff-3b81-4b5e-85a1-95db783ff2fa
231  3587088d-a592-4f21-93fe-760e9276fbd9
232  8eb7ae93-ab3a-4e3c-ad42-129949dd0545
233  8ff017c0-aae2-43e8-b87b-2889b9bb0df8
234  1ad7c9d6-8da4-46e8-97ee-c0a6c3299d27
235  fa5e928f-a906-480f-824c-4615a155cc2a
236  1d1b709d-3472-4bd1-961b-453368b6845a
237  e302d87f-30d7-46aa-9358-5705a31cf31b
238  ad1043ba-5807-482b-a4a0-624bbfbd36af
239  94c4453f-1309-4931-9a8f-5400a1b104f5
240  37aca929-a2fd-4055-a416-865b92ebe58c
241  ad64522f-054a-4fcc-b1c2-fbf70eb06a62
242  c30a75f1-c144-4d62-a3bb-9f41615b9ca0
243  559c0a5e-7e76-4e4f-a560-1c4d52b31384
244  5a123bb7-9ffb-46b2-b315-6e2f0aca7aed
245  595da36e-ec30-40e4-abc1-b98400ae9303

所以如果我理解正确的话,你有两个数据框,每个数据框都有一列,两者之间有一些重叠。

为简单起见,我将它们称为 df_onedf_two,它们都有一个 id 列。

您可以使用 isin() 过滤掉重叠部分,或者反过来过滤掉重叠部分。

例如

import pandas as pd

df_one = pd.DataFrame(data={'id':[1,2,3,4,5]})
df_two = pd.DataFrame(data={'id':[3,4,5,6,7]})

new_ids = ~df_one['id'].isin(df_two['id'])

df_only_in_one = df_one[new_ids]