Filter a pandas DataFrame by removing duplicates from a column containing lists
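A DataFrame column contains lists of string values. The DataFrame needs to be transformed so that it keeps only the rows with unique lists in column 'Final'.
I have a dataframe like the following: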
string1 string2 Final
1 [abc,ncx] [qwe, rty] [apple, mango]
2 [uio,pas,dfg] [zxc,vbg,dfv] [banana,grapes, apple]
3 [ncx,abc] [rty,qwe] [mango,apple]
4 [uio,pas,dfg] [zxc,vbg,dfv] [banana,grapes, apple]
5 [uio,dfg] [zxc,dfv] [banana, apple]
6 [ncx,abc] [rty,qwe] [mango,apple]
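Duplicate lists must be removed from df['Final'], so that the resulting dataframe contains only unique lists in the 'Final' column.
Desired output dataframe: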
string1 string2 Final
1 [abc,ncx] [qwe, rty] [apple, mango]
2 [uio,pas,dfg] [zxc,vbg,dfv] [banana,grapes, apple]
3 [ncx,abc] [rty,qwe] [mango,apple]
4 [uio,dfg] [zxc,dfv] [banana, apple]
Use Series.duplicated, but because lists are not hashable, first convert them to tuples, then filter with boolean indexing, using ~ to invert the mask:
df = df[~df['Final'].apply(tuple).duplicated()]
print (df)
string1 string2 Final
1 [abc,ncx] [qwe,rty] [apple, mango]
2 [uio,pas,dfg] [zxc,vbg,dfv] [banana, grapes, apple]
3 [ncx,abc] [rty,qwe] [mango, apple]
5 [uio,dfg] [zxc,dfv] [banana, apple]
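For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand (the 1..6 index and the variable names mask and unique_ordered are my own choices, not part of the original code) and applies the same tuple-based filter:

```python
import pandas as pd

# Sample data rebuilt from the question; the index 1..6 mirrors the row labels shown there.
df = pd.DataFrame(
    {
        "string1": [["abc", "ncx"], ["uio", "pas", "dfg"], ["ncx", "abc"],
                    ["uio", "pas", "dfg"], ["uio", "dfg"], ["ncx", "abc"]],
        "string2": [["qwe", "rty"], ["zxc", "vbg", "dfv"], ["rty", "qwe"],
                    ["zxc", "vbg", "dfv"], ["zxc", "dfv"], ["rty", "qwe"]],
        "Final": [["apple", "mango"], ["banana", "grapes", "apple"], ["mango", "apple"],
                  ["banana", "grapes", "apple"], ["banana", "apple"], ["mango", "apple"]],
    },
    index=range(1, 7),
)

# Lists are unhashable, so convert each one to a tuple before calling duplicated().
# The mask is True for every repeat after the first occurrence.
mask = df["Final"].apply(tuple).duplicated()

# ~ inverts the mask, keeping only first occurrences.
unique_ordered = df[~mask]
print(unique_ordered.index.tolist())  # [1, 2, 3, 5]
```

The original row labels are kept (1, 2, 3, 5). If consecutive labels are wanted, as in the desired output, resetting the index afterwards should do it.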
If apple, mango should count as a duplicate of mango, apple (order is not important), change tuple to frozenset:
df = df[~df['Final'].apply(frozenset).duplicated()]
print (df)
string1 string2 Final
1 [abc,ncx] [qwe,rty] [apple, mango]
2 [uio,pas,dfg] [zxc,vbg,dfv] [banana, grapes, apple]
5 [uio,dfg] [zxc,dfv] [banana, apple]
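One caveat worth noting with frozenset: it ignores repeated elements inside a list as well as their order. If that matters for your data, a sorted tuple is one possible alternative that ignores order but still distinguishes repeats. The sketch below uses a small made-up Series (not the data from the question) to show the difference; by_set and by_sorted are illustrative names:

```python
import pandas as pd

# Hypothetical sample: the third list repeats "apple".
final = pd.Series(
    [["apple", "mango"], ["mango", "apple"], ["apple", "apple", "mango"]],
    name="Final",
)

# frozenset: order and repetition are both ignored, so all three lists compare equal.
by_set = final.apply(frozenset).duplicated()

# Sorted tuple: order is ignored, but the repeated "apple" makes the third list distinct.
by_sorted = final.apply(lambda items: tuple(sorted(items))).duplicated()

print(by_set.tolist())     # [False, True, True]
print(by_sorted.tolist())  # [False, True, False]
```

The sorted-tuple variant assumes the list elements can be compared with each other, which holds for plain strings as in the question.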