Python pandas drops duplicates in wrong order

Question

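When dropping duplicates in Python, pandas appears to have a bug that leaves the DataFrame with the wrong rows kept, in the wrong order.

Specifically, I was trying to supply two columns to drop duplicates on. Instead of: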

df.drop_duplicates(['a', 'b'], inplace = True)

I had:

df.drop_duplicates('a', 'b', inplace = True)

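I assume this is what caused the problem, since it went away once I added the square brackets.

I don't understand why this: a) doesn't raise an error for the incorrectly specified input, and b) changes the order of what is dropped and what is kept.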

Answer 1

The docs for drop_duplicates say the parameters are:

subset : column label or sequence of labels, optional
    Only consider certain columns for identifying duplicates, by default use all of the columns

take_last : boolean, default False
    Take the last observed row in a row. Defaults to the first row

inplace : boolean, default False
    Whether to drop duplicates in place or to return a copy

cols : kwargs only argument of subset [deprecated]

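So in your call, 'b' is probably being passed positionally as take_last, where it evaluates to the boolean True (a non-empty string is truthy). That would explain point b): duplicates are identified on column a alone, and the last occurrence of each is kept instead of the first. As for point a), not raising an error here is standard practice in Python; input checking is not exhaustive.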
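To make the argument binding concrete, here is a minimal pure-Python sketch. The function drop_duplicates_rows is a hypothetical stand-in that only mimics the parameter order quoted above (subset, then take_last); it is not pandas' implementation, and the sample rows are made up for illustration.

```python
# Hypothetical stand-in for the old drop_duplicates signature; this is NOT
# pandas' implementation, only a sketch of how the arguments bind.
def drop_duplicates_rows(rows, subset=None, take_last=False):
    """Return rows with duplicates removed, judged on the `subset` keys."""
    if subset is None:
        subset = sorted(rows[0]) if rows else []
    elif isinstance(subset, str):
        subset = [subset]  # a single label is treated as a one-item list

    # Any truthy value (such as the string 'b') selects the "keep last" branch.
    ordered = list(reversed(rows)) if take_last else list(rows)

    seen = set()
    kept = []
    for row in ordered:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    # Restore the original relative order when we scanned backwards.
    return list(reversed(kept)) if take_last else kept


# Made-up sample data: rows 1 and 3 are duplicates on ('a', 'b').
rows = [
    {"a": 1, "b": "x", "c": "first"},
    {"a": 1, "b": "y", "c": "second"},
    {"a": 1, "b": "x", "c": "third"},
]

intended = drop_duplicates_rows(rows, ["a", "b"])  # subset=['a', 'b']
mistaken = drop_duplicates_rows(rows, "a", "b")    # subset='a', take_last='b'

print(bool("b"))                    # True: a non-empty string is truthy
print([r["c"] for r in intended])   # ['first', 'second']
print([r["c"] for r in mistaken])   # ['third']
```

The second call silently treats 'b' as a flag, so it deduplicates on a alone and keeps the last match. Passing the option by keyword (for example inplace=True, as in the question) avoids this kind of accidental positional binding. Note that later pandas releases replaced take_last with a keep parameter ('first' or 'last'), so the exact behaviour of the mistaken call depends on the installed version.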
