Is there a better way to find duplicate rows _including_ the first/last?

Question

Consider a Pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    'a': pd.Series([1,1,1,2,3]),
    'b': pd.Series(list('asdfg'))
})

I want to return all rows where column a has a duplicated value, including the first or last occurrence. I can do this with:

df[df['a'].duplicated() | df['a'].duplicated(take_last=True)]

Is there a better way?

Answer 1

To get the duplicated rows, you can count the occurrences of each value in a and return the rows where the count is greater than 1.

In [25]: df[(df.groupby('a').transform('count')>1).values]
Out[25]:
   a  b
0  1  a
1  1  s
2  1  d
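
As a side note, later pandas versions replaced the `take_last` argument with `keep`, and `keep=False` flags every member of a duplicate group in one call. The following is a minimal sketch, assuming a pandas version where `duplicated` accepts `keep`. The names `dupes` and `dupes_by_count` are only illustrative, and the second expression is a per-column variant of the groupby count approach above.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 3],
    'b': list('asdfg'),
})

# keep=False marks all occurrences of a repeated value,
# including the first and the last one.
dupes = df[df['a'].duplicated(keep=False)]

# Equivalent selection via group counts: keep rows whose
# value of 'a' appears more than once.
dupes_by_count = df[df.groupby('a')['a'].transform('count') > 1]

print(dupes)
```

Both expressions produce a one-dimensional boolean mask aligned with the index of `df`, so no `.values` conversion is needed.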
