Retain duplicate rows if they are also duplicated on another column, else retain the row with the highest value in that column

Question

I have a DataFrame:

import pandas as pd
df = pd.DataFrame([["A",98,56,3],["C",18,45,8], ["B",79,54,36], ["A",98,56,2],["C",18,45,9],["B",79,54,36], ["A",98,56,1],["B",79,54,36],["C",18,45,7]], columns=["id","c1","c2","c3"])

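I need to find rows that are duplicated on the columns id, c1, c2, and then compare their c3 values. If the c3 values differ, keep only the row with the highest c3 and drop the others. If the c3 values are identical (the rows are duplicated on c3 as well), do not drop any of those rows.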

Output = pd.DataFrame([["A",98,56,3],["B",79,54,36], ["C",18,45,9],["B",79,54,36], ["A",84,65,6],["B",79,54,36]], columns=["id","c1","c2","c3"])

How can this be done in pandas?

Answer 1

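Let's use transform: broadcast each group's maximum c3 back onto its rows, then keep only the rows whose c3 equals that maximum.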

out = df[df.c3==df.groupby(['id','c1','c2'])['c3'].transform('max')]
  id  c1  c2  c3
0  A  98  56   3
2  B  79  54  36
4  C  18  45   9
5  B  79  54  36
7  B  79  54  36
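
For anyone who wants to run this end to end, here is the same idea as a self-contained sketch, using the data from the question. The intermediate name group_max is only for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [["A", 98, 56, 3], ["C", 18, 45, 8], ["B", 79, 54, 36],
     ["A", 98, 56, 2], ["C", 18, 45, 9], ["B", 79, 54, 36],
     ["A", 98, 56, 1], ["B", 79, 54, 36], ["C", 18, 45, 7]],
    columns=["id", "c1", "c2", "c3"],
)

# transform("max") returns a Series aligned with df's index, holding the
# maximum c3 of the (id, c1, c2) group that each row belongs to.
group_max = df.groupby(["id", "c1", "c2"])["c3"].transform("max")

# Keep every row that reaches its group's maximum. Rows tied at the
# maximum all survive, which is why the three identical B rows remain.
out = df[df["c3"] == group_max]
print(out)
```

Note that ties are kept only at the maximum: if a group had c3 values such as 5, 5, 3, both rows with 5 would stay and the row with 3 would be dropped. The original index labels are preserved; chaining .reset_index(drop=True) would renumber them if that is preferred.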

python

dataframe

python-2.7

python-3.x

pandas