For rows duplicated on id, c1, c2: keep all of them if c3 is also identical, otherwise keep only the row with the highest c3

I have a DataFrame:

import pandas as pd
df = pd.DataFrame([["A",98,56,3],["C",18,45,8], ["B",79,54,36], ["A",98,56,2],["C",18,45,9],["B",79,54,36], ["A",98,56,1],["B",79,54,36],["C",18,45,7]], columns=["id","c1","c2","c3"])

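I need to find rows that are duplicated on the id, c1 and c2 columns, then compare their c3 values. If the c3 values differ, keep the row with the highest c3 and drop the others. If the c3 values are all the same, keep every row.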
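To make the condition concrete, here is a minimal sketch that counts the distinct c3 values in each (id, c1, c2) group. A count of 1 means all c3 values in the group are identical (keep every row); a count above 1 means they differ (keep only the max). The name distinct_c3 is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 98, 56, 3], ["C", 18, 45, 8], ["B", 79, 54, 36],
     ["A", 98, 56, 2], ["C", 18, 45, 9], ["B", 79, 54, 36],
     ["A", 98, 56, 1], ["B", 79, 54, 36], ["C", 18, 45, 7]],
    columns=["id", "c1", "c2", "c3"],
)

# Number of distinct c3 values per (id, c1, c2) group:
# 1 -> all c3 values identical (keep every row), >1 -> they differ (keep the max)
distinct_c3 = df.groupby(["id", "c1", "c2"])["c3"].nunique()
print(distinct_c3)
```

For the sample data this gives 3 for A, 1 for B and 3 for C, so only the B group keeps all of its rows.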

Expected output:

Output = pd.DataFrame([["A",98,56,3],["B",79,54,36], ["C",18,45,9],["B",79,54,36],["B",79,54,36]], columns=["id","c1","c2","c3"])

How can this be done in pandas?

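Let's use transform: compare each row's c3 with the maximum c3 of its (id, c1, c2) group and keep the rows that match. When all c3 values in a group are equal, every row matches the max, so the whole group is kept.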

# keep rows whose c3 equals the max c3 of their (id, c1, c2) group
out = df[df.c3 == df.groupby(['id', 'c1', 'c2'])['c3'].transform('max')]
print(out)
  id  c1  c2  c3
0  A  98  56   3
2  B  79  54  36
4  C  18  45   9
5  B  79  54  36
7  B  79  54  36
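A minimal sketch that spells out the intermediate step, plus a stricter variant based on an assumption about the intended rule. transform('max') returns a Series aligned to df's index, so each row carries its group's max. Note that this filter keeps every row tied at the max, so a group with c3 values such as 3, 3, 1 would keep both 3s. If the rule is meant to keep exactly one row whenever the c3 values differ, combining nunique with idxmax (which returns the label of the first max) does that. The names group_max, all_same and out_strict are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 98, 56, 3], ["C", 18, 45, 8], ["B", 79, 54, 36],
     ["A", 98, 56, 2], ["C", 18, 45, 9], ["B", 79, 54, 36],
     ["A", 98, 56, 1], ["B", 79, 54, 36], ["C", 18, 45, 7]],
    columns=["id", "c1", "c2", "c3"],
)
keys = ["id", "c1", "c2"]
g = df.groupby(keys)["c3"]

# One value per row: the max c3 of that row's group (same index as df)
group_max = g.transform("max")

# The answer's filter: rows equal to their group max survive;
# groups with identical c3 keep every row because each row equals the max
out = df[df["c3"] == group_max]

# Assumed stricter variant: keep whole groups only when c3 is identical,
# otherwise keep just the first row holding the max (ties broken by idxmax)
all_same = g.transform("nunique") == 1
one_max = df.index.isin(g.idxmax())
out_strict = df[all_same | one_max]

print(out)
print(out_strict)
```

For this sample data both filters keep the rows with index 0, 2, 4, 5 and 7; they differ only when a group's max is tied among otherwise different c3 values.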