Keep all duplicate rows if they are also duplicated in another column; otherwise keep only the row with the highest value in that column
I have a DataFrame:
import pandas as pd
df = pd.DataFrame([["A",98,56,3],["C",18,45,8], ["B",79,54,36], ["A",98,56,2],["C",18,45,9],["B",79,54,36], ["A",98,56,1],["B",79,54,36],["C",18,45,7]], columns=["id","c1","c2","c3"])
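I need to check for duplicates on the columns id, c1, c2. Within each set of duplicates, compare the values in c3: if they differ, keep only the row with the highest c3 and drop the others. If the c3 values are the same, do not drop those rows.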
Output = pd.DataFrame([["A",98,56,3],["B",79,54,36], ["C",18,45,9],["B",79,54,36], ["A",84,65,6],["B",79,54,36]], columns=["id","c1","c2","c3"])
How can this be done in pandas?
Let's use transform: compute the maximum c3 for each (id, c1, c2) group and keep the rows whose c3 equals it.
out = df[df.c3==df.groupby(['id','c1','c2'])['c3'].transform('max')]
  id  c1  c2  c3
0  A  98  56   3
2  B  79  54  36
4  C  18  45   9
5  B  79  54  36
7  B  79  54  36
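To make the one-liner easier to follow, here is a minimal sketch that splits it into steps, using the same data as the question. The names `keys`, `group_max`, and `mask` are introduced here only for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 98, 56, 3], ["C", 18, 45, 8], ["B", 79, 54, 36],
     ["A", 98, 56, 2], ["C", 18, 45, 9], ["B", 79, 54, 36],
     ["A", 98, 56, 1], ["B", 79, 54, 36], ["C", 18, 45, 7]],
    columns=["id", "c1", "c2", "c3"],
)

# Columns that define a "duplicate" (illustrative name).
keys = ["id", "c1", "c2"]

# transform('max') returns a Series aligned with df's index, where each
# row holds the maximum c3 of the group it belongs to.
group_max = df.groupby(keys)["c3"].transform("max")

# Keep every row whose c3 equals its group's maximum. Rows that tie at
# the maximum (the three B rows here) are all kept.
mask = df["c3"] == group_max
out = df[mask]

print(out)
```

Note that this keeps every row tied at the group maximum, so a group with c3 values such as 5, 5, 3 would keep both rows with 5 and drop the row with 3. If that is not the intended behavior for mixed groups, the filter would need an extra condition.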