Pandas: Retain a particular row based on the weight given to a particular column, if duplicates exist on other columns after groupby

Question

I have a DataFrame df:

import pandas as pd

df = pd.DataFrame([["A","X",98,56,61], ["B","E",79,54,36], ["A","Y",98,56,61], ["B","F",79,54,36],
                   ["A","Z",98,56,61], ["A","W",48,51,85], ["B","G",44,57,86], ["B","H",79,54,36]],
                  columns=["id","class","c1","c2","c3"])

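When grouping by id, if some rows are duplicates on several columns (e.g. c1, c2, c3), I want to keep only one of them, chosen by the weight given to the values in column class.

For example, within id A, the rows for classes X, Y and Z have identical c1, c2, c3 values. Among X, Y and Z the weight goes to X, so X is kept and the other rows are dropped. Similarly, within id B, classes E, F and H are duplicates and the weight goes to F, so F is kept and the other rows are dropped.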
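To see which rows count as duplicates in this sense, one can group on id together with c1, c2, c3 and list the classes that share each combination. This is a minimal sketch that rebuilds the df from above so it runs on its own. Groups holding more than one class (here X, Y, Z and E, F, H) are the ones where a weight has to decide which row survives.

```python
import pandas as pd

df = pd.DataFrame([["A","X",98,56,61], ["B","E",79,54,36], ["A","Y",98,56,61], ["B","F",79,54,36],
                   ["A","Z",98,56,61], ["A","W",48,51,85], ["B","G",44,57,86], ["B","H",79,54,36]],
                  columns=["id","class","c1","c2","c3"])

# Group on id plus the columns that define a duplicate and collect the
# classes sharing each combination (row order inside a group is preserved).
classes_per_group = df.groupby(["id", "c1", "c2", "c3"])["class"].agg(list)
print(classes_per_group)
```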

Expected output:

output = pd.DataFrame([["A","X",98,56,61],["B","F",79,54,36],["A","W",48,51,85],["B","G",44,57,86]], columns=["id","class","c1","c2","c3"])

How can I do this?

Answer 1

Based on your explanation, you can list the classes that take priority, build two conditions and combine them:

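# classes that take priority when rows are duplicated
cls = ['X','F']
# True for every row whose (id, c1, c2, c3) combination appears more than once
c = df.duplicated(['id','c1','c2','c3'], keep=False)
# keep duplicated rows only if their class is preferred, plus all non-duplicated rows
out = df[(c & df['class'].isin(cls)) | ~c]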

print(out)

  id class  c1  c2  c3
0  A     X  98  56  61
3  B     F  79  54  36
5  A     W  48  51  85
6  B     G  44  57  86
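One caveat of the mask above: a duplicate group containing none of the listed classes is dropped entirely, and a group containing two listed classes keeps both. A sketch that uses an actual weight mapping instead picks exactly one row per (id, c1, c2, c3) group with groupby plus idxmax. The weights dict below is hypothetical (higher wins, unlisted classes count as 0). On ties, idxmax returns the first row in the original order.

```python
import pandas as pd

df = pd.DataFrame([["A","X",98,56,61], ["B","E",79,54,36], ["A","Y",98,56,61], ["B","F",79,54,36],
                   ["A","Z",98,56,61], ["A","W",48,51,85], ["B","G",44,57,86], ["B","H",79,54,36]],
                  columns=["id","class","c1","c2","c3"])

# Hypothetical weights: a higher number means the class is preferred when
# rows share the same id, c1, c2 and c3. Unlisted classes default to 0.
weights = {"X": 1, "F": 1}
w = df["class"].map(weights).fillna(0)

# For each (id, c1, c2, c3) group, idxmax gives the index label of the first
# row with the highest weight; non-duplicated rows form single-row groups.
keep = w.groupby([df["id"], df["c1"], df["c2"], df["c3"]]).idxmax()

# Select the chosen rows in their original order.
out = df.loc[sorted(keep.tolist())]
print(out)

# Compare with the expected output from the question.
expected = pd.DataFrame([["A","X",98,56,61], ["B","F",79,54,36], ["A","W",48,51,85], ["B","G",44,57,86]],
                        columns=["id","class","c1","c2","c3"])
pd.testing.assert_frame_equal(out.reset_index(drop=True), expected)
```

Because every group yields exactly one index label, this version never drops a whole group and never keeps two rows from the same group.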
