Based on whichever string/category appears first in a column, retain that row and drop other rows in pandas

Question

I have a DataFrame:

import pandas as pd

df = pd.DataFrame([["A","X",1], ["B","W",0.9], ["B","X",0.8],
                   ["A","W",0.7], ["B","Z",8], ["B","Y",48],
                   ["A","Y",98], ["A","Z",56]], columns=["id","key","val"])

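After grouping by the id column, I want to look at the key column and, of W and X, keep the row for whichever appears first and drop the other. Similarly, of Y and Z, keep the row for whichever appears first and drop the other. This should be done separately for each group of rows sharing the same id.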

Expected output:

df_out = pd.DataFrame([["A","X",1], ["B","W",0.9], ["B","Z",8],
                       ["A","Y",98]], columns=["id","key","val"])

How can I do this?

Answer 1

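Use DataFrame.replace to map keys that belong to the same group onto a single value (X becomes W, Y becomes Z), then pass the result to DataFrame.duplicated on both the id and key columns. That flags every row after the first for each id and key group, and inverting the mask with ~ keeps only the first occurrences: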

df = df[~df.replace({'key':{'X':'W', 'Y':'Z'}}).duplicated(['id','key'])]
print(df)
  id key   val
0  A   X   1.0
1  B   W   0.9
4  B   Z   8.0
6  A   Y  98.0
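
If there are more than two pairs of competing keys, writing the replace dictionary by hand gets tedious. The following is only a sketch of one way to generalize the same idea, not part of the original answer: the names key_groups, key_to_group, and result are hypothetical, and it assumes that keys not listed in any group should compete only with themselves.

```python
import pandas as pd

df = pd.DataFrame([["A","X",1], ["B","W",0.9], ["B","X",0.8],
                   ["A","W",0.7], ["B","Z",8], ["B","Y",48],
                   ["A","Y",98], ["A","Z",56]], columns=["id","key","val"])

# Hypothetical input: each inner list holds keys that compete with each other.
key_groups = [["W", "X"], ["Y", "Z"]]

# Map every key to the first key of its group, e.g. {"W": "W", "X": "W", ...}.
key_to_group = {key: keys[0] for keys in key_groups for key in keys}

# Keys that are not listed in key_groups map to themselves (an assumption).
group_label = df["key"].map(lambda k: key_to_group.get(k, k))

# Flag every row after the first for each (id, group) combination.
is_repeat = df.assign(_group=group_label).duplicated(["id", "_group"])

# Keep first occurrences only; the original key values are left untouched.
result = df[~is_repeat]
print(result)
```

Because the group label lives in a temporary column, the original df is not modified, and the kept rows still show their real key (X rather than W, for example).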

python

dataframe

python-2.7

python-3.x

pandas