Keep the row whose string/category appears first in a column and drop the other rows in pandas
I have a DataFrame:
import pandas as pd

df = pd.DataFrame([["A","X",1], ["B","W",0.9], ["B","X",0.8],
                   ["A","W",0.7], ["B","Z",8], ["B","Y",48],
                   ["A","Y",98],["A","Z",56]], columns=["id","key","val"])
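After grouping by the id column, I want to look at the key column: of W and X, keep the row for whichever appears first and drop the other; likewise, of Y and Z, keep the row for whichever appears first and drop the other. This should be done within each group, i.e. per common id.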
Expected output:
df_out = pd.DataFrame([["A","X",1], ["B","W",0.9], ["B","Z",8],
["A","Y",98]], columns=["id","key","val"])
How can I do this?
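Use DataFrame.replace to map the keys that belong to the same 'group' to a single value (X to W, Y to Z), then pass the result to DataFrame.duplicated on both columns. Inverting that mask keeps only the first row of each id and key group: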
df = df[~df.replace({'key':{'X':'W', 'Y':'Z'}}).duplicated(['id','key'])]
print(df)
  id key   val
0  A   X   1.0
1  B   W   0.9
4  B   Z   8.0
6  A   Y  98.0
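The same idea can also be written with an explicit lookup table, which may be easier to extend if more key pairs are added later. This is only a sketch, not part of the answer above: the `pairs` dictionary and the `pair` helper column are hypothetical names introduced for illustration, and keys missing from `pairs` are assumed to form their own group.

```python
import pandas as pd

df = pd.DataFrame([["A", "X", 1], ["B", "W", 0.9], ["B", "X", 0.8],
                   ["A", "W", 0.7], ["B", "Z", 8], ["B", "Y", 48],
                   ["A", "Y", 98], ["A", "Z", 56]],
                  columns=["id", "key", "val"])

# Hypothetical lookup: each key maps to a label for the pair it belongs to.
pairs = {"W": "WX", "X": "WX", "Y": "YZ", "Z": "YZ"}

# Label each row with its pair; keys missing from `pairs` keep their own
# value (assumption: such keys are treated as a group of their own).
labelled = df.assign(pair=df["key"].map(pairs).fillna(df["key"]))

# duplicated() marks every row after the first within each (id, pair),
# so the inverted mask keeps the first occurrence only.
result = df[~labelled.duplicated(["id", "pair"])]

print(result)
```

Because the mask is computed on a labelled copy, the original key values (X, W, Z, Y) are left untouched in `result`, and the original index labels are preserved.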