Dataframe: adding a column with mean by other column group

Suppose I have the following DataFrame:

data = pd.DataFrame({'id' : ['1','2','3','4','5'], 'group' : ['1','1','2','1','2'], 
      'state' : ['True','False','False','True','True'], 'value' : [11,12,5,8,3]})

I want to add a new column to this DataFrame that holds the mean of 'state' within each 'group', i.e.

pd.DataFrame({'id' : ['1','2','3','4','5'], 'group' : ['1','1','2','1','2'],
      'state' : ['True','False','False','True','True'], 'avg_state' : [0.66,0.66,0.5,0.66,0.5], 'value' : [11,12,5,8,3]})

IIUC, convert the state column from strings back to booleans so that it can be summed, then use groupby with transform:

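df = data.copy()  # the question's DataFrame is named data

# map the 'True'/'False' strings to booleans, then take sum/count per group
df["avg_state"] = (df.assign(state=df["state"].map({"True":True, "False":False}))
                     .groupby("group")["state"]
                     .transform(lambda d: d.sum()/d.count()))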
  
print (df)
  
  id group  state  value  avg_state
0  1     1   True     11   0.666667
1  2     1  False     12   0.666667
2  3     2  False      5   0.500000
3  4     1   True      8   0.666667
4  5     2   True      3   0.500000
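
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The intermediate name state_as_bool is only for illustration and is not part of the answer above; the data is the question's.

```python
import pandas as pd

# The question's DataFrame; 'state' holds the strings 'True'/'False', not booleans.
df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "group": ["1", "1", "2", "1", "2"],
    "state": ["True", "False", "False", "True", "True"],
    "value": [11, 12, 5, 8, 3],
})

# Step 1: map the strings to real booleans (illustrative helper name).
state_as_bool = df["state"].map({"True": True, "False": False})

# Step 2: per group, divide the number of True values by the group size.
# transform broadcasts the per-group scalar back onto every row of the group.
df["avg_state"] = (df.assign(state=state_as_bool)
                     .groupby("group")["state"]
                     .transform(lambda d: d.sum() / d.count()))

print(df)
```

Because transform returns a result aligned with the original index, the new column can be assigned directly without a merge.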

Another option is pd.eval with transform and mean:

data['av_state'] = (data.assign(state=pd.eval(data['state']).astype(int))
                       .groupby("group")['state'].transform('mean'))

print(data)

  id group  state  value  av_state
0  1     1   True     11  0.666667
1  2     1  False     12  0.666667
2  3     2  False      5  0.500000
3  4     1   True      8  0.666667
4  5     2   True      3  0.500000
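
If you would rather not evaluate the strings at all, a variant that should give the same column is to compare 'state' against the literal string 'True' and average the resulting 0/1 flags per group. This is a sketch under the assumption that the column only ever contains the strings 'True' and 'False'; the name is_true is illustrative.

```python
import pandas as pd

# The question's DataFrame.
data = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "group": ["1", "1", "2", "1", "2"],
    "state": ["True", "False", "False", "True", "True"],
    "value": [11, 12, 5, 8, 3],
})

# 1 where the string is exactly 'True', 0 otherwise (assumes no other spellings occur).
is_true = data["state"].eq("True").astype(int)

# Group the flags by the 'group' column and broadcast each group's mean to its rows.
data["av_state"] = is_true.groupby(data["group"]).transform("mean")

print(data)
```

Grouping the flag Series by data["group"] directly avoids the temporary assign, and the original 'state' column is left untouched.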