How to manipulate element-wise and group-wise at the same time in a MultiIndex frame?

Question

I have the following DataFrame:

    import numpy as np
    import pandas as pd

    # 9 rows x 3 columns of random 1s and 2s, indexed by a three-level MultiIndex
    df = pd.DataFrame(np.random.randint(1, 3, 27).reshape((9, 3)),
                      index=[['KH','KH','KH','KH','KH','KH','KH','KH','KH'],
                             ['AOK','AOK','AOK','DOK','DOK','DOK','ROK','ROK','ROK'],
                             ['A','B','C','A','B','C','A','B','C']],
                      columns=['JE', 'TE', 'DE'])
    df.index.names = ['Deck', 'Status', 'Urs']
    df
Out[116]: 
                 JE  TE  DE
Deck Status Urs            
KH   AOK    A     1   1   2
            B     1   2   2
            C     2   1   1
     DOK    A     2   2   1
            B     1   2   1
            C     1   2   2
     ROK    A     2   2   2
            B     1   1   2
            C     1   2   1

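Now I simply want to append a column 'JErel' to it. This column should contain the values from 'JE', but as relative fractions: each value divided by the sum of 'JE' within its 'Status' index group.

I can access that sum like this: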

df.loc[('KH','AOK')]['JE'].sum()
Out[117]: 4

The resulting column should look like:

1/df.loc[('KH','AOK')]['JE'].sum(),
1/df.loc[('KH','AOK')]['JE'].sum(),
2/df.loc[('KH','AOK')]['JE'].sum() and then, 
2/df.loc[('KH','DOK')]['JE'].sum(), ...

That is as far as I got.

How can I add the column dynamically, something like apply(lambda ...)?

Answer 1

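You can use groupby.transform to compute the per-group sum of the JE column as a Series with the same length and index as the original DataFrame, and then divide the JE column by it: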

df['JErel'] = df.JE.div(df.groupby(level=['Deck','Status']).JE.transform('sum'))
df
#                  JE  TE  DE     JErel
# Deck  Status  Urs             
#   KH     AOK  A   2   2   1   0.400000
#               B   2   2   1   0.400000
#               C   1   1   2   0.200000
#          DOK  A   1   1   2   0.250000
#               B   2   1   2   0.500000
#               C   1   1   1   0.250000
#          ROK  A   2   1   2   0.333333
#               B   2   1   2   0.333333
#               C   2   1   1   0.333333
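
Because the frame above is built from random integers, its numbers change on every run. As a reproducible check, here is a minimal sketch that hard-codes the values printed in the question (Out[116]) and applies the same transform-and-divide step. The names `grouped_je` and `jerel_lambda` are introduced only for this illustration, and the lambda variant is shown as one possible way to write it in the apply(lambda ...) style the question asks about, not as the preferred form.

```python
import pandas as pd

# Fixed values copied from the question's printed frame, so results are reproducible.
index = pd.MultiIndex.from_product(
    [['KH'], ['AOK', 'DOK', 'ROK'], ['A', 'B', 'C']],
    names=['Deck', 'Status', 'Urs'],
)
df = pd.DataFrame(
    {'JE': [1, 1, 2, 2, 1, 1, 2, 1, 1],
     'TE': [1, 2, 1, 2, 2, 2, 2, 1, 2],
     'DE': [2, 2, 1, 1, 1, 2, 2, 2, 1]},
    index=index,
)

# Group the JE column by the first two index levels (hypothetical helper name).
grouped_je = df.groupby(level=['Deck', 'Status'])['JE']

# transform('sum') broadcasts each group's total back onto every row of that group,
# so the division is aligned row by row.
df['JErel'] = df['JE'] / grouped_je.transform('sum')

# Same computation written with a lambda applied per group.
jerel_lambda = grouped_je.transform(lambda s: s / s.sum())

print(df)
```

With these values every 'Status' group of JE sums to 4, so each JErel entry is either 0.25 or 0.5 and the fractions within each group add up to 1. The string form `transform('sum')` is usually faster than the lambda because it can use pandas' built-in aggregation instead of calling Python code once per group.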

Tags: python, apply, multi-index, dataframe, pandas