How to use groupby transform across columns using method chaining?
Using method chaining, I want to create a new column that holds, for every row of a group, the value of col_2 from the row where col_1==0.
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': np.random.randint(0, 10, 12)})
group col_1 col_2
0 A -1 5
1 A 0 8
2 A 1 9
3 B -1 5
4 B 0 0
5 B 1 0
6 C 0 1
7 C 1 7
8 D -1 6
9 D 0 9
10 D 1 2
11 D 2 4
Desired output:
group col_1 col_2 new_col
0 A -1 5 8
1 A 0 8 8
2 A 1 9 8
3 B -1 5 0
4 B 0 0 0
5 B 1 0 0
6 C 0 1 1
7 C 1 7 1
8 D -1 6 9
9 D 0 9 9
10 D 1 2 9
11 D 2 4 9
My attempt using groupby transform (I hoped this would work, but apparently transform only has access to a single column at a time):
df.assign(
new_col = lambda df_: df_.groupby('group').transform(lambda x: x.loc[x.col_1==0].col_2)
)
AttributeError: 'Series' object has no attribute 'col_1'
I came up with this solution while writing the question, but I figured I'd post it anyway:
df.assign(
    new_col = lambda df_: df_.merge(df_.groupby('group')
                  .apply(lambda x: x.loc[x.col_1==0].col_2)
                  .reset_index().rename(columns={'col_2':'new_col'}), on='group'
              ).new_col
)
Is there a better way?
First use Series.where to replace all col_2 values with NaN wherever col_1 == 0 does not hold, then use GroupBy.transform('first') to broadcast the first non-NaN value within each group:
df = df.assign(
    new_col = lambda df_: df_['col_2'].where(df_['col_1'] == 0)
                              .groupby(df_['group']).transform('first')
)
print(df)
group col_1 col_2 new_col
0 A -1 5 8.0
1 A 0 8 8.0
2 A 1 9 8.0
3 B -1 5 0.0
4 B 0 0 0.0
5 B 1 0 0.0
6 C 0 1 1.0
7 C 1 7 1.0
8 D -1 6 9.0
9 D 0 9 9.0
10 D 1 2 9.0
11 D 2 4 9.0
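To see why this works, it helps to look at the intermediate result: Series.where masks every col_2 value whose row fails col_1 == 0 to NaN, leaving exactly one surviving value per group, and 'first' skips NaN, so transform broadcasts that value back to every row of its group. A sketch of the two steps, assuming the same seeded df as above:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': np.random.randint(0, 10, 12)})

# Step 1: keep col_2 only where col_1 == 0; everything else becomes NaN.
masked = df['col_2'].where(df['col_1'] == 0)

# Step 2: 'first' ignores NaN, so each group broadcasts its single surviving value.
new_col = masked.groupby(df['group']).transform('first')
print(new_col.tolist())  # [8.0, 8.0, 8.0, 0.0, 0.0, 0.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
```

Note the float dtype in the result: introducing NaN forces col_2 from int to float, which is why the answer's output shows 8.0 rather than 8.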
Another solution: Series.map with the rows filtered by the condition, indexed by the group column via DataFrame.set_index:
df = df.assign(
    new_col = lambda df_: df_['group'].map(df_.loc[df_['col_1'] == 0]
                                              .set_index('group')['col_2'])
)
print(df)
group col_1 col_2 new_col
0 A -1 5 8
1 A 0 8 8
2 A 1 9 8
3 B -1 5 0
4 B 0 0 0
5 B 1 0 0
6 C 0 1 1
7 C 1 7 1
8 D -1 6 9
9 D 0 9 9
10 D 1 2 9
11 D 2 4 9
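The map-based variant works by first building a small lookup Series (the col_1 == 0 row of each group, indexed by group) and then translating every group label through it. A sketch of that lookup step, assuming the same seeded df; note it relies on there being exactly one col_1 == 0 row per group, since a duplicated lookup index would make the mapping ambiguous:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': np.random.randint(0, 10, 12)})

# Lookup table: one col_2 value per group, taken from the col_1 == 0 row.
lookup = df.loc[df['col_1'] == 0].set_index('group')['col_2']
print(lookup.to_dict())  # {'A': 8, 'B': 0, 'C': 1, 'D': 9}

# map replaces each group label with its value from the lookup table.
new_col = df['group'].map(lookup)
print(new_col.tolist())  # [8, 8, 8, 0, 0, 0, 1, 1, 9, 9, 9, 9]
```

Because no NaN is introduced here, this version keeps the integer dtype, matching the desired output exactly.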