How to use groupby transform across columns using method chaining?
Using method chaining, I want to create a new column that holds, for every row of a group, the value of col_2 from the row where col_1==0.
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': np.random.randint(0, 10, 12)})
group col_1 col_2
0 A -1 5
1 A 0 8
2 A 1 9
3 B -1 5
4 B 0 0
5 B 1 0
6 C 0 1
7 C 1 7
8 D -1 6
9 D 0 9
10 D 1 2
11 D 2 4
Desired output:
group col_1 col_2 new_col
0 A -1 5 8
1 A 0 8 8
2 A 1 9 8
3 B -1 5 0
4 B 0 0 0
5 B 1 0 0
6 C 0 1 1
7 C 1 7 1
8 D -1 6 9
9 D 0 9 9
10 D 1 2 9
11 D 2 4 9
My attempt using groupby transform (I hoped this would work, but apparently transform only has access to a single column at a time):
df.assign(
new_col = lambda df_: df_.groupby('group').transform(lambda x: x.loc[x.col_1==0].col_2)
)
AttributeError: 'Series' object has no attribute 'col_1'
I came up with this solution while writing the question, but I figured I'd post it anyway:
df.assign(
    new_col = lambda df_: df_.merge(df_.groupby('group')
                  .apply(lambda x: x.loc[x.col_1==0].col_2)
                  .reset_index().rename(columns={'col_2':'new_col'}), on='group'
              ).new_col
)
Is there a better way?
First use Series.where to replace all col_2 values with NaN wherever col_1 == 0 does not hold, then use GroupBy.transform('first') to broadcast the first non-NaN value within each group:
df = df.assign(
    new_col = lambda df_: df_['col_2'].where(df_['col_1'] == 0)
                              .groupby(df_['group']).transform('first')
)
print(df)
group col_1 col_2 new_col
0 A -1 5 8.0
1 A 0 8 8.0
2 A 1 9 8.0
3 B -1 5 0.0
4 B 0 0 0.0
5 B 1 0 0.0
6 C 0 1 1.0
7 C 1 7 1.0
8 D -1 6 9.0
9 D 0 9 9.0
10 D 1 2 9.0
11 D 2 4 9.0
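To see why this works, it helps to look at the intermediate result: Series.where masks every col_2 value whose row fails col_1 == 0 to NaN, leaving exactly one surviving value per group, and 'first' skips NaN, so transform broadcasts that value back to every row of its group. A sketch of the two steps, assuming the same seeded df as above:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': np.random.randint(0, 10, 12)})

# Step 1: keep col_2 only where col_1 == 0; everything else becomes NaN.
masked = df['col_2'].where(df['col_1'] == 0)

# Step 2: 'first' ignores NaN, so each group broadcasts its single surviving value.
new_col = masked.groupby(df['group']).transform('first')
print(new_col.tolist())  # [8.0, 8.0, 8.0, 0.0, 0.0, 0.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
```

Note the float dtype in the result: introducing NaN forces col_2 from int to float, which is why the answer's output shows 8.0 rather than 8.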
Another solution: Series.map with the rows filtered by the condition, indexed by the group column via DataFrame.set_index:
df = df.assign(
    new_col = lambda df_: df_['group'].map(df_.loc[df_['col_1'] == 0]
                                              .set_index('group')['col_2'])
)
print(df)
group col_1 col_2 new_col
0 A -1 5 8
1 A 0 8 8
2 A 1 9 8
3 B -1 5 0
4 B 0 0 0
5 B 1 0 0
6 C 0 1 1
7 C 1 7 1
8 D -1 6 9
9 D 0 9 9
10 D 1 2 9
11 D 2 4 9
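The map-based variant works by first building a small lookup Series (the col_1 == 0 row of each group, indexed by group) and then translating every group label through it. A sketch of that lookup step, assuming the same seeded df; note it relies on there being exactly one col_1 == 0 row per group, since a duplicated lookup index would make the mapping ambiguous:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': np.random.randint(0, 10, 12)})

# Lookup table: one col_2 value per group, taken from the col_1 == 0 row.
lookup = df.loc[df['col_1'] == 0].set_index('group')['col_2']
print(lookup.to_dict())  # {'A': 8, 'B': 0, 'C': 1, 'D': 9}

# map replaces each group label with its value from the lookup table.
new_col = df['group'].map(lookup)
print(new_col.tolist())  # [8, 8, 8, 0, 0, 0, 1, 1, 9, 9, 9, 9]
```

Because no NaN is introduced here, this version keeps the integer dtype, matching the desired output exactly.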