Groupby and sum, then insert the result as a new row alongside the existing columns
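I am restructuring my data. The process involves two tasks:
- Insert new rows by summing over a groupby.
- Create a new column, Level.
I have included the expected output in Image 1 and Image 2.
The data frame contains multiple columns. A sample data frame is shown below: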
import pandas as pd

df = pd.DataFrame()
df['Competition'] = ['EPL','EPL','EPL','EPL','EPL','EPL','EPL','EPL','EPL','EPL']
df['Player'] = ['Bruno','Bruno','Bruno','Bruno','Bruno','Bruno','Bruno','Bruno','Bruno','Bruno']
df['template'] = ['Def','Def','Pass','Pass','Actions','Actions','Attk','Attk','Other','Other']
# A single space (' ') marks a blank cell in Stats and Stat1
df['Stats'] = ['Def duels', ' ','Back passes', ' ','Dribbles', ' ','Goal','Assist','Possession Losses','Possession Losses [own half]']
df['Stat1'] = [' ', 'Def duels Won',' ', 'Back passes[Acc]',' ', 'Dribbles[Suc]',' ',' ',' ',' ']
df['Value'] = [5,2.5,60,55,5,2,2,1,3,1]
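I want to sum Value after grouping by the columns Competition, Player, and template. Each sum should be inserted as a new row directly above the existing rows. The expected data frame is as follows:
Based on the data frame above, I want to create a new column, Level, as follows: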
> Level is defined as follows: level = 1 if both **Stats** and **Stat1**
> are blank or have no data; level = 2 if only **Stat1** is blank or has
> no data; level = 3 if **Stat1** contains data.
How can I do this?
Here is an approach using np.select, which you can modify as needed:
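import numpy as np

# add extra rows with concat: the per-group sums have no Stats/Stat1,
# so fillna(' ') marks those cells as blank
df = pd.concat((df, df.groupby(['Competition','Player','template'])
                      ['Value'].sum().reset_index()
               )
              ).fillna(' ')
# the first condition that matches wins: Stat1 filled -> 3, Stats filled -> 2, otherwise 1
df['Level'] = np.select((df['Stat1'].ne(' '), df['Stats'].ne(' ')),
                        (3, 2), 1)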
Output:
Competition Player template Stats Stat1 Value Level
-- ------------- -------- ---------- ---------------------------- ---------------- ------- -------
0 EPL Bruno Def Def duels 5 2
1 EPL Bruno Def Def duels Won 2.5 3
2 EPL Bruno Pass Back passes 60 2
3 EPL Bruno Pass Back passes[Acc] 55 3
4 EPL Bruno Actions Dribbles 5 2
5 EPL Bruno Actions Dribbles[Suc] 2 3
6 EPL Bruno Attk Goal 2 2
7 EPL Bruno Attk Assist 1 2
8 EPL Bruno Other Possession Losses 3 2
9 EPL Bruno Other Possession Losses [own half] 1 2
0 EPL Bruno Actions 7 1
1 EPL Bruno Attk 3 1
2 EPL Bruno Def 7.5 1
3 EPL Bruno Other 4 1
4 EPL Bruno Pass 115 1
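The output above appends the totals at the end, whereas the question asks for each total to sit directly above the rows of its group. One possible way to get that ordering is sketched below. It is an assumption about the intended layout, not part of the original answer: the names `keys`, `totals`, `out`, and the temporary `_grp` column are hypothetical, and the sample data is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Competition': ['EPL'] * 10,
    'Player': ['Bruno'] * 10,
    'template': ['Def', 'Def', 'Pass', 'Pass', 'Actions', 'Actions',
                 'Attk', 'Attk', 'Other', 'Other'],
    'Stats': ['Def duels', ' ', 'Back passes', ' ', 'Dribbles', ' ',
              'Goal', 'Assist', 'Possession Losses',
              'Possession Losses [own half]'],
    'Stat1': [' ', 'Def duels Won', ' ', 'Back passes[Acc]', ' ',
              'Dribbles[Suc]', ' ', ' ', ' ', ' '],
    'Value': [5, 2.5, 60, 55, 5, 2, 2, 1, 3, 1],
})

keys = ['Competition', 'Player', 'template']

# sort=False keeps the groups in order of first appearance
totals = df.groupby(keys, sort=False)['Value'].sum().reset_index()

# Put the totals first, restore the original column order,
# and mark the missing Stats/Stat1 cells as blank
out = pd.concat((totals, df), ignore_index=True)[list(df.columns)].fillna(' ')

out['Level'] = np.select((out['Stat1'].ne(' '), out['Stats'].ne(' ')),
                         (3, 2), 1)

# Hypothetical helper column: position of each group in the data
out['_grp'] = out.groupby(keys, sort=False).ngroup()

# mergesort is stable, so within a group the total (which came first
# in the concat) stays above the original rows, in their original order
out = (out.sort_values('_grp', kind='mergesort')
          .drop(columns='_grp')
          .reset_index(drop=True))

print(out)
```

Concatenating the totals before the detail rows and relying on a stable sort avoids sorting on Level itself, so the detail rows keep their original order even if a level-3 row precedes a level-2 row within a group.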