对于唯一名称,获取第一个值并从其余值中减去 pandas dataframe
For unique names, get the first value and subract from the rest value pandas dataframe
我有一个带有名称列的 pandas 数据框,对于每个名称,我想计算第一个值与其他每个值之间的差异。
数据帧
Name First Run Second Run Third Run Time
0 Jakub 315 279 276 9:00
1 Jakub 322 260 270 9:00
2 Jakub 248 260 213 15:00
3 Jakub 246 288 294 15:00
4 Jakub 276 274 263 15:00
5 Bene 222 268 246 9:00
6 Bene 236 302 263 9:00
7 Bene 208 245 224 15:00
8 Bene 283 245 245 15:00
9 Bene 278 257 249 15:00
first_values = df.groupby(['Name']).first()[['First Run','Second Run','Third Run']]
First Run Second Run Third Run Time
Name
Jakub 315 279 276 9:00
Bene 222 268 246 9:00
Output_df
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -9 3 15:00
我可以找出正确的函数来获得所需的输出。我尝试了以下代码,但出现了 TypeError。
df.Name.map(first_values.index).sub(first_values)
您可以使用 groupby
+apply
并减去每组的第一行:
df.filter(like=' Run').groupby(df['Name']).apply(lambda g: g-g.iloc[0])
或来自手动列列表:
cols = ['First Run','Second Run','Third Run']
df[cols].groupby(df['Name']).apply(lambda g: g-g.iloc[0])
拥有新数据框的完整代码:
df_output = df.copy()
df_output.update(df.filter(like=' Run')
.groupby(df['Name'])
.apply(lambda g: g-g.iloc[0])
)
输出:
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 -22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -11 3 15:00
您可以用 cumsum
检查 diff
df.update(df.groupby('Name')[['First Run','Second Run','Third Run']].apply(lambda x : x.diff().cumsum()).fillna(0))
Out[30]:
FirstRun SecondRun ThirdRun
0 0.0 0.0 0.0
1 7.0 -19.0 -6.0
2 -67.0 -19.0 -63.0
3 -69.0 9.0 18.0
4 -39.0 -5.0 -13.0
5 0.0 0.0 0.0
6 14.0 34.0 17.0
7 -14.0 -23.0 -22.0
8 61.0 -23.0 -1.0
9 56.0 -11.0 3.0
使用 GroupBy.transform
重复第一个值,因此减法效果很好:
cols = ['First Run','Second Run','Third Run']
df[cols] = df[cols].sub(df.groupby('Name')[cols].transform('first'))
print (df)
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 -22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -11 3 15:00
我有一个带有名称列的 pandas 数据框,对于每个名称,我想计算第一个值与其他每个值之间的差异。
数据帧
Name First Run Second Run Third Run Time
0 Jakub 315 279 276 9:00
1 Jakub 322 260 270 9:00
2 Jakub 248 260 213 15:00
3 Jakub 246 288 294 15:00
4 Jakub 276 274 263 15:00
5 Bene 222 268 246 9:00
6 Bene 236 302 263 9:00
7 Bene 208 245 224 15:00
8 Bene 283 245 245 15:00
9 Bene 278 257 249 15:00
first_values = df.groupby(['Name']).first()[['First Run','Second Run','Third Run']]
First Run Second Run Third Run Time
Name
Jakub 315 279 276 9:00
Bene 222 268 246 9:00
Output_df
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -9 3 15:00
我可以找出正确的函数来获得所需的输出。我尝试了以下代码,但出现了 TypeError。
df.Name.map(first_values.index).sub(first_values)
您可以使用 groupby
+apply
并减去每组的第一行:
df.filter(like=' Run').groupby(df['Name']).apply(lambda g: g-g.iloc[0])
或来自手动列列表:
cols = ['First Run','Second Run','Third Run']
df[cols].groupby(df['Name']).apply(lambda g: g-g.iloc[0])
拥有新数据框的完整代码:
df_output = df.copy()
df_output.update(df.filter(like=' Run')
.groupby(df['Name'])
.apply(lambda g: g-g.iloc[0])
)
输出:
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 -22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -11 3 15:00
您可以用 cumsum
diff
df.update(df.groupby('Name')[['First Run','Second Run','Third Run']].apply(lambda x : x.diff().cumsum()).fillna(0))
Out[30]:
FirstRun SecondRun ThirdRun
0 0.0 0.0 0.0
1 7.0 -19.0 -6.0
2 -67.0 -19.0 -63.0
3 -69.0 9.0 18.0
4 -39.0 -5.0 -13.0
5 0.0 0.0 0.0
6 14.0 34.0 17.0
7 -14.0 -23.0 -22.0
8 61.0 -23.0 -1.0
9 56.0 -11.0 3.0
使用 GroupBy.transform
重复第一个值,因此减法效果很好:
cols = ['First Run','Second Run','Third Run']
df[cols] = df[cols].sub(df.groupby('Name')[cols].transform('first'))
print (df)
Name First Run Second Run Third Run Time
0 Jakub 0 0 0 9:00
1 Jakub 7 -19 -6 9:00
2 Jakub -67 -19 -63 15:00
3 Jakub -69 9 18 15:00
4 Jakub -39 -5 -13 15:00
5 Bene 0 0 0 9:00
6 Bene 14 34 17 9:00
7 Bene -14 -23 -22 15:00
8 Bene 61 -23 -1 15:00
9 Bene 56 -11 3 15:00