Trying to cumsum() pandas dataframe with same values appearing in multiple columns
I am trying to get a cumulative sum with groupby, where the sum has to run across multiple columns that contain the same values.
import pandas as pd
import numpy as np
df = pd.DataFrame([['Jazz', 'Clippers', 89, 100],
['Clippers' , 'Jazz', 101, 97],
['Bucks' , 'Jazz', 99, 112],
['Jazz' , 'Bucks', 109, 88]],
columns=['home_team', 'away_team', 'home_points', 'away_points'])
print(df)
This produces a DataFrame with the following output:
home_team away_team home_points away_points
0 Jazz Clippers 89 100
1 Clippers Jazz 101 97
2 Bucks Jazz 99 112
3 Jazz Bucks 109 88
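What I want is a running total of points for both the home team and the away team that accounts for the fact that every team appears in both the home and the away column. All I have managed so far is a cumulative total grouped by team name, computed separately for each team's home games and away games, as shown below: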
df["home_cumulative_points"]= df.groupby(["home_team"])["home_points"].cumsum()
df["away_cumulative_points"]= df.groupby(["away_team"])["away_points"].cumsum()
print(df)
which produces:
home_team away_team home_points away_points home_cumulative_points away_cumulative_points
0 Jazz Clippers 89 100 89 100
1 Clippers Jazz 101 97 101 97
2 Bucks Jazz 99 112 99 209
3 Jazz Bucks 109 88 198 88
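Is there a way to use groupby to compute the cumulative sum for the same team across both the home and away columns, so that the running total adds up a team's points regardless of whether it played at home or away? The ideal output for the last row would be: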
home_team away_team home_points away_points home_cumulative_points away_cumulative_points
3 Jazz Bucks 109 88 407 187
I am guessing I might need a for loop or something similar, but I am not sure how best to go about it. Thanks in advance for any feedback!
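The idea is to select only the necessary columns, split the column names on _ into a MultiIndex, and reshape with DataFrame.stack, so that cumsum can be applied to the home and away columns together: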
cols = ['home_team', 'away_team', 'home_points', 'away_points']
df1 = df[cols].copy()
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack(0).rename_axis(['lev1','lev2'])
# Group by team only, so a team's home and away rows share one running total.
# The stacked rows stay in game order (lev1), which cumsum relies on.
df1["cumulative_points"]= df1.groupby("team")["points"].cumsum()
df2 = df1.unstack()
df2.columns = df2.columns.map(lambda x: f'{x[1]}_{x[0]}')
print(df2)
away_points home_points away_team home_team away_cumulative_points \
lev1
0 100 89 Clippers Jazz 100
1 97 101 Jazz Clippers 186
2 112 99 Jazz Bucks 298
3 88 109 Bucks Jazz 187
home_cumulative_points
lev1
0 89
1 201
2 99
3 407
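Or, to add the new columns to the original DataFrame:
# After stack, 'home'/'away' sit in the second index level (lev2), so select on that level.
df["home_cumulative_points"]= df1.xs('home', level='lev2')['cumulative_points']
df["away_cumulative_points"]= df1.xs('away', level='lev2')['cumulative_points']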
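The question mentions a for loop as a possible fallback. As a baseline to check any vectorized version against, a minimal sketch of that loop could look like the following. It assumes the rows are already in chronological order; the names totals, home_running and away_running are made up for this illustration.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame(
    [["Jazz", "Clippers", 89, 100],
     ["Clippers", "Jazz", 101, 97],
     ["Bucks", "Jazz", 99, 112],
     ["Jazz", "Bucks", 109, 88]],
    columns=["home_team", "away_team", "home_points", "away_points"],
)

# Running points per team, home and away games combined.
totals = defaultdict(int)
home_running, away_running = [], []

# Assumption: rows are in the order the games were played.
for row in df.itertuples(index=False):
    totals[row.home_team] += row.home_points
    totals[row.away_team] += row.away_points
    home_running.append(totals[row.home_team])
    away_running.append(totals[row.away_team])

df["home_cumulative_points"] = home_running
df["away_cumulative_points"] = away_running
print(df)
```

For the sample data the last row ends in 407 and 187, the values asked for in the question. The loop is fine for small frames; the reshaping approaches avoid the Python-level iteration.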
Another approach is to reshape with concat and rename:
f = lambda x: x.split('_')[1]
df1 = pd.concat([df[['home_team', 'home_points']].rename(columns=f),
df[['away_team', 'away_points']].rename(columns=f)], keys=('home','away'))
df1 = df1.rename_axis(['lev1','lev2'])
# Sort by game (lev2) so the running total follows game order, then group by team only.
df1["cumulative_points"]= df1.sort_index(level='lev2').groupby("team")["points"].cumsum()
df["home_cumulative_points"]= df1.loc['home', 'cumulative_points']
df["away_cumulative_points"]= df1.loc['away', 'cumulative_points']
print(df)
home_team away_team home_points away_points home_cumulative_points \
0 Jazz Clippers 89 100 89
1 Clippers Jazz 101 97 201
2 Bucks Jazz 99 112 99
3 Jazz Bucks 109 88 407
away_cumulative_points
0 100
1 186
2 298
3 187
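A third variant, not part of the answer above, skips the intermediate long DataFrame: flatten the team and points columns row by row with NumPy, take the grouped cumsum on the flat Series, and reshape the result back into two columns. This is only a sketch under the assumption that rows are in game order and that a team never plays itself; the names teams, points and running are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["Jazz", "Clippers", 89, 100],
     ["Clippers", "Jazz", 101, 97],
     ["Bucks", "Jazz", 99, 112],
     ["Jazz", "Bucks", 109, 88]],
    columns=["home_team", "away_team", "home_points", "away_points"],
)

# Row-major flattening keeps game order: home0, away0, home1, away1, ...
teams = df[["home_team", "away_team"]].to_numpy().ravel()
points = df[["home_points", "away_points"]].to_numpy().ravel()

# One running total per team, regardless of the side it played on.
running = pd.Series(points).groupby(teams).cumsum().to_numpy().reshape(-1, 2)

df["home_cumulative_points"] = running[:, 0]
df["away_cumulative_points"] = running[:, 1]
print(df)
```

Because the flattened order alternates home and away within each game, no extra sort is needed before cumsum.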