Python Groupby Running Total/Cumsum column based on string in another column
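I want to create 2 running-total columns that accumulate Amount depending on whether TYPE is ANNUAL or MONTHLY, within each Deal.
So it would be something like DF.groupby(['Deal','Booking Month']), and then somehow apply a sum function when TYPE==ANNUAL for the first column and when TYPE==MONTHLY for the second column.
This is what my grouped DF looks like, plus the two desired columns: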
Deal  TYPE     Month   Amount    Running Total(ANNUAL)  Running Total(Monthly)
A     ANNUAL   April   1000      1000                   0
A     ANNUAL   April   2000      3000                   0
A     MONTHLY  June    1500      3000                   1500
B     MONTHLY  April   11150     0                      11150
B     ANNUAL   July    700       700                    11150
B     ANNUAL   August  303.63    1003.63                11150
C     ANNUAL   April   25624.59  25624.59               0
D     ANNUAL   June    5000      5000                   0
D     ANNUAL   July    5000      10000                  0
D     ANNUAL   August  5000      15000                  0
E     ANNUAL   April   10        10                     0
E     MONTHLY  May     1000      10                     1000
E     ANNUAL   May     500       510                    1000
E     MONTHLY  June    500       510                    1500
E     ANNUAL   June    600       1110                   1500
E     MONTHLY  July    300       1110                   1800
E     MONTHLY  July    8200      1110                   10000
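To make the example reproducible, the input columns of the table above can be built as follows. This is a sketch that assumes the frame is called df, as in the answers below; only Deal, TYPE, Month and Amount are inputs, and the two running-total columns are the output to be computed.

```python
import pandas as pd

# Input columns from the table above; the two running-total columns are the target output
df = pd.DataFrame({
    'Deal': list('AAABBBCDDDEEEEEEE'),
    'TYPE': ['ANNUAL', 'ANNUAL', 'MONTHLY', 'MONTHLY', 'ANNUAL', 'ANNUAL', 'ANNUAL',
             'ANNUAL', 'ANNUAL', 'ANNUAL', 'ANNUAL', 'MONTHLY', 'ANNUAL', 'MONTHLY',
             'ANNUAL', 'MONTHLY', 'MONTHLY'],
    'Month': ['April', 'April', 'June', 'April', 'July', 'August', 'April', 'June',
              'July', 'August', 'April', 'May', 'May', 'June', 'June', 'July', 'July'],
    'Amount': [1000, 2000, 1500, 11150, 700, 303.63, 25624.59, 5000, 5000, 5000,
               10, 1000, 500, 500, 600, 300, 8200],
})
print(df)
```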
Using filters and groupby + transform:
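# Copy each Amount into the column for its TYPE (rows of the other type stay NaN)
mask = df.TYPE.eq('ANNUAL')
cols = ['Running Total(ANNUAL)','Running Total(MONTHLY)']
df.loc[mask,'Running Total(ANNUAL)'] = df.loc[mask,'Amount']
df.loc[~mask,'Running Total(MONTHLY)'] = df.loc[~mask,'Amount']
# Rows of the other type contribute 0 to the running total
df[cols] = df[cols].fillna(0)
# Cumulative sum within each Deal; select the columns with a list, because
# tuple-style selection such as ['a','b'] on a groupby raises an error in pandas 2.x
df[cols] = df.groupby('Deal')[cols].transform('cumsum')
print(df)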
    Deal  TYPE     Month   Amount    Running Total(ANNUAL)  Running Total(MONTHLY)
0   A     ANNUAL   April   1000.00   1000.00                0.0
1   A     ANNUAL   April   2000.00   3000.00                0.0
2   A     MONTHLY  June    1500.00   3000.00                1500.0
3   B     MONTHLY  April   11150.00  0.00                   11150.0
4   B     ANNUAL   July    700.00    700.00                 11150.0
5   B     ANNUAL   August  303.63    1003.63                11150.0
6   C     ANNUAL   April   25624.59  25624.59               0.0
7   D     ANNUAL   June    5000.00   5000.00                0.0
8   D     ANNUAL   July    5000.00   10000.00               0.0
9   D     ANNUAL   August  5000.00   15000.00               0.0
10  E     ANNUAL   April   10.00     10.00                  0.0
11  E     MONTHLY  May     1000.00   10.00                  1000.0
12  E     ANNUAL   May     500.00    510.00                 1000.0
13  E     MONTHLY  June    500.00    510.00                 1500.0
14  E     ANNUAL   June    600.00    1110.00                1500.0
15  E     MONTHLY  July    300.00    1110.00                1800.0
16  E     MONTHLY  July    8200.00   1110.00                10000.0
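The same filter-then-cumsum idea can be written more compactly with Series.where, which replaces the amounts of the other type with 0 before the per-Deal cumulative sum. This is a sketch rather than the answer's own code; it assumes the sample data from the question and, unlike the ~mask version, only counts rows whose TYPE is exactly 'MONTHLY' in the second column.

```python
import pandas as pd

df = pd.DataFrame({
    'Deal': list('AAABBBCDDDEEEEEEE'),
    'TYPE': ['ANNUAL', 'ANNUAL', 'MONTHLY', 'MONTHLY', 'ANNUAL', 'ANNUAL', 'ANNUAL',
             'ANNUAL', 'ANNUAL', 'ANNUAL', 'ANNUAL', 'MONTHLY', 'ANNUAL', 'MONTHLY',
             'ANNUAL', 'MONTHLY', 'MONTHLY'],
    'Amount': [1000, 2000, 1500, 11150, 700, 303.63, 25624.59, 5000, 5000, 5000,
               10, 1000, 500, 500, 600, 300, 8200],
})

# Keep Amount where the row has the given TYPE, 0 elsewhere,
# then accumulate within each Deal (the result keeps df's original index)
for t in ['ANNUAL', 'MONTHLY']:
    df[f'Running Total({t})'] = (
        df['Amount'].where(df['TYPE'].eq(t), 0).groupby(df['Deal']).cumsum()
    )
print(df)
```

Because cumsum on a grouped Series returns a result aligned to the original index, no transform or concat step is needed here.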
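You can do this with .expanding().sum(), which keeps the group MultiIndex; you can then unstack it to get a separate column for each type. Use another groupby to fill in the missing values within each group, then concatenate the result back onto the original frame.
The advantage is that this works for any number of types without having to define them explicitly anywhere.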
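import pandas as pd

# Running sum of Amount within each (Deal, TYPE) pair, one column per TYPE
df2 = (df.groupby(['Deal', 'TYPE'])
         .Amount.expanding().sum()
         .unstack(level=1)
         # carry the last running total forward within each Deal;
         # rows before the first occurrence of a type become 0
         .groupby(level=0)
         .ffill().fillna(0)
         # drop the Deal level so the index matches df's index again
         .reset_index(level=0, drop=True)
         # recent pandas versions produce no 'Deal' column here, so ignore it if absent
         .drop(columns='Deal', errors='ignore'))
pd.concat([df, df2], axis=1)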
Output
Deal TYPE Month Amount ANNUAL MONTHLY
0 A ANNUAL April 1000.00 1000.00 0.0
1 A ANNUAL April 2000.00 3000.00 0.0
2 A MONTHLY June 1500.00 3000.00 1500.0
3 B MONTHLY April 11150.00 0.00 11150.0
4 B ANNUAL July 700.00 700.00 11150.0
5 B ANNUAL August 303.63 1003.63 11150.0
6 C ANNUAL April 25624.59 25624.59 0.0
7 D ANNUAL June 5000.00 5000.00 0.0
8 D ANNUAL July 5000.00 10000.00 0.0
9 D ANNUAL August 5000.00 15000.00 0.0
10 E ANNUAL April 10.00 10.00 0.0
11 E MONTHLY May 1000.00 10.00 1000.0
12 E ANNUAL May 500.00 510.00 1000.0
13 E MONTHLY June 500.00 510.00 1500.0
14 E ANNUAL June 600.00 1110.00 1500.0
15 E MONTHLY July 300.00 1110.00 1800.0
16 E MONTHLY July 8200.00 1110.00 10000.0
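A different technique with the same "no hard-coded types" property is to spread Amount into one indicator column per TYPE with pd.get_dummies and take a per-Deal cumulative sum. This is a swapped-in alternative, not the expanding() approach above; it assumes the sample data from the question, and any new TYPE value would get its own column automatically.

```python
import pandas as pd

df = pd.DataFrame({
    'Deal': list('AAABBBCDDDEEEEEEE'),
    'TYPE': ['ANNUAL', 'ANNUAL', 'MONTHLY', 'MONTHLY', 'ANNUAL', 'ANNUAL', 'ANNUAL',
             'ANNUAL', 'ANNUAL', 'ANNUAL', 'ANNUAL', 'MONTHLY', 'ANNUAL', 'MONTHLY',
             'ANNUAL', 'MONTHLY', 'MONTHLY'],
    'Amount': [1000, 2000, 1500, 11150, 700, 303.63, 25624.59, 5000, 5000, 5000,
               10, 1000, 500, 500, 600, 300, 8200],
})

# One 0.0/1.0 indicator column per distinct TYPE value
indicators = pd.get_dummies(df['TYPE'], dtype=float)
# Put each row's Amount into its own TYPE column, then accumulate within each Deal
running = indicators.mul(df['Amount'], axis=0).groupby(df['Deal']).cumsum()
out = pd.concat([df, running], axis=1)
print(out)
```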