Get cumulative sum in pandas
Context
Datetime | Campaign_name | Status | Open_time |
---|---|---|---|
2022-03-15 00:00 | Funny_campaign | Open | |
2022-03-15 01:00 | Funny_campaign | Continue | |
2022-03-15 02:00 | Funny_campaign | Continue | |
2022-03-15 03:00 | Funny_campaign | Continue | |
2022-03-15 04:00 | Funny_campaign | Close | |
2022-03-15 08:00 | Funny_campaign | Open | |
2022-03-15 09:00 | Funny_campaign | Continue | |
2022-03-15 10:00 | Funny_campaign | Close | |
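Problem
I need to calculate the time elapsed from each Open to the matching Close.
My current code
I can take one of two approaches: put the open time on each 'Close' row, or put the cumulative open_time on each 'Open' and 'Continue' row. This is my attempt at the latter.
My current code is almost right: it does not count the time between a Close and the next Open, but it fails to add the last time difference.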
import pandas as pd

df["Datetime"] = pd.to_datetime(df["Datetime"])
# Time since the previous row, in minutes (0 for the first row)
df["time_diff"] = df["Datetime"].diff().dt.total_seconds().div(60).fillna(0)
# Zero out the difference on every 'Close' row
condition = df["Status"] == "Close"
df.loc[condition, "time_diff"] = 0
df["Cumulative time"] = df.groupby(["Campaign_name"])["time_diff"].cumsum()
df = df.drop(columns="time_diff")
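IIUC, you can start a new group at each 'Open' row and use:
df['Datetime'] = pd.to_datetime(df['Datetime'])
# Counter that increases by 1 at every 'Open' row
group = df['Status'].eq('Open').cumsum()
# group_keys=False keeps the original index so the result aligns with df on recent pandas versions
df['Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g-g.iloc[0])
# or, alternative syntax (leaves NaT instead of 0 on each 'Open' row)
# df['Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g.diff().cumsum())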
Output:
Datetime Campaign_name Status Open_time
0 2022-03-15 00:00:00 Funny_campaign Open 0 days 00:00:00
1 2022-03-15 01:00:00 Funny_campaign Continue 0 days 01:00:00
2 2022-03-15 02:00:00 Funny_campaign Continue 0 days 02:00:00
3 2022-03-15 03:00:00 Funny_campaign Continue 0 days 03:00:00
4 2022-03-15 04:00:00 Funny_campaign Close 0 days 04:00:00
5 2022-03-15 08:00:00 Funny_campaign Open 0 days 00:00:00
6 2022-03-15 09:00:00 Funny_campaign Continue 0 days 01:00:00
7 2022-03-15 10:00:00 Funny_campaign Close 0 days 02:00:00
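For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and, as a substitution of mine rather than the code above, uses `transform('first')` instead of `apply`, so the subtraction happens on whole columns. The names `session` and `session_start` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Datetime": [
        "2022-03-15 00:00", "2022-03-15 01:00", "2022-03-15 02:00",
        "2022-03-15 03:00", "2022-03-15 04:00", "2022-03-15 08:00",
        "2022-03-15 09:00", "2022-03-15 10:00",
    ],
    "Campaign_name": ["Funny_campaign"] * 8,
    "Status": ["Open", "Continue", "Continue", "Continue", "Close",
               "Open", "Continue", "Close"],
})
df["Datetime"] = pd.to_datetime(df["Datetime"])

# A new session starts on every 'Open' row
session = df["Status"].eq("Open").cumsum()

# Timestamp of the first row of each session, broadcast to every row of it
session_start = df.groupby(session)["Datetime"].transform("first")

# Elapsed time since the session was opened
df["Open_time"] = df["Datetime"] - session_start
print(df)
```

This assumes the rows are already sorted by `Datetime` and that every session begins with an 'Open' row.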
Or assign only to the 'Close' rows:
df.loc[df['Status'].eq('Close'), 'Open_time'] = df.groupby(group, group_keys=False)['Datetime'].apply(lambda g: g-g.iloc[0])
Output:
Datetime Campaign_name Status Open_time
0 2022-03-15 00:00:00 Funny_campaign Open NaN
1 2022-03-15 01:00:00 Funny_campaign Continue NaN
2 2022-03-15 02:00:00 Funny_campaign Continue NaN
3 2022-03-15 03:00:00 Funny_campaign Continue NaN
4 2022-03-15 04:00:00 Funny_campaign Close 0 days 04:00:00
5 2022-03-15 08:00:00 Funny_campaign Open NaN
6 2022-03-15 09:00:00 Funny_campaign Continue NaN
7 2022-03-15 10:00:00 Funny_campaign Close 0 days 02:00:00
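A possible variant, sketched here under the same assumptions (sorted rows, every session starting with 'Open'), computes the elapsed time for all rows and then masks it with `Series.where`. This does not depend on whether `Open_time` already exists in the frame; the masked rows come out as `NaT` rather than `NaN`. The variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2022-03-15 00:00", "2022-03-15 01:00", "2022-03-15 02:00",
        "2022-03-15 03:00", "2022-03-15 04:00", "2022-03-15 08:00",
        "2022-03-15 09:00", "2022-03-15 10:00",
    ]),
    "Campaign_name": ["Funny_campaign"] * 8,
    "Status": ["Open", "Continue", "Continue", "Continue", "Close",
               "Open", "Continue", "Close"],
})

# A new session starts on every 'Open' row
session = df["Status"].eq("Open").cumsum()

# Elapsed time since the start of the session, for every row
elapsed = df["Datetime"] - df.groupby(session)["Datetime"].transform("first")

# Keep the value only on 'Close' rows; every other row becomes NaT
df["Open_time"] = elapsed.where(df["Status"].eq("Close"))
print(df)
```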
Close minus Open difference per group:
df.groupby(group)['Datetime'].agg(lambda g: g.iloc[-1]-g.iloc[0])
Output:
Status
1 0 days 04:00:00
2 0 days 02:00:00
Name: Datetime, dtype: timedelta64[ns]
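Since the question groups by `Campaign_name`, the same idea can probably be extended to several campaigns by numbering the sessions within each campaign. The sketch below is only an illustration: the `Other_campaign` rows are invented, and the names `session`, `bounds`, `durations` and `total` are mine. It also sums the session durations to get a total open time per campaign.

```python
import pandas as pd

# Hypothetical data: the 'Other_campaign' rows are invented for illustration
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2022-03-15 00:00", "2022-03-15 04:00",
        "2022-03-15 08:00", "2022-03-15 10:00",
        "2022-03-15 01:00", "2022-03-15 02:30",
    ]),
    "Campaign_name": ["Funny_campaign"] * 4 + ["Other_campaign"] * 2,
    "Status": ["Open", "Close", "Open", "Close", "Open", "Close"],
})

# Number the sessions within each campaign: a new one starts at every 'Open'
is_open = df["Status"].eq("Open").astype(int)
session = is_open.groupby(df["Campaign_name"]).cumsum().rename("session")

# First and last timestamp of each (campaign, session) pair
bounds = df.groupby(["Campaign_name", session])["Datetime"].agg(["first", "last"])

# Close minus Open for each session
durations = bounds["last"] - bounds["first"]

# Total open time per campaign
total = durations.groupby(level="Campaign_name").sum()
print(durations)
print(total)
```

As before, this assumes the rows of each campaign are sorted by `Datetime` and that every session ends with its 'Close' row.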