Count occurrences in a specific time window Python
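My data contains the times at which field staff visited clients. I need to count the number of visits per day and per client within a given time range, for example in 15-minute intervals from 8 am to 8 pm. Ideally, I would then plot a histogram of the distribution, with the time intervals on the x-axis and the number of occurrences on the y-axis.
This is what my current dataframe looks like: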
Client | Hour | Day |
---|---|---|
A | 11:14:48 | Monday |
A | 11:24:34 | Monday |
B | 15:34:34 | Tuesday |
B | 13:34:35 | Tuesday |
B | 15:10:22 | Tuesday |
B | 15:01:02 | Tuesday |
... | ... | ... |
The output should look something like this, which I can then use to plot the histogram:
Interval | Client | Occurrences | Day |
---|---|---|---|
8:00:00 - 8:15:00 | A | 0 | Monday |
... | ... | ... | ... |
11:00:00 - 11:15:00 | A | 1 | Monday |
11:15:00 - 11:30:00 | A | 1 | Monday |
... | ... | ... | ... |
Thanks in advance!
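Admittedly, this is hacky, but it should work. If anyone has a better solution, please let me know. It would be easier if you had actual date-times rather than a mix of time strings and day names.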
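To illustrate that last remark, here is a minimal sketch of how the counting might look with real timestamps. The `visits` frame and its `visited_at` column are hypothetical and are not part of the question's data; the dates were picked only so that they fall on a Monday and a Tuesday.

```python
import pandas as pd

# Hypothetical data: one full timestamp per visit (not the question's format).
visits = pd.DataFrame({
    "Client": ["A", "A", "B", "B"],
    "visited_at": pd.to_datetime([
        "2023-01-02 11:14:48",
        "2023-01-02 11:24:34",
        "2023-01-03 15:10:22",
        "2023-01-03 15:01:02",
    ]),
})

# Group by client and by 15-minute bin; size() counts the visits in each group.
counts = (
    visits
    .groupby(["Client", pd.Grouper(key="visited_at", freq="15min")])
    .size()
    .reset_index(name="Occurrences")
)

# With real timestamps the day name can be derived instead of stored separately.
counts["Day"] = counts["visited_at"].dt.day_name()
print(counts)
```

The question's data has no such column, so what follows works from the time strings and day names instead.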
Here is the data I used:
import pandas as pd

df = pd.DataFrame({'Client': ['A', 'A', 'B', 'B', 'B', 'B'],
                   'Hour': ['11:14:48', '11:24:34', '15:34:34', '13:34:35', '15:10:22', '15:01:02'],
                   'Day': ['Monday', 'Monday', 'Tuesday', 'Tuesday', 'Tuesday', 'Tuesday']})
And here is the code:
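TIME_START = '08:00:00'
TIME_END = '20:00:00'
INTERVAL = '15min'

def reindex_by_date(df):
    # Work on a copy, then prefix a dummy date so the time strings parse as timestamps.
    df = df.copy()
    df['Hour'] = pd.to_datetime('1970-1-1 ' + df['Hour'].astype(str))
    # All bin starts in the window. The end point is included, so the last bin starts at TIME_END.
    dt_index = pd.date_range(start=f'1970-1-1 {TIME_START}', end=f'1970-1-1 {TIME_END}', freq=INTERVAL)
    # Count rows per bin, then reindex so that empty bins appear with a count of 0.
    resampled_df = (df.resample(INTERVAL, on='Hour').count().reindex(dt_index).fillna(0)
                    .rename(columns={'Hour': 'Occurrences'}).rename_axis('Hour').reset_index())
    # count() turned Client and Day into counts as well, so set them back to this group's values.
    resampled_df['Client'] = df['Client'].iat[0]
    resampled_df['Day'] = df['Day'].iat[0]
    # Turn each bin start into a "start - end" label.
    resampled_df['Hour'] = (resampled_df['Hour'].dt.strftime('%H:%M:%S') + ' - '
                            + (resampled_df['Hour'] + pd.Timedelta(INTERVAL)).dt.strftime('%H:%M:%S'))
    return resampled_df.rename(columns={'Hour': 'Interval'})

# Run the function once per (Client, Day) group and drop the group level from the index.
result = df.groupby(['Client', 'Day'], as_index=False).apply(reindex_by_date).reset_index(0, drop=True)
result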
The result looks like this:
Interval Client Occurrences Day
0 08:00:00 - 08:15:00 A 0.0 Monday
1 08:15:00 - 08:30:00 A 0.0 Monday
2 08:30:00 - 08:45:00 A 0.0 Monday
3 08:45:00 - 09:00:00 A 0.0 Monday
4 09:00:00 - 09:15:00 A 0.0 Monday
.. ... ... ... ...
44 19:00:00 - 19:15:00 B 0.0 Tuesday
45 19:15:00 - 19:30:00 B 0.0 Tuesday
46 19:30:00 - 19:45:00 B 0.0 Tuesday
47 19:45:00 - 20:00:00 B 0.0 Tuesday
48 20:00:00 - 20:15:00 B 0.0 Tuesday
[98 rows x 4 columns]
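Note that the listing ends with a `20:00:00 - 20:15:00` row, which lies just past the 8 pm limit. This happens because `pd.date_range` includes the end point, so each group gets 49 bins rather than 48. If that extra bin is unwanted, one option (a sketch, not part of the original answer) is to drop the last bin start before reindexing. Newer pandas versions also accept `inclusive="left"` in `pd.date_range` for the same purpose.

```python
import pandas as pd

# Same window and bin width as in the answer's code.
start = "1970-01-01 08:00:00"
end = "1970-01-01 20:00:00"

# date_range includes both end points, so 20:00:00 becomes a bin start.
bin_starts = pd.date_range(start=start, end=end, freq="15min")
print(len(bin_starts))  # 49

# Dropping the last entry keeps only bins that end at or before 20:00:00.
bin_starts_in_window = bin_starts[:-1]
print(len(bin_starts_in_window))  # 48
print(bin_starts_in_window[-1])   # the last bin now starts at 19:45:00
```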
The non-zero entries are:
Interval Client Occurrences Day
12 11:00:00 - 11:15:00 A 1.0 Monday
13 11:15:00 - 11:30:00 A 1.0 Monday
22 13:30:00 - 13:45:00 B 1.0 Tuesday
28 15:00:00 - 15:15:00 B 2.0 Tuesday
30 15:30:00 - 15:45:00 B 1.0 Tuesday
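Since alternatives were invited, here is one possible sketch that avoids `resample` and `apply`. It is not from the original answer: it rounds each time down to its 15-minute bin, counts with `groupby(...).size()`, and reindexes onto a full grid of bins. Unlike the code above, it stops at 20:00 (48 bins per group) and keeps the counts as integers. The names `step`, `fmt`, `labels`, `pairs` and `full_index` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Client": ["A", "A", "B", "B", "B", "B"],
    "Hour": ["11:14:48", "11:24:34", "15:34:34", "13:34:35", "15:10:22", "15:01:02"],
    "Day": ["Monday", "Monday", "Tuesday", "Tuesday", "Tuesday", "Tuesday"],
})

step = pd.Timedelta("15min")
fmt = "%H:%M:%S"

# Round each visit down to the start of its bin and build a "start - end" label.
starts = pd.to_datetime("1970-01-01 " + df["Hour"]).dt.floor("15min")
df["Interval"] = starts.dt.strftime(fmt) + " - " + (starts + step).dt.strftime(fmt)

# Count visits per (Client, Day, Interval); only non-empty bins appear here.
counts = df.groupby(["Client", "Day", "Interval"]).size()

# Labels for every bin from 08:00 up to, but not including, 20:00.
bins = pd.date_range("1970-01-01 08:00:00", "1970-01-01 20:00:00", freq="15min")[:-1]
labels = [f"{b.strftime(fmt)} - {(b + step).strftime(fmt)}" for b in bins]

# Full grid: every observed (Client, Day) pair crossed with every bin label.
pairs = df[["Client", "Day"]].drop_duplicates()
full_index = pd.MultiIndex.from_tuples(
    [(c, d, label) for c, d in pairs.itertuples(index=False) for label in labels],
    names=["Client", "Day", "Interval"],
)

# Reindex so empty bins get a count of 0; visits outside the window are dropped.
result = counts.reindex(full_index, fill_value=0).reset_index(name="Occurrences")
result = result[["Interval", "Client", "Occurrences", "Day"]]
print(result[result["Occurrences"] > 0])
```

For the plot the question asks about, something like `result[result["Client"] == "A"].plot.bar(x="Interval", y="Occurrences")` should give one bar per interval, assuming matplotlib is installed.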