Pandas: extend intervals in a time series
I have the following DataFrame:
product timestamp count
0 apple 2021-06-29 11:00:00-04:00 1023
1 apple 2021-06-29 12:00:00-04:00 3790
2 apple 2021-06-29 13:00:00-04:00 128
3 apple 2021-06-29 14:00:00-04:00 0
4 apple 2021-06-29 15:00:00-04:00 323
5 apple 2021-06-29 16:00:00-04:00 4223
6 apple 2021-06-29 17:00:00-04:00 1194
4 orange 2021-06-29 15:00:00-04:00 23
5 orange 2021-06-29 16:00:00-04:00 4289
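As you can see, both products (apple and orange) have a count associated with each hour.
For apple, we have data for the interval 2021-06-29 11:00:00-04:00 - 2021-06-29 17:00:00-04:00, while for orange we have data for the interval 2021-06-29 15:00:00-04:00 - 2021-06-29 16:00:00-04:00.
I want to extend the intervals so that both products have hourly data from the start to the end of the day, i.e. for the interval 2021-06-29 00:00:00-04:00 - 2021-06-29 23:00:00-04:00.
The result would look like this: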
product timestamp count
0 apple 2021-06-29 00:00:00-04:00 0
1 apple 2021-06-29 01:00:00-04:00 0
2 apple 2021-06-29 02:00:00-04:00 0
3 apple 2021-06-29 03:00:00-04:00 0
4 apple 2021-06-29 04:00:00-04:00 0
5 apple 2021-06-29 05:00:00-04:00 0
6 apple 2021-06-29 06:00:00-04:00 0
7 apple 2021-06-29 07:00:00-04:00 0
8 apple 2021-06-29 08:00:00-04:00 0
9 apple 2021-06-29 09:00:00-04:00 0
10 apple 2021-06-29 10:00:00-04:00 0
11 apple 2021-06-29 11:00:00-04:00 1023
12 apple 2021-06-29 12:00:00-04:00 3790
13 apple 2021-06-29 13:00:00-04:00 128
14 apple 2021-06-29 14:00:00-04:00 0
15 apple 2021-06-29 15:00:00-04:00 323
16 apple 2021-06-29 16:00:00-04:00 4223
17 apple 2021-06-29 17:00:00-04:00 1194
18 apple 2021-06-29 18:00:00-04:00 0
19 apple 2021-06-29 19:00:00-04:00 0
20 apple 2021-06-29 20:00:00-04:00 0
21 apple 2021-06-29 21:00:00-04:00 0
22 apple 2021-06-29 22:00:00-04:00 0
23 apple 2021-06-29 23:00:00-04:00 0
24 orange 2021-06-29 00:00:00-04:00 0
25 orange 2021-06-29 01:00:00-04:00 0
26 orange 2021-06-29 02:00:00-04:00 0
27 orange 2021-06-29 03:00:00-04:00 0
28 orange 2021-06-29 04:00:00-04:00 0
29 orange 2021-06-29 05:00:00-04:00 0
30 orange 2021-06-29 06:00:00-04:00 0
31 orange 2021-06-29 07:00:00-04:00 0
32 orange 2021-06-29 08:00:00-04:00 0
33 orange 2021-06-29 09:00:00-04:00 0
34 orange 2021-06-29 10:00:00-04:00 0
35 orange 2021-06-29 11:00:00-04:00 0
36 orange 2021-06-29 12:00:00-04:00 0
37 orange 2021-06-29 13:00:00-04:00 0
38 orange 2021-06-29 14:00:00-04:00 0
39 orange 2021-06-29 15:00:00-04:00 23
40 orange 2021-06-29 16:00:00-04:00 4289
41 orange 2021-06-29 17:00:00-04:00 0
42 orange 2021-06-29 18:00:00-04:00 0
43 orange 2021-06-29 19:00:00-04:00 0
44 orange 2021-06-29 20:00:00-04:00 0
45 orange 2021-06-29 21:00:00-04:00 0
46 orange 2021-06-29 22:00:00-04:00 0
47 orange 2021-06-29 23:00:00-04:00 0
So far I have tried to reindex the DataFrame, but I am doing something wrong.
Thanks!
Let's try:
Convert the column with to_datetime, if it has not been converted already:
df['timestamp'] = pd.to_datetime(df['timestamp'])
Find the start and end datetimes dynamically (this can also be done statically) and create the hourly range:
Dynamic:
dates = df['timestamp'].dt.date
# 'h' = hourly; the uppercase 'H' alias is deprecated since pandas 2.2
date_range = pd.date_range(start=f'{dates.min()} 00:00:00-04:00',
                           end=f'{dates.max()} 23:00:00-04:00',
                           freq='h')
Static:
# 'h' = hourly; the uppercase 'H' alias is deprecated since pandas 2.2
date_range = pd.date_range(start='2021-06-29 00:00:00-04:00',
                           end='2021-06-29 23:00:00-04:00',
                           freq='h')
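Both variants above hard-code the -04:00 offset in the start and end strings. As a minimal sketch of an alternative, assuming all timestamps share the same fixed offset, the bounds can be derived from the data itself with Timestamp.normalize(), which resets the time to midnight and keeps the offset. The names ts, start and end are only illustrative:

```python
import pandas as pd

# Two sample timestamps with the fixed -04:00 offset from the question.
ts = pd.to_datetime(pd.Series([
    '2021-06-29 11:00:00-04:00',
    '2021-06-29 17:00:00-04:00',
]))

# normalize() sets the time to 00:00:00 and keeps the timezone offset.
start = ts.min().normalize()
end = ts.max().normalize() + pd.Timedelta(hours=23)

# Passing a Timedelta as freq avoids relying on a frequency alias string.
date_range = pd.date_range(start=start, end=end, freq=pd.Timedelta(hours=1))

print(len(date_range))
print(date_range[0], date_range[-1])
```

With this approach the offset never appears as a string literal, so the same code should carry over to data recorded with a different fixed offset.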
Create a MultiIndex.from_product based on the unique product values and the date_range:
midx = pd.MultiIndex.from_product(
    [df['product'].unique(), date_range],
    names=['product', 'timestamp']
)
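To see what this index contains, here is a small standalone sketch. The products list and the hours variable are illustrative stand-ins for df['product'].unique() and date_range. from_product builds every (product, hour) combination, so two products and 24 hours should give 48 entries:

```python
import pandas as pd

# Stand-ins for df['product'].unique() and date_range.
products = ['apple', 'orange']
hours = pd.date_range(start='2021-06-29 00:00:00-04:00',
                      periods=24,
                      freq=pd.Timedelta(hours=1))

# Cartesian product: each product is paired with each hour.
midx = pd.MultiIndex.from_product(
    [products, hours],
    names=['product', 'timestamp']
)

print(len(midx))   # number of (product, hour) pairs
print(midx[0])     # first pair
print(midx[-1])    # last pair
```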
set_index + reindex + reset_index to expand the frame:
df = (
    df.set_index(['product', 'timestamp'])
      .reindex(midx, fill_value=0)
      .reset_index()
)
df:
product timestamp count
0 apple 2021-06-29 00:00:00-04:00 0
1 apple 2021-06-29 01:00:00-04:00 0
2 apple 2021-06-29 02:00:00-04:00 0
3 apple 2021-06-29 03:00:00-04:00 0
4 apple 2021-06-29 04:00:00-04:00 0
5 apple 2021-06-29 05:00:00-04:00 0
6 apple 2021-06-29 06:00:00-04:00 0
7 apple 2021-06-29 07:00:00-04:00 0
8 apple 2021-06-29 08:00:00-04:00 0
9 apple 2021-06-29 09:00:00-04:00 0
10 apple 2021-06-29 10:00:00-04:00 0
11 apple 2021-06-29 11:00:00-04:00 1023
12 apple 2021-06-29 12:00:00-04:00 3790
13 apple 2021-06-29 13:00:00-04:00 128
14 apple 2021-06-29 14:00:00-04:00 0
15 apple 2021-06-29 15:00:00-04:00 323
16 apple 2021-06-29 16:00:00-04:00 4223
17 apple 2021-06-29 17:00:00-04:00 1194
18 apple 2021-06-29 18:00:00-04:00 0
19 apple 2021-06-29 19:00:00-04:00 0
20 apple 2021-06-29 20:00:00-04:00 0
21 apple 2021-06-29 21:00:00-04:00 0
22 apple 2021-06-29 22:00:00-04:00 0
23 apple 2021-06-29 23:00:00-04:00 0
24 orange 2021-06-29 00:00:00-04:00 0
25 orange 2021-06-29 01:00:00-04:00 0
26 orange 2021-06-29 02:00:00-04:00 0
27 orange 2021-06-29 03:00:00-04:00 0
28 orange 2021-06-29 04:00:00-04:00 0
29 orange 2021-06-29 05:00:00-04:00 0
30 orange 2021-06-29 06:00:00-04:00 0
31 orange 2021-06-29 07:00:00-04:00 0
32 orange 2021-06-29 08:00:00-04:00 0
33 orange 2021-06-29 09:00:00-04:00 0
34 orange 2021-06-29 10:00:00-04:00 0
35 orange 2021-06-29 11:00:00-04:00 0
36 orange 2021-06-29 12:00:00-04:00 0
37 orange 2021-06-29 13:00:00-04:00 0
38 orange 2021-06-29 14:00:00-04:00 0
39 orange 2021-06-29 15:00:00-04:00 23
40 orange 2021-06-29 16:00:00-04:00 4289
41 orange 2021-06-29 17:00:00-04:00 0
42 orange 2021-06-29 18:00:00-04:00 0
43 orange 2021-06-29 19:00:00-04:00 0
44 orange 2021-06-29 20:00:00-04:00 0
45 orange 2021-06-29 21:00:00-04:00 0
46 orange 2021-06-29 22:00:00-04:00 0
47 orange 2021-06-29 23:00:00-04:00 0
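One caveat with the reindex step: it relies on each (product, timestamp) pair appearing only once, since reindex generally raises an error when the index contains duplicate labels. The question's data has no duplicates, so the following is only a sketch for the hypothetical case where it does. It sums the counts per pair first and then reindexes the resulting Series (the duplicated apple row is made up for illustration):

```python
import pandas as pd

# Hypothetical input with a duplicated (apple, 11:00) pair.
df = pd.DataFrame({
    'product': ['apple', 'apple', 'orange'],
    'timestamp': pd.to_datetime(['2021-06-29 11:00:00-04:00',
                                 '2021-06-29 11:00:00-04:00',
                                 '2021-06-29 15:00:00-04:00']),
    'count': [1000, 23, 7],
})

hours = pd.date_range(start='2021-06-29 00:00:00-04:00',
                      periods=24,
                      freq=pd.Timedelta(hours=1))
midx = pd.MultiIndex.from_product(
    [df['product'].unique(), hours],
    names=['product', 'timestamp']
)

# Aggregate duplicates first, then align to the full hourly grid.
out = (
    df.groupby(['product', 'timestamp'])['count'].sum()
      .reindex(midx, fill_value=0)
      .reset_index()
)

print(out.groupby('product').size())  # rows per product
```

Whether summing is the right aggregation depends on what the duplicates mean in your data; max or last may fit better in other cases.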
Complete working example:
import pandas as pd

df = pd.DataFrame({
    'product': ['apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple',
                'orange', 'orange'],
    'timestamp': ['2021-06-29 11:00:00-04:00', '2021-06-29 12:00:00-04:00',
                  '2021-06-29 13:00:00-04:00', '2021-06-29 14:00:00-04:00',
                  '2021-06-29 15:00:00-04:00', '2021-06-29 16:00:00-04:00',
                  '2021-06-29 17:00:00-04:00', '2021-06-29 15:00:00-04:00',
                  '2021-06-29 16:00:00-04:00'],
    'count': [1023, 3790, 128, 0, 323, 4223, 1194, 23, 4289]
})

# Parse the strings into timezone-aware datetimes
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Hourly range covering every full day present in the data
# ('h' = hourly; the uppercase 'H' alias is deprecated since pandas 2.2)
dates = df['timestamp'].dt.date
date_range = pd.date_range(start=f'{dates.min()} 00:00:00-04:00',
                           end=f'{dates.max()} 23:00:00-04:00',
                           freq='h')

# Every (product, hour) combination
midx = pd.MultiIndex.from_product(
    [df['product'].unique(), date_range],
    names=['product', 'timestamp']
)

# Align to the full grid, filling the missing hours with 0
df = (
    df.set_index(['product', 'timestamp'])
      .reindex(midx, fill_value=0)
      .reset_index()
)