Pandas: extend the intervals in a time series

I have the following DataFrame:

    product     timestamp               count
0   apple    2021-06-29 11:00:00-04:00  1023
1   apple    2021-06-29 12:00:00-04:00  3790
2   apple    2021-06-29 13:00:00-04:00  128
3   apple    2021-06-29 14:00:00-04:00  0
4   apple    2021-06-29 15:00:00-04:00  323
5   apple    2021-06-29 16:00:00-04:00  4223
6   apple    2021-06-29 17:00:00-04:00  1194
7   orange   2021-06-29 15:00:00-04:00  23
8   orange   2021-06-29 16:00:00-04:00  4289

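As you can see, both products (apple and orange) have a count for each hour. For apple we have data for the interval 2021-06-29 11:00:00-04:00 to 2021-06-29 17:00:00-04:00, whereas for orange we only have data for the interval 2021-06-29 15:00:00-04:00 to 2021-06-29 16:00:00-04:00.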
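The per-product coverage described above can be checked directly. A minimal sketch, assuming the timestamps have already been parsed with pd.to_datetime, groups by product and takes the first and last observed hour (the name `coverage` is illustrative):

```python
import pandas as pd

# Sample data from the question, with timestamps parsed as tz-aware datetimes.
df = pd.DataFrame({
    'product': ['apple'] * 7 + ['orange'] * 2,
    'timestamp': pd.to_datetime([
        '2021-06-29 11:00:00-04:00', '2021-06-29 12:00:00-04:00',
        '2021-06-29 13:00:00-04:00', '2021-06-29 14:00:00-04:00',
        '2021-06-29 15:00:00-04:00', '2021-06-29 16:00:00-04:00',
        '2021-06-29 17:00:00-04:00', '2021-06-29 15:00:00-04:00',
        '2021-06-29 16:00:00-04:00',
    ]),
    'count': [1023, 3790, 128, 0, 323, 4223, 1194, 23, 4289],
})

# First and last observed hour for each product.
coverage = df.groupby('product')['timestamp'].agg(['min', 'max'])
print(coverage)
```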

I want to extend the intervals so that both products have hourly data from the start to the end of the day, i.e. over the interval 2021-06-29 00:00:00-04:00 to 2021-06-29 23:00:00-04:00.

The result should look like this:

    product     timestamp               count
0    apple    2021-06-29 00:00:00-04:00  0
1    apple    2021-06-29 01:00:00-04:00  0
2    apple    2021-06-29 02:00:00-04:00  0
3    apple    2021-06-29 03:00:00-04:00  0
4    apple    2021-06-29 04:00:00-04:00  0
5    apple    2021-06-29 05:00:00-04:00  0
6    apple    2021-06-29 06:00:00-04:00  0
7    apple    2021-06-29 07:00:00-04:00  0
8    apple    2021-06-29 08:00:00-04:00  0
9    apple    2021-06-29 09:00:00-04:00  0
10   apple    2021-06-29 10:00:00-04:00  0
11   apple    2021-06-29 11:00:00-04:00  1023
12   apple    2021-06-29 12:00:00-04:00  3790
13   apple    2021-06-29 13:00:00-04:00  128
14   apple    2021-06-29 14:00:00-04:00  0
15   apple    2021-06-29 15:00:00-04:00  323
16   apple    2021-06-29 16:00:00-04:00  4223
17   apple    2021-06-29 17:00:00-04:00  1194
18   apple    2021-06-29 18:00:00-04:00  0
19   apple    2021-06-29 19:00:00-04:00  0
20   apple    2021-06-29 20:00:00-04:00  0
21   apple    2021-06-29 21:00:00-04:00  0
22   apple    2021-06-29 22:00:00-04:00  0
23   apple    2021-06-29 23:00:00-04:00  0
24   orange   2021-06-29 00:00:00-04:00  0
25   orange   2021-06-29 01:00:00-04:00  0
26   orange   2021-06-29 02:00:00-04:00  0
27   orange   2021-06-29 03:00:00-04:00  0
28   orange   2021-06-29 04:00:00-04:00  0
29   orange   2021-06-29 05:00:00-04:00  0
30   orange   2021-06-29 06:00:00-04:00  0
31   orange   2021-06-29 07:00:00-04:00  0
32   orange   2021-06-29 08:00:00-04:00  0
33   orange   2021-06-29 09:00:00-04:00  0
34   orange   2021-06-29 10:00:00-04:00  0
35   orange   2021-06-29 11:00:00-04:00  0
36   orange   2021-06-29 12:00:00-04:00  0
37   orange   2021-06-29 13:00:00-04:00  0
38   orange   2021-06-29 14:00:00-04:00  0
39   orange   2021-06-29 15:00:00-04:00  23
40   orange   2021-06-29 16:00:00-04:00  4289
41   orange   2021-06-29 17:00:00-04:00  0
42   orange   2021-06-29 18:00:00-04:00  0
43   orange   2021-06-29 19:00:00-04:00  0
44   orange   2021-06-29 20:00:00-04:00  0
45   orange   2021-06-29 21:00:00-04:00  0
46   orange   2021-06-29 22:00:00-04:00  0
47   orange   2021-06-29 23:00:00-04:00  0
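The target is one row per product per hour. Since pd.date_range includes both endpoints by default, 00:00 to 23:00 gives 24 timestamps per product, i.e. 48 rows for the two products. A small sketch to confirm the size (the variable names are illustrative):

```python
import pandas as pd

# The whole day from the question, one timestamp per hour.
# date_range includes both endpoints by default (00:00 and 23:00).
full_day = pd.date_range(start='2021-06-29 00:00:00-04:00',
                         end='2021-06-29 23:00:00-04:00',
                         freq='h')
n_products = 2  # apple and orange
expected_rows = n_products * len(full_day)

print(len(full_day))   # 24
print(expected_rows)   # 48
```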

So far I have tried to reindex the DataFrame, but I am doing something wrong.
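One likely reason a plain reindex fails here (an assumption, since the attempted code is not shown): after set_index('timestamp') the 15:00 and 16:00 labels appear once for each product, and pandas refuses to reindex an axis with duplicate labels, raising a ValueError. Reindexing each product separately avoids the duplicates. A minimal sketch on a reduced version of the data (`full_day` and `per_product` are illustrative names):

```python
import pandas as pd

# Reduced sample: both products share the 15:00 and 16:00 timestamps.
df = pd.DataFrame({
    'product': ['apple', 'apple', 'orange', 'orange'],
    'timestamp': pd.to_datetime(['2021-06-29 15:00:00-04:00',
                                 '2021-06-29 16:00:00-04:00',
                                 '2021-06-29 15:00:00-04:00',
                                 '2021-06-29 16:00:00-04:00']),
    'count': [323, 4223, 23, 4289],
})
full_day = pd.date_range('2021-06-29 00:00:00-04:00',
                         '2021-06-29 23:00:00-04:00', freq='h')

# Indexing by timestamp alone leaves duplicate labels, so reindex raises.
plain_reindex_failed = False
try:
    df.set_index('timestamp').reindex(full_day, fill_value=0)
except ValueError as exc:
    plain_reindex_failed = True
    print('plain reindex failed:', exc)

# Within each product the timestamps are unique, so a per-group reindex works.
per_product = (
    df.set_index('timestamp')
      .groupby('product')['count']
      .apply(lambda s: s.reindex(full_day, fill_value=0))
)
print(per_product)
```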

Thanks!

Let's try:

  1. Convert the timestamp column with to_datetime, if it is not already a datetime:

    df['timestamp'] = pd.to_datetime(df['timestamp'])
  2. Find the start and end datetimes dynamically (this can also be done statically) and create the hourly range:

    Dynamic:

    dates = df['timestamp'].dt.date
    date_range = pd.date_range(start=f'{dates.min()} 00:00:00-04:00',
                               end=f'{dates.max()} 23:00:00-04:00',
                               freq='h')  # 'h' = hourly; 'H' is deprecated since pandas 2.2
    

    Static:

    date_range = pd.date_range(start='2021-06-29 00:00:00-04:00',
                               end='2021-06-29 23:00:00-04:00',
                               freq='h')
    
  3. Create a MultiIndex with MultiIndex.from_product, based on the unique product values and the date_range:

    midx = pd.MultiIndex.from_product(
        [df['product'].unique(), date_range],
        names=['product', 'timestamp']
    )
    
  4. Use set_index + reindex + reset_index to enlarge the frame:

    df = (
        df.set_index(['product', 'timestamp'])
            .reindex(midx, fill_value=0)
            .reset_index()
    )
    

    df:

       product                 timestamp  count
    0    apple 2021-06-29 00:00:00-04:00      0
    1    apple 2021-06-29 01:00:00-04:00      0
    2    apple 2021-06-29 02:00:00-04:00      0
    3    apple 2021-06-29 03:00:00-04:00      0
    4    apple 2021-06-29 04:00:00-04:00      0
    5    apple 2021-06-29 05:00:00-04:00      0
    6    apple 2021-06-29 06:00:00-04:00      0
    7    apple 2021-06-29 07:00:00-04:00      0
    8    apple 2021-06-29 08:00:00-04:00      0
    9    apple 2021-06-29 09:00:00-04:00      0
    10   apple 2021-06-29 10:00:00-04:00      0
    11   apple 2021-06-29 11:00:00-04:00   1023
    12   apple 2021-06-29 12:00:00-04:00   3790
    13   apple 2021-06-29 13:00:00-04:00    128
    14   apple 2021-06-29 14:00:00-04:00      0
    15   apple 2021-06-29 15:00:00-04:00    323
    16   apple 2021-06-29 16:00:00-04:00   4223
    17   apple 2021-06-29 17:00:00-04:00   1194
    18   apple 2021-06-29 18:00:00-04:00      0
    19   apple 2021-06-29 19:00:00-04:00      0
    20   apple 2021-06-29 20:00:00-04:00      0
    21   apple 2021-06-29 21:00:00-04:00      0
    22   apple 2021-06-29 22:00:00-04:00      0
    23   apple 2021-06-29 23:00:00-04:00      0
    24  orange 2021-06-29 00:00:00-04:00      0
    25  orange 2021-06-29 01:00:00-04:00      0
    26  orange 2021-06-29 02:00:00-04:00      0
    27  orange 2021-06-29 03:00:00-04:00      0
    28  orange 2021-06-29 04:00:00-04:00      0
    29  orange 2021-06-29 05:00:00-04:00      0
    30  orange 2021-06-29 06:00:00-04:00      0
    31  orange 2021-06-29 07:00:00-04:00      0
    32  orange 2021-06-29 08:00:00-04:00      0
    33  orange 2021-06-29 09:00:00-04:00      0
    34  orange 2021-06-29 10:00:00-04:00      0
    35  orange 2021-06-29 11:00:00-04:00      0
    36  orange 2021-06-29 12:00:00-04:00      0
    37  orange 2021-06-29 13:00:00-04:00      0
    38  orange 2021-06-29 14:00:00-04:00      0
    39  orange 2021-06-29 15:00:00-04:00     23
    40  orange 2021-06-29 16:00:00-04:00   4289
    41  orange 2021-06-29 17:00:00-04:00      0
    42  orange 2021-06-29 18:00:00-04:00      0
    43  orange 2021-06-29 19:00:00-04:00      0
    44  orange 2021-06-29 20:00:00-04:00      0
    45  orange 2021-06-29 21:00:00-04:00      0
    46  orange 2021-06-29 22:00:00-04:00      0
    47  orange 2021-06-29 23:00:00-04:00      0
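For comparison, the same grid can be built without a MultiIndex by using a cross join (merge with how='cross', available since pandas 1.2) followed by a left merge. This is a swapped-in alternative, not the approach above; the sketch assumes the question's data and the static date_range:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['apple'] * 7 + ['orange'] * 2,
    'timestamp': ['2021-06-29 11:00:00-04:00', '2021-06-29 12:00:00-04:00',
                  '2021-06-29 13:00:00-04:00', '2021-06-29 14:00:00-04:00',
                  '2021-06-29 15:00:00-04:00', '2021-06-29 16:00:00-04:00',
                  '2021-06-29 17:00:00-04:00', '2021-06-29 15:00:00-04:00',
                  '2021-06-29 16:00:00-04:00'],
    'count': [1023, 3790, 128, 0, 323, 4223, 1194, 23, 4289],
})
df['timestamp'] = pd.to_datetime(df['timestamp'])

date_range = pd.date_range(start='2021-06-29 00:00:00-04:00',
                           end='2021-06-29 23:00:00-04:00',
                           freq='h')

# Every (product, hour) pair; the left frame's row order is kept,
# so all apple hours come first, then all orange hours.
grid = pd.DataFrame({'product': df['product'].unique()}).merge(
    pd.DataFrame({'timestamp': date_range}), how='cross')

# Attach observed counts; hours with no observation get NaN, then 0.
result = grid.merge(df, on=['product', 'timestamp'], how='left')
result['count'] = result['count'].fillna(0).astype('int64')
print(result)
```

The merge route goes through NaN and needs the astype back to integers, whereas reindex(fill_value=0) keeps the integer dtype directly.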
    

Complete working example:

    import pandas as pd

    df = pd.DataFrame({
        'product': ['apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple',
                    'orange', 'orange'],
        'timestamp': ['2021-06-29 11:00:00-04:00', '2021-06-29 12:00:00-04:00',
                      '2021-06-29 13:00:00-04:00', '2021-06-29 14:00:00-04:00',
                      '2021-06-29 15:00:00-04:00', '2021-06-29 16:00:00-04:00',
                      '2021-06-29 17:00:00-04:00', '2021-06-29 15:00:00-04:00',
                      '2021-06-29 16:00:00-04:00'],
        'count': [1023, 3790, 128, 0, 323, 4223, 1194, 23, 4289]
    })
    # Parse the strings into timezone-aware datetimes (UTC-04:00)
    df['timestamp'] = pd.to_datetime(df['timestamp'])

    # Hourly range from 00:00 on the first date to 23:00 on the last date
    dates = df['timestamp'].dt.date
    date_range = pd.date_range(start=f'{dates.min()} 00:00:00-04:00',
                               end=f'{dates.max()} 23:00:00-04:00',
                               freq='h')
    # Every (product, hour) combination
    midx = pd.MultiIndex.from_product(
        [df['product'].unique(), date_range],
        names=['product', 'timestamp']
    )

    # Reindex onto the full grid; missing combinations get count 0
    df = (
        df.set_index(['product', 'timestamp'])
            .reindex(midx, fill_value=0)
            .reset_index()
    )
    print(df)
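The dynamic range above hard-codes the -04:00 offset in the start and end strings. A variant that derives the day boundaries from the data itself can use Timestamp.floor('D'), which for timezone-aware timestamps floors in local wall time. fill_full_days is a hypothetical helper name; the sketch assumes hourly data and a single timezone in the column:

```python
import pandas as pd


def fill_full_days(frame, key='product', time_col='timestamp', value_col='count'):
    """Hypothetical helper: put every key on an hourly grid spanning whole
    days (00:00 of the first day to 23:00 of the last), filling gaps with 0.
    Day boundaries come from the data's own timezone, not a hard-coded offset.
    """
    first_day = frame[time_col].min().floor('D')
    last_day = frame[time_col].max().floor('D')
    hours = pd.date_range(start=first_day,
                          end=last_day + pd.Timedelta(hours=23),
                          freq='h')
    midx = pd.MultiIndex.from_product([frame[key].unique(), hours],
                                      names=[key, time_col])
    return (frame.set_index([key, time_col])[[value_col]]
                 .reindex(midx, fill_value=0)
                 .reset_index())


raw = pd.DataFrame({
    'product': ['apple'] * 7 + ['orange'] * 2,
    'timestamp': ['2021-06-29 11:00:00-04:00', '2021-06-29 12:00:00-04:00',
                  '2021-06-29 13:00:00-04:00', '2021-06-29 14:00:00-04:00',
                  '2021-06-29 15:00:00-04:00', '2021-06-29 16:00:00-04:00',
                  '2021-06-29 17:00:00-04:00', '2021-06-29 15:00:00-04:00',
                  '2021-06-29 16:00:00-04:00'],
    'count': [1023, 3790, 128, 0, 323, 4223, 1194, 23, 4289],
})
raw['timestamp'] = pd.to_datetime(raw['timestamp'])

out = fill_full_days(raw)
print(out.shape)  # (48, 3)
```

Selecting only [[value_col]] before reindexing means any other columns are dropped rather than silently filled with 0.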