How to handle a time index with groupby in Python

Question

I have a CSV file with several variables.
Among them, the date and the time are stored in separate columns.
My data looks like this:

  Date         Time       Axis1     Axis2    Axis3
   .             .         .          .       .
   .             .         .          .       .
2017-10-15    13:40:00     20         0       40
2017-10-15    13:40:10     40         10      100
2017-10-15    13:40:20     50         0       0
2017-10-15    13:40:30     10         10      60
2017-10-15    13:40:40     0          0       20
2017-10-15    13:40:50     0          0       10
2017-10-16    06:20:30     10         0       10
2017-10-16    06:20:40     70         0       10
2017-10-16    06:20:50     20         100     80
   .             .         .          .       .
   .             .         .          .       .

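There are many more rows (over ten thousand).
You may notice that there is a time gap between 10/15 and 10/16.
I want to sum the values of all three axes per minute.
This is the structure I expect: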

  Date         Time       Axis1     Axis2    Axis3
   .             .         .          .       .
   .             .         .          .       .
2017-10-15    13:40:00     120        20      230
2017-10-16    06:20:00     100        100     100
2017-10-16    06:21:00     ?          ?       ?
   .             .         .          .       .
   .             .         .          .       .

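I tried groupby, resample, and pd.Grouper, but none of them worked for me.
The main problem is that the time index does not start at 13:40:00. After I set the time as the index and call groupby('Date') followed by resample('1Min').sum(), it starts at 00:00:00.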

Thanks for your help!

Answer 1

Let's try:

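import pandas as pd
# Combine the Date and Time strings into a single DatetimeIndex
df = df.set_index(pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%Y-%m-%d %H:%M:%S'))

# Floor each timestamp to its minute ('min' replaces the deprecated 'T' alias) and sum only the axis columns
df.groupby(df.index.floor('min'))[['Axis1', 'Axis2', 'Axis3']].sum()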

Output:

                     Axis1  Axis2  Axis3
2017-10-15 13:40:00    120     20    230
2017-10-16 06:20:00    100    100    100

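Note: passing the format argument to pd.to_datetime helps efficiency, because pandas does not have to infer the format. Using floor avoids resampling or grouping over the missing times, so minutes with no data do not appear in the result.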
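
For anyone who wants to reproduce this, here is a minimal self-contained sketch built from the nine sample rows shown in the question. The names raw, stamp, axes, per_minute, resampled, and out are only illustrative and do not come from the original post. It also runs resample('1min') on the same data for contrast, to show what the note about missing times means, and splits the result back into Date and Time columns as in the expected output.

```python
import io
import pandas as pd

# Sample rows copied from the question (the real file has over ten thousand rows).
raw = """Date Time Axis1 Axis2 Axis3
2017-10-15 13:40:00 20 0 40
2017-10-15 13:40:10 40 10 100
2017-10-15 13:40:20 50 0 0
2017-10-15 13:40:30 10 10 60
2017-10-15 13:40:40 0 0 20
2017-10-15 13:40:50 0 0 10
2017-10-16 06:20:30 10 0 10
2017-10-16 06:20:40 70 0 10
2017-10-16 06:20:50 20 100 80
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Build one timestamp per row from the separate Date and Time strings.
stamp = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S")

# Keep only the numeric columns and index them by the combined timestamp.
axes = df[["Axis1", "Axis2", "Axis3"]].set_index(stamp)

# Group by the minute each timestamp falls in; only minutes that
# actually contain data show up in the result.
per_minute = axes.groupby(axes.index.floor("min")).sum()
print(per_minute)

# For contrast: resample creates a row for every minute between the first
# and last timestamp, so the overnight gap is filled with all-zero rows.
resampled = axes.resample("1min").sum()
print(len(per_minute), len(resampled))

# Optional: split the index back into Date and Time columns,
# matching the layout asked for in the question.
out = per_minute.rename_axis("Minute").reset_index()
out["Date"] = out["Minute"].dt.strftime("%Y-%m-%d")
out["Time"] = out["Minute"].dt.strftime("%H:%M:%S")
out = out[["Date", "Time", "Axis1", "Axis2", "Axis3"]]
print(out)
```

If the empty minutes are actually wanted (for example, for plotting on a regular grid), resample is the better fit. Otherwise the floor-based grouping keeps the result compact.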

