Loop over 24-hour period using list of dates in Python

I have a list of np.datetime64 dates in Python:

['2016-12-01T02:00:00.000000000', '2016-12-01T04:00:00.000000000',
 '2016-12-01T06:00:00.000000000', '2016-12-01T08:00:00.000000000',
 '2016-12-01T10:00:00.000000000', '2016-12-01T12:00:00.000000000', 
 '2016-12-01T14:00:00.000000000', '2016-12-01T16:00:00.000000000', 
 '2016-12-01T18:00:00.000000000', '2016-12-01T20:00:00.000000000', 
 '2016-12-01T22:00:00.000000000', '2016-12-02T00:00:00.000000000', 
 '2016-12-02T02:00:00.000000000', '2016-12-02T04:00:00.000000000', 
 '2016-12-02T06:00:00.000000000', '2016-12-02T08:00:00.000000000', 
 '2016-12-02T10:00:00.000000000', '2016-12-02T12:00:00.000000000', 
 '2016-12-02T14:00:00.000000000', '2016-12-02T16:00:00.000000000', 
 '2016-12-02T18:00:00.000000000', '2016-12-02T20:00:00.000000000', 
 '2016-12-02T22:00:00.000000000', '2016-12-03T00:00:00.000000000', 
 '2016-12-03T02:00:00.000000000', '2016-12-03T04:00:00.000000000',
 '2016-12-03T06:00:00.000000000', '2016-12-03T08:00:00.000000000', 
 '2016-12-03T10:00:00.000000000', '2016-12-03T12:00:00.000000000', 
 '2016-12-03T14:00:00.000000000', '2016-12-03T16:00:00.000000000', 
 '2016-12-03T18:00:00.000000000', '2016-12-03T20:00:00.000000000', 
 '2016-12-03T22:00:00.000000000']

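and I want to loop over each calendar day in the list. I tried extracting each unique date from the list (i.e. finding the min and max dates and creating a list of dates between them), but that isn't ideal for what I'm trying to do.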

What I want is code that lets me loop over each date/calendar day found in the list and get the datetimes that correspond to that date:

for each_date in date_list:
    ***get all datetimes corresponding to each_date***

(loop would occur 3 times in this example)

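Note:

1) Solutions that iterate over every [n:n+24] slice, or any similar fixed-size approach, won't work, because not every day has the same number of time steps.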
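For trying out the answers below, the sample list can be rebuilt as a NumPy array. This is a sketch under one assumption: that the values are `datetime64[ns]` timestamps on a regular 2-hour grid, as the printed list suggests. The name `data_array` is borrowed from the first answer.

```python
import numpy as np

# Assumption: the question's data is a datetime64[ns] array with a 2-hour step,
# from 2016-12-01T02:00 up to (but not including) 2016-12-04T00:00.
data_array = np.arange(np.datetime64('2016-12-01T02:00'),
                       np.datetime64('2016-12-04T00:00'),
                       np.timedelta64(2, 'h')).astype('datetime64[ns]')

print(len(data_array))  # 35 timestamps, matching the list in the question
```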

If the timestamps are sorted, we can use the itertools.groupby function to group the elements of the array by their corresponding date.

The date can be obtained with np.datetime64.astype(..., dtype='datetime64[D]'), so we can write:

from numpy import datetime64
from functools import partial
from itertools import groupby

# data_array: the chronologically sorted array of np.datetime64 values
for day, timestamps in groupby(data_array,
                               partial(datetime64.astype, dtype='datetime64[D]')):
    # process day and timestamps
    pass

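Here day is a datetime64[D] numpy object (it contains only the date), and timestamps is an iterable (not a list, but we can convert it into a list of the corresponding timestamps). data_array is the array holding the original data.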
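The grouping key can be checked on its own. In this small sketch, `to_day` and `ts` are illustrative names that do not appear in the answer; a plain `lambda t: t.astype('datetime64[D]')` would serve equally well as the key.

```python
from functools import partial
import numpy as np

# The key used above: truncate a timestamp to day precision.
to_day = partial(np.datetime64.astype, dtype='datetime64[D]')

ts = np.datetime64('2016-12-01T22:00:00.000000000')
print(to_day(ts))                  # 2016-12-01
print(ts.astype('datetime64[D]'))  # same result, written directly
```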

For example:

>>> for day, timestamps in groupby(data_array,
...                                partial(datetime64.astype, dtype='datetime64[D]')):
...     print((day, list(timestamps)))
... 
(numpy.datetime64('2016-12-01'), [numpy.datetime64('2016-12-01T02:00:00.000000000'), numpy.datetime64('2016-12-01T04:00:00.000000000'), numpy.datetime64('2016-12-01T06:00:00.000000000'), numpy.datetime64('2016-12-01T08:00:00.000000000'), numpy.datetime64('2016-12-01T10:00:00.000000000'), numpy.datetime64('2016-12-01T12:00:00.000000000'), numpy.datetime64('2016-12-01T14:00:00.000000000'), numpy.datetime64('2016-12-01T16:00:00.000000000'), numpy.datetime64('2016-12-01T18:00:00.000000000'), numpy.datetime64('2016-12-01T20:00:00.000000000'), numpy.datetime64('2016-12-01T22:00:00.000000000')])
(numpy.datetime64('2016-12-02'), [numpy.datetime64('2016-12-02T00:00:00.000000000'), numpy.datetime64('2016-12-02T02:00:00.000000000'), numpy.datetime64('2016-12-02T04:00:00.000000000'), numpy.datetime64('2016-12-02T06:00:00.000000000'), numpy.datetime64('2016-12-02T08:00:00.000000000'), numpy.datetime64('2016-12-02T10:00:00.000000000'), numpy.datetime64('2016-12-02T12:00:00.000000000'), numpy.datetime64('2016-12-02T14:00:00.000000000'), numpy.datetime64('2016-12-02T16:00:00.000000000'), numpy.datetime64('2016-12-02T18:00:00.000000000'), numpy.datetime64('2016-12-02T20:00:00.000000000'), numpy.datetime64('2016-12-02T22:00:00.000000000')])
(numpy.datetime64('2016-12-03'), [numpy.datetime64('2016-12-03T00:00:00.000000000'), numpy.datetime64('2016-12-03T02:00:00.000000000'), numpy.datetime64('2016-12-03T04:00:00.000000000'), numpy.datetime64('2016-12-03T06:00:00.000000000'), numpy.datetime64('2016-12-03T08:00:00.000000000'), numpy.datetime64('2016-12-03T10:00:00.000000000'), numpy.datetime64('2016-12-03T12:00:00.000000000'), numpy.datetime64('2016-12-03T14:00:00.000000000'), numpy.datetime64('2016-12-03T16:00:00.000000000'), numpy.datetime64('2016-12-03T18:00:00.000000000'), numpy.datetime64('2016-12-03T20:00:00.000000000'), numpy.datetime64('2016-12-03T22:00:00.000000000')])

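So here, for each day, we chose to print the list of the corresponding timestamps, but that is of course just one option. As the example shows, not all slices have the same length (the last two have one extra element).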
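The unequal group sizes can also be confirmed without a loop. This sketch swaps in `np.unique(..., return_counts=True)` purely as a check; it assumes the same 2-hourly `datetime64[ns]` sample data rebuilt with `np.arange`.

```python
import numpy as np

# Assumption: sample data is 2-hourly datetime64[ns] values, as in the question.
data_array = np.arange(np.datetime64('2016-12-01T02:00'),
                       np.datetime64('2016-12-04T00:00'),
                       np.timedelta64(2, 'h')).astype('datetime64[ns]')

# Truncate every timestamp to its calendar day and count occurrences per day.
days, counts = np.unique(data_array.astype('datetime64[D]'), return_counts=True)
for day, n in zip(days, counts):
    print(day, n)
# 2016-12-01 11
# 2016-12-02 12
# 2016-12-03 12
```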

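Note that timestamps is an iterator and is therefore consumed: if you don't convert it to a list, it is exhausted after a single pass. It also shares its source with the groupby object, so it must be consumed before the loop advances to the next day.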
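The pitfall can be made concrete. This sketch assumes the same rebuilt sample data; `stale`, `groups` and `key` are illustrative names. The empty groups reflect how recent CPython versions invalidate a group once `groupby` has moved past it.

```python
from functools import partial
from itertools import groupby
import numpy as np

# Assumption: sample data is 2-hourly datetime64[ns] values, as in the question.
data_array = np.arange(np.datetime64('2016-12-01T02:00'),
                       np.datetime64('2016-12-04T00:00'),
                       np.timedelta64(2, 'h')).astype('datetime64[ns]')
key = partial(np.datetime64.astype, dtype='datetime64[D]')

# Pitfall: collecting only the (day, iterator) pairs advances groupby past
# every group, so each stored iterator is already invalid when read later.
stale = [(day, ts) for day, ts in groupby(data_array, key)]
print([len(list(ts)) for _, ts in stale])  # [0, 0, 0]

# Fix: turn each group into a list while groupby is still positioned on it.
groups = {day: list(ts) for day, ts in groupby(data_array, key)}
print([len(v) for v in groups.values()])   # [11, 12, 12]
```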

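groupby runs in linear time, because at each step it only checks whether the "group key" is the same as the previous element's; but, as mentioned, the constraint is that the data must be sorted.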
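To illustrate the sorting constraint, the sketch below deliberately interleaves the (assumed, rebuilt) sample data so it is no longer chronological. Because groupby merges only consecutive equal keys, each day then appears twice; sorting first restores one group per day at an extra O(n log n) cost. The names `mixed` and `key` are illustrative.

```python
from functools import partial
from itertools import groupby
import numpy as np

# Assumption: sample data is 2-hourly datetime64[ns] values, as in the question.
data_array = np.arange(np.datetime64('2016-12-01T02:00'),
                       np.datetime64('2016-12-04T00:00'),
                       np.timedelta64(2, 'h')).astype('datetime64[ns]')
key = partial(np.datetime64.astype, dtype='datetime64[D]')

# Even-indexed values followed by odd-indexed ones: two sorted runs back to back.
mixed = np.concatenate([data_array[::2], data_array[1::2]])
print([str(day) for day, _ in groupby(mixed, key)])
# ['2016-12-01', '2016-12-02', '2016-12-03', '2016-12-01', '2016-12-02', '2016-12-03']

# Sorting first gives one group per calendar day again.
print([str(day) for day, _ in groupby(np.sort(mixed), key)])
# ['2016-12-01', '2016-12-02', '2016-12-03']
```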

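You can use collections.defaultdict for an O(n) solution that does not require the data to be sorted. You can use Pandas to normalize your datetime objects, although this should also be possible with NumPy alone.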

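import pandas as pd
from collections import defaultdict

# L: the list of np.datetime64 values from the question
d = defaultdict(list)

for item in L:
    # normalize() sets the time to midnight, giving one key per calendar day
    day = pd.to_datetime(item).normalize().to_datetime64()
    d[day].append(item)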

print(d)

defaultdict(list,
            {numpy.datetime64('2016-12-01T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-01T02:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-01T22:00:00.000000000')],
             numpy.datetime64('2016-12-02T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-02T00:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-02T22:00:00.000000000')],
             numpy.datetime64('2016-12-03T00:00:00.000000000'):
                 [numpy.datetime64('2016-12-03T00:00:00.000000000'),
                  ...
                  numpy.datetime64('2016-12-03T22:00:00.000000000')]})
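The answer notes that the normalization should also be possible with NumPy. One way, sketched here under the assumption that the same rebuilt sample data is used, is to truncate each item with `astype('datetime64[D]')` instead of calling Pandas. The keys then become `datetime64[D]` dates rather than midnight `datetime64[ns]` values.

```python
from collections import defaultdict
import numpy as np

# Assumption: L holds the question's 2-hourly datetime64[ns] values.
L = list(np.arange(np.datetime64('2016-12-01T02:00'),
                   np.datetime64('2016-12-04T00:00'),
                   np.timedelta64(2, 'h')).astype('datetime64[ns]'))

d = defaultdict(list)
for item in L:
    # Truncating to day precision plays the role of Timestamp.normalize().
    day = item.astype('datetime64[D]')
    d[day].append(item)

for day, items in d.items():
    print(day, len(items))
# 2016-12-01 11
# 2016-12-02 12
# 2016-12-03 12
```

Like the Pandas version, this makes a single pass over the data and does not depend on the input order.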