Split a list of dates into subsets of consecutive dates

I have an array of dates that can contain multiple date ranges.

dates = [
 '2020-01-01',
 '2020-01-02',
 '2020-01-03',
 '2020-01-06',
 '2020-01-07',
 '2020-01-08'
]

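In this example, the list contains 2 separate ranges of consecutive dates (2020-01-01 to 2020-01-03 and 2020-01-06 to 2020-01-08).

I'm trying to figure out how to loop through this list and find all the consecutive date ranges.

A post I've been looking at (How to detect if dates are consecutive in Python?) seems to have a good approach, but I'm struggling to apply that logic to my use case.

This assumes that single-date "ranges" are still represented by 2 dates: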

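from datetime import datetime

def makedate(s):
    # parse a 'yyyy-mm-dd' string into a datetime
    return datetime.strptime( s, "%Y-%m-%d" )

def splitIntoRanges( dates ):
    # expects a non-empty list of date strings in ascending order
    ranges = []
    start_s = last_s = dates[0]
    last = makedate(start_s)
    for curr_s in dates[1:]:
        curr = makedate(curr_s)
        if (curr - last).days > 1:
            # gap of more than one day: close the current range
            ranges.append((start_s,last_s))
            start_s = curr_s
        last_s = curr_s
        last = curr
    # the range still open at the end of the loop is the last one
    return ranges + [(start_s,last_s)]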
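
To check the single-date convention, here is a minimal self-contained sketch of the same idea. The name split_into_ranges, the sorting step, the empty-list guard and the use of date.fromisoformat are my own assumptions, not part of the code above, which expects sorted, non-empty input.

```python
from datetime import date, timedelta

def split_into_ranges(date_strings):
    """Return (start, end) string pairs, one per run of consecutive days."""
    if not date_strings:
        return []
    # Sorting is an added assumption so unsorted input also works
    days = sorted(date.fromisoformat(s) for s in date_strings)
    ranges = []
    start = prev = days[0]
    for day in days[1:]:
        if day - prev > timedelta(days=1):
            # gap found: close the run that ended at prev
            ranges.append((start.isoformat(), prev.isoformat()))
            start = day
        prev = day
    # close the run that is still open after the loop
    ranges.append((start.isoformat(), prev.isoformat()))
    return ranges

print(split_into_ranges(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-05']))
# [('2020-01-01', '2020-01-03'), ('2020-01-05', '2020-01-05')]
```

The isolated date 2020-01-05 comes back as a pair whose start and end are the same day.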

I took a similar approach, though definitely not as elegant as @Scott's:

from datetime import datetime

ranges = []

dates = [datetime.strptime(date, '%Y-%m-%d') for date in dates]
start = dates[0]

for i in range(1, len(dates)):
    if (dates[i] - dates[i - 1]).days > 1:
        # gap found: the current range ends at the previous date
        end = dates[i - 1]
        ranges.append(f'{start} to {end}')
        start = dates[i]

# close the last range after the loop, so a trailing single date is not lost
end = dates[-1]
ranges.append(f'{start} to {end}')

I found the key to the solution in a second post and pieced it together.

My problem had two parts:

  1. How to represent a list of dates efficiently

Answer:

import datetime

pto = [
    '2020-01-03',
    '2020-01-08',
    '2020-01-02',
    '2020-01-07',
    '2020-01-01',
    '2020-01-06'
]

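ordinal_dates = [datetime.datetime.strptime(i, '%Y-%m-%d').toordinal() for i in pto]

  2. Once you have the dates as a list of integers, you can simply look for consecutive integers, take the lower and upper bound of each range, and convert them back to yyyy-mm-dd format.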

Answer:

def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
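
To make the gaps and edges lines easier to follow, the same function can be traced on plain integers. The comments and the sample input are added here for illustration only.

```python
def ranges(nums):
    nums = sorted(set(nums))
    # Each gap yields the last value of one run and the first value of the next
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    # Flattened boundaries: first value, gap boundaries, last value
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    # Zipping an iterator with itself consumes two values per tuple
    return list(zip(edges, edges))

print(ranges([1, 2, 3, 6, 7, 8, 10]))  # [(1, 3), (6, 8), (10, 10)]
print(ranges([]))                      # []
```

For the first call, gaps is [[3, 6], [8, 10]], so the flattened edges are 1, 3, 6, 8, 10, 10, which pair up into the three ranges.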

My full function:

import datetime

def get_date_ranges(pto_list: list) -> list:
    pto_dates = [datetime.datetime.strptime(i, '%Y-%m-%d').toordinal() for i in pto_list]
    nums = sorted(set(pto_dates))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    ordinal_ranges = list(zip(edges, edges))
    date_bounds = []
    for start, end in ordinal_ranges:
        date_bounds.append((
            datetime.datetime.fromordinal(start).strftime('%Y-%m-%d'),
            datetime.datetime.fromordinal(end).strftime('%Y-%m-%d')
        ))
    return date_bounds

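You can find all the consecutive date ranges, append them to a list of lists, and access each range by index, but I prefer using keys in a dictionary for readability.

Here's how (note: please read the comments):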

from datetime import datetime

dates = [datetime.strptime(d, "%Y-%m-%d") for d in dates]  # parse each string into a datetime
date_ints = [d.toordinal() for d in dates]  # toordinal() returns the day number, counting 0001-01-01 as day 1
ranges = {}; arange = []; prev = None; j = 1
for index, i in enumerate(date_ints):  # iterate through the date integers (assumed to be sorted)
    if prev is None or i - 1 == prev:  # first date, or exactly one day after the previous date
        arange.append(dates[index].strftime("%Y-%m-%d"))
    else:
        ranges.update({f'Range{j}': tuple(arange)})  # integers are no longer in sequence: store the finished range
        arange = []; j += 1                          # clear 'arange' and start appending to a new range
        arange.append(dates[index].strftime("%Y-%m-%d"))
    prev = i
ranges.update({f'Range{j}': tuple(arange)})  # store the last range
print(ranges)
print(ranges['Range1'])  # access a range based on the associated key
print(ranges['Range2'])

Output:

{'Range1': ('2020-01-01', '2020-01-02', '2020-01-03'), 'Range2': ('2020-01-06', '2020-01-07', '2020-01-08')}
('2020-01-01', '2020-01-02', '2020-01-03')
('2020-01-06', '2020-01-07', '2020-01-08')
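
For comparison, the list-of-lists variant mentioned at the start of this answer might look like the sketch below. The function name group_consecutive is hypothetical, and sorting plus de-duplicating the input is an assumption added here.

```python
from datetime import date

def group_consecutive(date_strings):
    """Group ISO date strings into lists of consecutive days."""
    groups = []
    prev = None
    # yyyy-mm-dd strings sort chronologically, so a plain sort is enough
    for s in sorted(set(date_strings)):
        ordinal = date.fromisoformat(s).toordinal()
        if prev is not None and ordinal == prev + 1:
            groups[-1].append(s)  # one day after the previous date: extend the run
        else:
            groups.append([s])    # first date or a gap: start a new run
        prev = ordinal
    return groups

groups = group_consecutive([
    '2020-01-01', '2020-01-02', '2020-01-03',
    '2020-01-06', '2020-01-07', '2020-01-08',
])
print(groups[0])  # ['2020-01-01', '2020-01-02', '2020-01-03']
print(groups[1])  # ['2020-01-06', '2020-01-07', '2020-01-08']
```

Ranges are then accessed by position (groups[0], groups[1]) rather than by a key such as 'Range1'.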

more-itertools has a function called consecutive_groups that does this for you.

Alternatively, you can look at its source and copy what it does:

from datetime import datetime
from itertools import groupby
from operator import itemgetter

def consecutive_groups(iterable, ordering=lambda x: x):
    for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield map(itemgetter(1), g)

Then use the function:

for g in consecutive_groups(dates, lambda x: datetime.strptime(x, '%Y-%m-%d').toordinal()):
    print(list(g))

(More properly) use a function instead of a lambda:

def to_date(date):
    return datetime.strptime(date, '%Y-%m-%d').toordinal()

for g in consecutive_groups(dates, to_date):
    print(list(g))

['2020-01-01', '2020-01-02', '2020-01-03']
['2020-01-06', '2020-01-07', '2020-01-08']
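
To see why the groupby key works, it may help to print the key for each date. Within a run of consecutive days the position and the ordinal both increase by 1, so their difference stays constant; a gap changes it. The snippet below is only an illustration, and shifting the keys by the first one is done here purely for readability.

```python
from datetime import datetime

dates = ['2020-01-01', '2020-01-02', '2020-01-03',
         '2020-01-06', '2020-01-07', '2020-01-08']

ordinals = [datetime.strptime(d, '%Y-%m-%d').toordinal() for d in dates]
# Same key that consecutive_groups hands to groupby: position minus ordinal
keys = [i - o for i, o in enumerate(ordinals)]
# Subtract the first key so the values are small enough to read
print([k - keys[0] for k in keys])  # [0, 0, 0, -2, -2, -2]
```

Since groupby only merges adjacent items with equal keys, this relies on the dates already being in ascending order.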