Split a list of dates into subsets of consecutive dates
I have an array of dates that can contain multiple date ranges.
dates = [
'2020-01-01',
'2020-01-02',
'2020-01-03',
'2020-01-06',
'2020-01-07',
'2020-01-08'
]
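In this example, the list contains 2 separate ranges of consecutive dates (2020-01-01 to 2020-01-03 and 2020-01-06 to 2020-01-08).
I'm trying to figure out how to iterate over this list and find all of the consecutive date ranges.
An article I'm looking at (How to detect if dates are consecutive in Python?) seems to have a good approach, but I'm struggling to implement that logic in my use case.
This assumes that single-date "ranges" are still represented by 2 dates: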
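from datetime import datetime

def makedate(s):
    return datetime.strptime(s, "%Y-%m-%d")

def splitIntoRanges(dates):
    # Expects a non-empty list of date strings sorted in ascending order.
    ranges = []
    start_s = last_s = dates[0]
    last = makedate(start_s)
    for curr_s in dates[1:]:
        curr = makedate(curr_s)
        if (curr - last).days > 1:
            # Gap found: close the current range and start a new one.
            ranges.append((start_s, last_s))
            start_s = curr_s
        last_s = curr_s
        last = curr
    # Close the final range.
    return ranges + [(start_s, last_s)]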
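As a quick check of the single-date behaviour described above, here is a minimal sketch that runs the same logic on the question's sample with one isolated date appended. The extra '2020-01-10' entry and the `sample` and `result` names are my own additions for illustration, and the input is assumed to be non-empty and already sorted.

```python
from datetime import datetime

def makedate(s):
    return datetime.strptime(s, "%Y-%m-%d")

def splitIntoRanges(dates):
    # Same logic as the answer above; assumes a non-empty, sorted list.
    ranges = []
    start_s = last_s = dates[0]
    last = makedate(start_s)
    for curr_s in dates[1:]:
        curr = makedate(curr_s)
        if (curr - last).days > 1:
            ranges.append((start_s, last_s))
            start_s = curr_s
        last_s = curr_s
        last = curr
    return ranges + [(start_s, last_s)]

# The question's sample plus one isolated date (added for illustration).
sample = [
    '2020-01-01', '2020-01-02', '2020-01-03',
    '2020-01-06', '2020-01-07', '2020-01-08',
    '2020-01-10',
]
result = splitIntoRanges(sample)
print(result)
# [('2020-01-01', '2020-01-03'), ('2020-01-06', '2020-01-08'), ('2020-01-10', '2020-01-10')]
```

The isolated date comes back as a pair whose start and end are equal, so every range has the same shape.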
I took a similar approach, but it's definitely not as elegant as @Scott's:
from datetime import datetime

ranges = []
dates = [datetime.strptime(date, '%Y-%m-%d') for date in dates]
start = dates[0]
for i in range(1, len(dates)):
    # A gap of more than one day closes the current range.
    if (dates[i] - dates[i - 1]).days > 1:
        end = dates[i - 1]
        ranges.append(f'{start} to {end}')
        start = dates[i]
# Close the final range after the loop, so a trailing single date
# (or a one-element list) is not dropped.
ranges.append(f'{start} to {dates[-1]}')
I found the key to a solution in a second post and pieced it together.
My problem had two parts:
- How to represent the list of dates efficiently
Answer:
import datetime

pto = [
    '2020-01-03',
    '2020-01-08',
    '2020-01-02',
    '2020-01-07',
    '2020-01-01',
    '2020-01-06'
]
# Convert each date string to its integer ordinal (a day number).
ordinal_dates = [datetime.datetime.strptime(i, '%Y-%m-%d').toordinal() for i in pto]
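- Once you have the dates as a list of integers, you can simply look for consecutive integers, take the lower and upper bound of each range, and convert them back to yyyy-mm-dd format.
Answer: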
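def ranges(nums):
    nums = sorted(set(nums))
    # Each gap is a pair [end of one run, start of the next run].
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    # First value, then every gap boundary, then last value.
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    # Pair the edges up two at a time: (start, end) of each run.
    return list(zip(edges, edges))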
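To make the gap-and-edge trick easier to follow, here is a small worked sketch on plain integers. The input values and the `flat` and `pairs` names are my own, chosen only for illustration; the steps mirror the body of `ranges` above.

```python
# Illustrative input: unsorted, with one isolated value (10).
nums = sorted(set([3, 8, 2, 7, 1, 6, 10]))
print(nums)   # [1, 2, 3, 6, 7, 8, 10]

# A gap is any neighbouring pair that differs by more than 1.
gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
print(gaps)   # [[3, 6], [8, 10]]

# Flatten: first value, every gap boundary, last value.
flat = nums[:1] + sum(gaps, []) + nums[-1:]
print(flat)   # [1, 3, 6, 8, 10, 10]

# zip(edges, edges) pulls from the same iterator twice per tuple,
# so consecutive edges are paired as (start, end).
edges = iter(flat)
pairs = list(zip(edges, edges))
print(pairs)  # [(1, 3), (6, 8), (10, 10)]
```

Because `nums[:1]` and `nums[-1:]` are slices rather than index lookups, an empty input yields an empty list instead of raising an error.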
My complete function:
import datetime

def get_date_ranges(pto_list: list) -> list:
    # Convert the date strings to integer ordinals.
    pto_dates = [datetime.datetime.strptime(i, '%Y-%m-%d').toordinal() for i in pto_list]
    nums = sorted(set(pto_dates))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    ordinal_ranges = list(zip(edges, edges))
    # Convert each (start, end) pair of ordinals back to yyyy-mm-dd strings.
    date_bounds = []
    for start, end in ordinal_ranges:
        date_bounds.append((
            datetime.datetime.fromordinal(start).strftime('%Y-%m-%d'),
            datetime.datetime.fromordinal(end).strftime('%Y-%m-%d')
        ))
    return date_bounds
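For comparison, here is a more compact sketch of the same idea applied to the unsorted pto list. It is not the function above: the name `get_date_ranges_iso` is hypothetical, and it swaps in `date.fromisoformat` and `date.isoformat` (Python 3.7+) for `strptime` and `strftime`, which assumes the inputs are strictly in yyyy-mm-dd form.

```python
import datetime

def get_date_ranges_iso(pto_list):
    # Hypothetical variant: parse ISO strings straight to ordinals,
    # dropping duplicates and sorting in one step.
    nums = sorted({datetime.date.fromisoformat(s).toordinal() for s in pto_list})
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    # Convert each (start, end) ordinal pair back to ISO strings.
    return [
        (datetime.date.fromordinal(a).isoformat(),
         datetime.date.fromordinal(b).isoformat())
        for a, b in zip(edges, edges)
    ]

pto = [
    '2020-01-03', '2020-01-08', '2020-01-02',
    '2020-01-07', '2020-01-01', '2020-01-06',
]
bounds = get_date_ranges_iso(pto)
print(bounds)
# [('2020-01-01', '2020-01-03'), ('2020-01-06', '2020-01-08')]
```

Since the ordinals are sorted before the gaps are computed, the input order does not matter.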
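You could find all of the consecutive date ranges, append them to a list of lists, and access each range by index, but I prefer keys in a dictionary for readability.
Here's how (note: please read the comments):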
from datetime import datetime

dates = [datetime.strptime(d, "%Y-%m-%d") for d in dates]  # parse each string into a datetime
date_ints = [d.toordinal() for d in dates]  # toordinal() returns the day number, where 0001-01-01 is day 1
ranges = {}
arange = []
prev = 0
index = 0
j = 1
for i in date_ints:  # iterate through the date integers
    if i - 1 == prev:  # the current integer directly follows the previous one
        arange.append(dates[index].strftime("%Y-%m-%d"))
    elif prev == 0:  # first date: append it, since 'prev' has not been updated yet
        arange.append(dates[index].strftime("%Y-%m-%d"))
    else:
        ranges.update({f'Range{j}': tuple(arange)})  # integers are no longer in sequence, so store the finished range
        arange = []  # clear 'arange' and start appending to a new range
        j += 1
        arange.append(dates[index].strftime("%Y-%m-%d"))
    index += 1
    prev = i
ranges.update({f'Range{j}': tuple(arange)})  # store the final range
print(ranges)
print(ranges['Range1'])  # access a range by its key
print(ranges['Range2'])
Output:
{'Range1': ('2020-01-01', '2020-01-02', '2020-01-03'), 'Range2': ('2020-01-06', '2020-01-07', '2020-01-08')}
('2020-01-01', '2020-01-02', '2020-01-03')
('2020-01-06', '2020-01-07', '2020-01-08')
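For the list-of-lists alternative mentioned at the start of this answer, a minimal sketch might look like the following. The function name `group_consecutive` is hypothetical, and the input is assumed to be sorted in ascending order.

```python
from datetime import datetime

def group_consecutive(date_strings):
    # Hypothetical helper: returns a list of lists, one per run of
    # consecutive dates. Assumes the input is sorted ascending.
    groups = []
    prev = None
    for s in date_strings:
        ordinal = datetime.strptime(s, "%Y-%m-%d").toordinal()
        if prev is not None and ordinal - prev == 1:
            groups[-1].append(s)   # extends the current run
        else:
            groups.append([s])     # starts a new run
        prev = ordinal
    return groups

dates = [
    '2020-01-01', '2020-01-02', '2020-01-03',
    '2020-01-06', '2020-01-07', '2020-01-08',
]
groups = group_consecutive(dates)
print(groups[0])  # ['2020-01-01', '2020-01-02', '2020-01-03']
print(groups[1])  # ['2020-01-06', '2020-01-07', '2020-01-08']
```

Here ranges are accessed by position (`groups[0]`, `groups[1]`) rather than by a 'Range1' style key.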
more-itertools has a function called consecutive_groups that does this for you.
Alternatively, you can look at its source and copy the approach:
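from datetime import datetime
from itertools import groupby
from operator import itemgetter

def consecutive_groups(iterable, ordering=lambda x: x):
    # Within a consecutive run, index minus ordering value stays constant,
    # so groupby splits the input exactly where a run breaks.
    for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield map(itemgetter(1), g)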
Then use the function:
for g in consecutive_groups(dates, lambda x: datetime.strptime(x, '%Y-%m-%d').toordinal()):
    print(list(g))
Or (more properly) use a named function instead of a lambda:
def to_date(date):
    return datetime.strptime(date, '%Y-%m-%d').toordinal()

for g in consecutive_groups(dates, to_date):
    print(list(g))
Output:
['2020-01-01', '2020-01-02', '2020-01-03']
['2020-01-06', '2020-01-07', '2020-01-08']
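If the groupby key looks opaque, the sketch below prints it for the sample data. It uses only the standard library; `ordinals`, `keys`, and `relative` are names I introduced for illustration, and `date.fromisoformat` (Python 3.7+) stands in for `strptime`.

```python
from datetime import date
from itertools import groupby

dates = [
    '2020-01-01', '2020-01-02', '2020-01-03',
    '2020-01-06', '2020-01-07', '2020-01-08',
]
ordinals = [date.fromisoformat(d).toordinal() for d in dates]

# Index minus ordinal: inside a run both grow by 1 per step, so the
# difference is constant; at a gap the ordinal jumps and the key changes.
keys = [i - o for i, o in enumerate(ordinals)]

# Shift by the first key so the pattern is easy to read.
relative = [k - keys[0] for k in keys]
print(relative)  # [0, 0, 0, -2, -2, -2]

# Grouping the dates by that key reproduces the consecutive groups.
groups = [[d for _, d in g] for _, g in groupby(zip(keys, dates), key=lambda p: p[0])]
print(groups)
```

The key drops by 2 at the gap because two days (2020-01-04 and 2020-01-05) are missing between the runs.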