Count the number of occurrences between two consecutive times in Python
I have some datetime information as follows:
DateTime
2018/01/01 01:00:00
2018/01/01 02:30:00
2018/01/01 03:10:00
2018/01/01 04:00:00
2018/01/01 05:25:00
I also have the times at which some other events occurred:
occurrence
2018/01/01 01:01:00
2018/01/01 01:02:00
2018/01/01 02:31:00
2018/01/01 04:05:00
I want to count how many events occurred between each pair of consecutive times, to produce the following:
Occurrence      Start Time               End Time
2               2018/01/01 01:00:00      2018/01/01 02:30:00
1               2018/01/01 02:30:00      2018/01/01 03:10:00
0               2018/01/01 03:10:00      2018/01/01 04:00:00
1               2018/01/01 04:00:00      2018/01/01 05:25:00
I was thinking of using something like
sum(1 if meets_condition(x) else 0 for x in my_list)
but I don't know how to implement it. Can anyone help?
You can use:
from datetime import datetime
from collections import Counter
t = """2018/01/01 01:00:00
2018/01/01 02:30:00
2018/01/01 03:10:00
2018/01/01 04:00:00
2018/01/01 05:25:00"""
occurrence = """2018/01/01 01:01:00
2018/01/01 01:02:00
2018/01/01 02:31:00
2018/01/01 04:05:00"""
fmt = '%Y/%m/%d %H:%M:%S'
dates = [datetime.strptime(d.strip(), fmt) for d in t.split('\n')]
intervals = list(zip(dates, dates[1:]))  # dates are already sorted
occ = [datetime.strptime(d.strip(), fmt) for d in occurrence.split('\n')]
count = Counter()
for o in occ:
    for d1, d2 in intervals:
        # half-open interval: an event on a boundary belongs to the interval it starts
        if d1 <= o < d2:
            count[(d1, d2)] += 1
            break
print('Occurrence'.ljust(15), 'Start Time'.ljust(24), 'End Time')
for d1, d2 in intervals:
    print(str(count[(d1, d2)]).ljust(15), d1.strftime(fmt).ljust(24), d2.strftime(fmt))
Output:
Occurrence      Start Time               End Time
2               2018/01/01 01:00:00      2018/01/01 02:30:00
1               2018/01/01 02:30:00      2018/01/01 03:10:00
0               2018/01/01 03:10:00      2018/01/01 04:00:00
1               2018/01/01 04:00:00      2018/01/01 05:25:00
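Since the boundary times are sorted, the inner loop over intervals can also be replaced by a binary search. The following is only a sketch of that idea using the standard library's bisect module; the function name count_per_interval is made up for illustration, and it assumes an event that falls exactly on a boundary belongs to the interval that starts there.

```python
from bisect import bisect_right
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"


def count_per_interval(boundaries, events):
    """Count events in each half-open interval [boundaries[i], boundaries[i+1])."""
    bounds = sorted(datetime.strptime(b, FMT) for b in boundaries)
    counts = [0] * max(len(bounds) - 1, 0)
    for e in events:
        # bisect_right returns how many boundaries are <= the event, so
        # subtracting 1 gives the index of the interval the event starts in.
        idx = bisect_right(bounds, datetime.strptime(e, FMT)) - 1
        if 0 <= idx < len(counts):  # skip events outside every interval
            counts[idx] += 1
    return counts


boundaries = [
    "2018/01/01 01:00:00",
    "2018/01/01 02:30:00",
    "2018/01/01 03:10:00",
    "2018/01/01 04:00:00",
    "2018/01/01 05:25:00",
]
occurrences = [
    "2018/01/01 01:01:00",
    "2018/01/01 01:02:00",
    "2018/01/01 02:31:00",
    "2018/01/01 04:05:00",
]
print(count_per_interval(boundaries, occurrences))  # [2, 1, 0, 1]
```

This does one O(log n) lookup per event instead of scanning the intervals, which probably only matters once there are many boundaries.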
Try this:
# pip install python-dateutil
from collections import Counter
import dateutil.parser
bases = [
"2018/01/01 01:00:00",
"2018/01/01 02:30:00",
"2018/01/01 03:10:00",
"2018/01/01 04:00:00",
"2018/01/01 05:25:00"
]
occurances = [
"2018/01/01 01:01:00",
"2018/01/01 01:02:00",
"2018/01/01 02:31:00",
"2018/01/01 04:05:00"
]
class DateInterval:
    def __init__(self, start, end):
        self.start = dateutil.parser.parse(start)
        self.end = dateutil.parser.parse(end)

    def __contains__(self, other) -> bool:
        # half-open interval, so an event on a boundary is not counted twice
        return self.start <= dateutil.parser.parse(other) < self.end

    def __repr__(self):
        return f'<Interval:{self.start} ~ {self.end}>'

# one interval per pair of consecutive base times, each starting at a count of 0
my_intervals = Counter({DateInterval(start, end): 0 for start, end in zip(bases, bases[1:])})
for occ in occurances:
    for intr in my_intervals:
        my_intervals[intr] += int(occ in intr)
print(my_intervals)
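Your sum/for combination is on the right track, but we have to do a little extra work with Python's datetime class. Here we just parse the strings into datetime objects and then use comparison operators to check whether each event falls within a range.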
from datetime import datetime as dt
dates = ["2018/01/01 01:00:00", "2018/01/01 02:30:00", "2018/01/01 04:00:00"]
events = ["2018/01/01 01:01:00", "2018/01/01 01:02:00", "2018/01/01 02:31:00"]
dates = [dt.fromisoformat(date.replace("/", "-")) for date in dates]
events = [dt.fromisoformat(event.replace("/", "-")) for event in events]
buckets = list(zip(dates, dates[1:]))
result = dict()
for start, end in buckets:
    # count the events (not the boundary dates) that fall in [start, end)
    result[str(start) + " to " + str(end)] = sum(1 if start <= event < end else 0 for event in events)
print(result)
# {'2018-01-01 01:00:00 to 2018-01-01 02:30:00': 2, '2018-01-01 02:30:00 to 2018-01-01 04:00:00': 1}
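For completeness, here is a sketch of the same sum-based idea applied to the full data from the question, so that it yields the rows of the table that was asked for. The names FMT, bounds and rows are just illustrative, and each interval is assumed to include its start time but not its end time.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"

date_strings = [
    "2018/01/01 01:00:00",
    "2018/01/01 02:30:00",
    "2018/01/01 03:10:00",
    "2018/01/01 04:00:00",
    "2018/01/01 05:25:00",
]
event_strings = [
    "2018/01/01 01:01:00",
    "2018/01/01 01:02:00",
    "2018/01/01 02:31:00",
    "2018/01/01 04:05:00",
]

bounds = [datetime.strptime(s, FMT) for s in date_strings]
events = [datetime.strptime(s, FMT) for s in event_strings]

# One row per pair of consecutive times: (count, start, end).
rows = [
    (sum(1 for e in events if start <= e < end), start, end)
    for start, end in zip(bounds, bounds[1:])
]

print("Occurrence".ljust(15), "Start Time".ljust(24), "End Time")
for n, start, end in rows:
    print(str(n).ljust(15), start.strftime(FMT).ljust(24), end.strftime(FMT))
```

The counts come out as 2, 1, 0 and 1, matching the expected table. Every interval rescans all events, which is fine for small inputs like this one.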