How to read log elements to find 5-minute intervals
Given a log that shows all resource requests within a given 24-hour window, expressed in seconds:
log = [
    ['10', 'user_9', 'resource_10'],
    ['123', 'user_5', 'resource_9'],
    ['234', 'user_1', 'resource_3'],
    ['299', 'user_2', 'resource_3'],
    ['594', 'user_1', 'resource_1'],
    ['10293', 'user_8', 'resource_12'],
]
# Expected return: 4 [resource_10, 9, 3, 3]
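How would you iterate to find the max # of resources opened within a 5-minute interval (300 seconds)?
I've tried iterating through the list and capturing all elements with 150 <= x <= 150, but that does not capture elements that are <300 but >150 away from x.
Here is some pseudocode to outline my thought process, but I'm not sure how to proceed without saving every element and looping through the log n^2 times.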
traverse through first element in logs (time)
    assign element to currItem
    traverse through list again
        see if each element is within 300 above/below of currItem
        save elements
    rinse and repeat
You can use itertools.groupby (provided the log is sorted by time):
from itertools import groupby
log = [
["10", "user_9", "resource_10"],
["123", "user_5", "resource_9"],
["234", "user_1", "resource_3"],
["299", "user_2", "resource_3"],
["594", "user_1", "resource_1"],
["10293", "user_8", "resource_12"],
]
out = []
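# Group consecutive entries by their 300-second bucket (0-299, 300-599, ...)
for _, g in groupby(log, lambda k: int(k[0]) // 300):
    g = list(g)
    if len(g) > len(out):
        out = [i for *_, i in g]  # keep only the resource column
print(len(out), out)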
Prints:
4 ['resource_10', 'resource_9', 'resource_3', 'resource_3']
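Edit: The example above finds 300-second windows counted from second 0 (midnight). If you want to find 300-second windows counted from the first timestamp in the log: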
log = [
["250", "user_9", "resource_10"],
["275", "user_5", "resource_9"],
["300", "user_1", "resource_3"],
["325", "user_2", "resource_3"],
["350", "user_1", "resource_1"],
["375", "user_8", "resource_12"],
]
out = []
# Same grouping, but bucket boundaries are offset by the first timestamp
for _, g in groupby(log, lambda k: (int(k[0]) - int(log[0][0])) // 300):
    g = list(g)
    if len(g) > len(out):
        out = [i for *_, i in g]
print(len(out), out)
Prints:
6 ['resource_10', 'resource_9', 'resource_3', 'resource_3', 'resource_1', 'resource_12']
P.S.: You can convert the seconds to integers first to speed up processing.
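As a rough sketch of that suggestion, the timestamps can be parsed once before grouping, so the key function only does integer arithmetic and the first timestamp is not re-parsed for every row. The names `parsed`, `start`, and `best` are illustrative choices, not part of the original answer, and the log is assumed to be sorted by time.

```python
from itertools import groupby

log = [
    ["250", "user_9", "resource_10"],
    ["275", "user_5", "resource_9"],
    ["300", "user_1", "resource_3"],
    ["325", "user_2", "resource_3"],
    ["350", "user_1", "resource_1"],
    ["375", "user_8", "resource_12"],
]

# Parse every timestamp exactly once, up front
parsed = [(int(t), user, resource) for t, user, resource in log]
start = parsed[0][0]  # windows are counted from the first timestamp

best = []
for _, group in groupby(parsed, key=lambda row: (row[0] - start) // 300):
    resources = [resource for _, _, resource in group]
    if len(resources) > len(best):
        best = resources

print(len(best), best)
```

Any speed-up is likely to be small for short logs; the main gain is that the key function stays simple.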
Edit 2: If you want to find "dynamic" 300-second windows:
log = [
["0", "user_9", "resource_10"],
["890", "user_5", "resource_9"],
["900", "user_1", "resource_3"],
["910", "user_2", "resource_4"],
["1600", "user_2", "resource_5"],
]
log = [[int(subl[0]), *subl[1:]] for subl in log]
current_min = log[0][0]
groups = [[log[0][-1]]]
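for subl in log[1:]:
    if subl[0] - current_min < 300:
        # still inside the current window: add the resource to it
        groups[-1].append(subl[-1])
    else:
        # outside the window: start a new one at this timestamp
        groups.append([subl[-1]])
        current_min = subl[0]
mx = max(groups, key=len)
print(len(mx), mx)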
Prints:
3 ['resource_9', 'resource_3', 'resource_4']
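Note that this grouping only opens a new window when an entry falls outside the current one, so a denser window that straddles two groups can go unnoticed. One possible alternative is a two-pointer sliding window, sketched below. It assumes the log is sorted by time and that a window covers the half-open range [t, t + 300); `max_window` and its `width` parameter are hypothetical names introduced for this example.

```python
def max_window(log, width=300):
    """Return the resources in the busiest window of `width` seconds.

    Assumes `log` is sorted by timestamp; a window covers [t, t + width).
    """
    times = [int(row[0]) for row in log]
    best_start, best_end = 0, 0  # best slice so far is log[best_start:best_end]
    left = 0
    for right, t in enumerate(times):
        # Move the left edge forward until the window spans less than `width`
        while t - times[left] >= width:
            left += 1
        if right + 1 - left > best_end - best_start:
            best_start, best_end = left, right + 1
    return [row[-1] for row in log[best_start:best_end]]


log = [
    ["0", "user_9", "resource_10"],
    ["200", "user_5", "resource_9"],
    ["350", "user_1", "resource_3"],
    ["450", "user_2", "resource_4"],
    ["1600", "user_2", "resource_5"],
]
result = max_window(log)
print(len(result), result)
# 3 ['resource_9', 'resource_3', 'resource_4']
```

On this sample the grouping loop above would report a group of 2, because the entries at 200, 350, and 450 are split across two groups. Each pointer only moves forward, so the scan is linear in the number of log entries.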
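Building on the other answer, I came up with an alternative solution that does not use groupby or import anything.
It basically stores the result of the floor division as the dictionary key and the associated resources as the values.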
def iterLog(logs):
    if not logs:
        return {}  # empty log: return an empty dict, same type as the normal result
    logs.sort(key=lambda x: int(x[0]))  # sort logs by time (in place)
    resDict = {}  # maps 300-second bucket index -> list of resources
    for time, user, resource in logs:
        bucket = int(time) // 300  # floor division gives the bucket index
        if bucket in resDict:
            resDict[bucket].append(resource)
        else:
            resDict[bucket] = [resource]
    return resDict
# resDict = {
# 0: ['resource_10', 'resource_9', 'resource_3', 'resource_3'],
# 1: ['resource_1'],
# 34: ['resource_12']
# }
From there you just need to find the list with the highest count.
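A minimal sketch of that last step, using the dictionary shown in the comment above as sample input. The names `res_dict` and `busiest` are illustrative only.

```python
res_dict = {
    0: ["resource_10", "resource_9", "resource_3", "resource_3"],
    1: ["resource_1"],
    34: ["resource_12"],
}

# Pick the longest list; default=[] covers an empty dictionary
busiest = max(res_dict.values(), key=len, default=[])
print(len(busiest), busiest)
# 4 ['resource_10', 'resource_9', 'resource_3', 'resource_3']
```

If two buckets tie for the highest count, `max` returns the first one it encounters.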