Filter one column and count the occurrences in the other column
I am trying to find which values in the second column (source IP) occur most often for each value of the fourth column (alert).
Sample list:
test = [["2019-01-05 03:15:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
        ["2019-01-05 03:25:49", "192.168.0.15", "192.168.0.1:4070", "network discover"],
        ["2019-01-05 03:35:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
        ["2019-01-05 03:55:49", "192.168.0.12", "192.168.0.1:4070", "network discover"],
        ["2019-01-05 04:38:13", "192.168.0.15", "192.168.0.41:445", "ETERNALBLUE tool"],
        ["2019-01-05 05:28:13", "192.168.0.12", "192.168.0.39:445", "ETERNALBLUE tool"]]
Expected output:
network discover, 192.168.0.15 = 3
network discover, 192.168.0.12 = 1
ETERNALBLUE tool, 192.168.0.15 = 1
ETERNALBLUE tool, 192.168.0.12 = 1
Use collections.defaultdict.
For example:
from collections import defaultdict

test = [["2019-01-05 03:15:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
        ["2019-01-05 03:25:49", "192.168.0.15", "192.168.0.1:4070", "network discover"],
        ["2019-01-05 03:35:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
        ["2019-01-05 03:55:49", "192.168.0.12", "192.168.0.1:4070", "network discover"],
        ["2019-01-05 04:38:13", "192.168.0.15", "192.168.0.41:445", "ETERNALBLUE tool"],
        ["2019-01-05 05:28:13", "192.168.0.12", "192.168.0.39:445", "ETERNALBLUE tool"]]

# Missing keys start at 0, so each pair can be incremented directly.
result = defaultdict(int)
for i in test:
    # Key on (alert, source IP): the last and the second column.
    result[(i[-1], i[1])] += 1
print(result)
Output:
defaultdict(<type 'int'>, {
('network discover', '192.168.0.12'): 1,
('ETERNALBLUE tool', '192.168.0.15'): 1,
('ETERNALBLUE tool', '192.168.0.12'): 1,
('network discover', '192.168.0.15'): 3
})
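To get from that dictionary to the layout requested in the question, the pairs can be sorted by count and formatted one per line. The following is only a sketch of one way to do it; the helper names `count_pairs` and `format_counts` are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict

# Sample rows from the question: [timestamp, source IP, destination, alert]
test = [
    ["2019-01-05 03:15:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
    ["2019-01-05 03:25:49", "192.168.0.15", "192.168.0.1:4070", "network discover"],
    ["2019-01-05 03:35:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
    ["2019-01-05 03:55:49", "192.168.0.12", "192.168.0.1:4070", "network discover"],
    ["2019-01-05 04:38:13", "192.168.0.15", "192.168.0.41:445", "ETERNALBLUE tool"],
    ["2019-01-05 05:28:13", "192.168.0.12", "192.168.0.39:445", "ETERNALBLUE tool"],
]


def count_pairs(rows):
    """Count how often each (alert, source IP) pair appears (hypothetical helper)."""
    counts = defaultdict(int)
    for row in rows:
        counts[(row[3], row[1])] += 1
    return counts


def format_counts(counts):
    """Return lines of the form 'alert, ip = n', highest count first (hypothetical helper)."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ["{}, {} = {}".format(alert, ip, n) for (alert, ip), n in ordered]


for line in format_counts(count_pairs(test)):
    print(line)
```

The first line printed is the pair with the highest count. The relative order of pairs with equal counts depends on dictionary ordering, so it may differ between Python versions.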
You can use Counter:
from collections import Counter
from pprint import pprint
c = Counter((i[-1], i[1]) for i in test)
pprint(c)
Output:
Counter({('network discover', '192.168.0.15'): 3,
('network discover', '192.168.0.12'): 1,
('ETERNALBLUE tool', '192.168.0.15'): 1,
('ETERNALBLUE tool', '192.168.0.12'): 1})
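If only the most frequent source IP per alert is wanted, which is one possible reading of "highest occurrence" in the question, the counts can be grouped by alert first. This is a sketch under that assumption; `top_source_per_alert` is a hypothetical helper name, not something from the answers above.

```python
from collections import Counter, defaultdict

# Sample rows from the question: [timestamp, source IP, destination, alert]
test = [
    ["2019-01-05 03:15:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
    ["2019-01-05 03:25:49", "192.168.0.15", "192.168.0.1:4070", "network discover"],
    ["2019-01-05 03:35:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
    ["2019-01-05 03:55:49", "192.168.0.12", "192.168.0.1:4070", "network discover"],
    ["2019-01-05 04:38:13", "192.168.0.15", "192.168.0.41:445", "ETERNALBLUE tool"],
    ["2019-01-05 05:28:13", "192.168.0.12", "192.168.0.39:445", "ETERNALBLUE tool"],
]


def top_source_per_alert(rows):
    """Map each alert to its most frequent (source IP, count) pair."""
    per_alert = defaultdict(Counter)  # alert -> Counter of source IPs
    for row in rows:
        per_alert[row[3]][row[1]] += 1
    # most_common(1) returns a one-element list holding the top (ip, count) pair.
    return {alert: ips.most_common(1)[0] for alert, ips in per_alert.items()}


top = top_source_per_alert(test)
for alert, (ip, n) in top.items():
    print("{}, {} = {}".format(alert, ip, n))
```

For "network discover" this reports 192.168.0.15 with a count of 3. For "ETERNALBLUE tool" both source IPs appear once, so which of them is reported depends on how `most_common` orders ties.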