Filter one column and count the occurrences in the other column

Question

I'm trying to find which values in the second column (source IP) occur most often, grouped by the fourth column (alert).

Sample list:

test=[["2019-01-05 03:15:49","192.168.0.15","192.168.0.116:4070","network discover"],
["2019-01-05 03:25:49","192.168.0.15","192.168.0.1:4070","network discover"],
["2019-01-05 03:35:49","192.168.0.15","192.168.0.116:4070","network discover"],
["2019-01-05 03:55:49","192.168.0.12","192.168.0.1:4070","network discover"],
["2019-01-05 04:38:13","192.168.0.15","192.168.0.41:445","ETERNALBLUE tool"],
["2019-01-05 05:28:13","192.168.0.12","192.168.0.39:445","ETERNALBLUE tool"]]

Expected output:

network discover, 192.168.0.15 = 3

network discover, 192.168.0.12 = 1

ETERNALBLUE tool, 192.168.0.15 = 1

ETERNALBLUE tool, 192.168.0.12 = 1

Answer 1

Use collections.defaultdict with a tuple of (alert, source IP) as the key.

For example:

from collections import defaultdict

test=[["2019-01-05 03:15:49","192.168.0.15","192.168.0.116:4070","network discover"],
["2019-01-05 03:25:49","192.168.0.15","192.168.0.1:4070","network discover"],
["2019-01-05 03:35:49","192.168.0.15","192.168.0.116:4070","network discover"],
["2019-01-05 03:55:49","192.168.0.12","192.168.0.1:4070","network discover"],
["2019-01-05 04:38:13","192.168.0.15","192.168.0.41:445","ETERNALBLUE tool"],
["2019-01-05 05:28:13","192.168.0.12","192.168.0.39:445","ETERNALBLUE tool"]]

result = defaultdict(int)
for i in test:
    result[(i[-1], i[1])] += 1
print(result)

Output:

defaultdict(<type 'int'>, {
    ('network discover', '192.168.0.12'): 1, 
    ('ETERNALBLUE tool', '192.168.0.15'): 1, 
    ('ETERNALBLUE tool', '192.168.0.12'): 1, 
    ('network discover', '192.168.0.15'): 3
    })
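To get the exact format asked for in the question, the counts can be turned into one line per (alert, source IP) pair. This is only a minimal sketch: the helper name `format_counts` is hypothetical, and the sample list is repeated so the block runs on its own.

```python
from collections import defaultdict

test = [
    ["2019-01-05 03:15:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
    ["2019-01-05 03:25:49", "192.168.0.15", "192.168.0.1:4070", "network discover"],
    ["2019-01-05 03:35:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
    ["2019-01-05 03:55:49", "192.168.0.12", "192.168.0.1:4070", "network discover"],
    ["2019-01-05 04:38:13", "192.168.0.15", "192.168.0.41:445", "ETERNALBLUE tool"],
    ["2019-01-05 05:28:13", "192.168.0.12", "192.168.0.39:445", "ETERNALBLUE tool"],
]


def format_counts(rows):
    """Hypothetical helper: count (alert, source IP) pairs and format them."""
    counts = defaultdict(int)
    for row in rows:
        # row[-1] is the alert, row[1] is the source IP
        counts[(row[-1], row[1])] += 1
    return ["{}, {} = {}".format(alert, ip, n) for (alert, ip), n in counts.items()]


lines = format_counts(test)
for line in lines:
    print(line)
```

On Python 3.7 and later, dictionaries keep insertion order, so the lines appear in the order each pair is first seen in the list.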

Answer 2

You can use Counter (with the same test list as in the question):

from collections import Counter
from pprint import pprint

c = Counter((i[-1], i[1]) for i in test)

pprint(c)

Output:

Counter({('network discover', '192.168.0.15'): 3,
         ('network discover', '192.168.0.12'): 1,
         ('ETERNALBLUE tool', '192.168.0.15'): 1,
         ('ETERNALBLUE tool', '192.168.0.12'): 1})
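If only the most frequent source IP per alert is wanted, one possible approach is to keep a separate Counter for each alert and call `most_common(1)` on it. The sketch below assumes that reading of the question; the names `per_alert` and `top` are illustrative, and the sample list is repeated so the block runs on its own.

```python
from collections import Counter, defaultdict

test = [
    ["2019-01-05 03:15:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
    ["2019-01-05 03:25:49", "192.168.0.15", "192.168.0.1:4070", "network discover"],
    ["2019-01-05 03:35:49", "192.168.0.15", "192.168.0.116:4070", "network discover"],
    ["2019-01-05 03:55:49", "192.168.0.12", "192.168.0.1:4070", "network discover"],
    ["2019-01-05 04:38:13", "192.168.0.15", "192.168.0.41:445", "ETERNALBLUE tool"],
    ["2019-01-05 05:28:13", "192.168.0.12", "192.168.0.39:445", "ETERNALBLUE tool"],
]

# One Counter of source IPs per alert type.
per_alert = defaultdict(Counter)
for row in test:
    per_alert[row[-1]][row[1]] += 1

# most_common(1) returns a one-element list holding the (ip, count) pair
# with the highest count for that alert.
top = {alert: ips.most_common(1)[0] for alert, ips in per_alert.items()}

for alert, (ip, n) in top.items():
    print("{}, {} = {}".format(alert, ip, n))
```

When several source IPs share the highest count, as with "ETERNALBLUE tool" here, only one of them is reported, so use `most_common()` without an argument if ties matter.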

python

lambda

counter

for-loop