Analyze log files using awk
Hello, here is an excerpt from a log file:
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_1| - |user logged in| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_1| - |user changed password| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_1| - |user logged off| -
Mon, 22 Mar 2020 13:15:42 +0200|185.34.66.225|user_2| - |user logged in| -
Mon, 22 Mar 2020 13:15:40 +0200|185.34.66.215|user_3| - |user logged in| -
Mon, 22 Mar 2020 13:15:49 +0200|185.34.66.215|user_3| - |user changed password| -
Mon, 22 Mar 2020 13:15:49 +0200|185.34.66.215|user_3| - |user logged off| -
Mon, 22 Mar 2020 13:15:59 +0200|185.34.66.205|user_4| - |user logged in| -
Mon, 22 Mar 2020 13:15:59 +0200|185.34.66.205|user_4| - |user logged in| -
Mon, 22 Mar 2020 13:15:59 +0200|185.34.66.205|user_4| - |user changed password| -
Mon, 22 Mar 2020 13:15:59 +0200|185.34.66.205|user_4| - |user logged off| -
Mon, 22 Mar 2020 13:17:50 +0200|185.34.66.205|user_5| - |user logged in| -
Mon, 22 Mar 2020 13:17:50 +0200|185.34.66.205|user_5| - |user changed password| -
Mon, 22 Mar 2020 13:17:50 +0200|185.34.66.205|user_5| - |user changed profile| -
Mon, 22 Mar 2020 13:17:50 +0200|185.34.66.205|user_5| - |user logged off| -
Mon, 22 Mar 2020 15:19:19 +0200|178.56.66.225|user_6| - |user logged in| -
Mon, 22 Mar 2020 15:19:19 +0200|178.56.66.225|user_6| - |user changed password| -
Mon, 22 Mar 2020 15:19:19 +0200|178.56.66.225|user_6| - |user logged off| -
Mon, 22 Mar 2020 13:20:42 +0200|185.34.67.225|user_7| - |user logged in| -
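The main idea is to get the list of bots that logged in, changed their password, and logged off within the same second, without performing any other action between those 3 actions.
I was able to get what I want with the following command: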
cat /path/to/file | awk '{split($0,a,"|"); print a[3],a[1],a[5]}' | awk '{ print $1,$6,$8,$9,$10 }' | grep -A 1 -B 1 "user changed password" | awk 'seen[$1$2]++ ==2' | grep "user logged off" | awk '{ print $1 }'
Output:
user_1
user_4
user_6
But I need expert help to shorten my code and make it run as fast as possible on huge log files.
Any help would be greatly appreciated.
A single awk call does everything.
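awk -F'|' '
BEGIN {
    # The three actions, in the order a bot performs them
    a[0]="user logged in"
    a[1]="user changed password"
    a[2]="user logged off"
}
# Different user, different second, or not the expected action: start over
lastuser!=$3 || lasttime!=$1 || a[expected]!=$5 {
    lasttime=$1
    lastuser=$3
    expected=(a[0]==$5?1:0)
    next
}
# The third expected action in a row completes the sequence
expected++==2 {
    print $3
}' path_to_file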
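One thing the program above seems to rely on is that the three lines of one user sit next to each other in the file. If lines from different users can alternate, a possible variation is to keep the state per user instead of in single variables. This is only a sketch under that assumption: the function name find_bots_interleaved, the arrays t and step, and the sample lines for user_8 and user_9 are made up for illustration, and the field layout (time in field 1, user in field 3, action in field 5) is taken from the question.

```shell
#!/bin/sh
# find_bots_interleaved is a hypothetical name; it reads the log on stdin.
find_bots_interleaved() {
  awk -F'|' '
  BEGIN {
    a[0] = "user logged in"
    a[1] = "user changed password"
    a[2] = "user logged off"
  }
  {
    u = $3
    # Same second and the action this user is expected to perform next: advance
    if (t[u] == $1 && a[step[u]] == $5) {
      if (++step[u] == 3) print u
      next
    }
    # Otherwise restart this user; a login means step 0 is already done
    t[u] = $1
    step[u] = (a[0] == $5 ? 1 : 0)
  }'
}

# Made-up sample in the question format: the lines of two users alternate
find_bots_interleaved <<'EOF'
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_8| - |user logged in| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.205|user_9| - |user logged in| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_8| - |user changed password| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.205|user_9| - |user changed profile| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_8| - |user logged off| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.205|user_9| - |user logged off| -
EOF
```

On this sample it prints user_8 only. The trade-off is memory: the two arrays grow with the number of distinct users, which may matter on a huge log.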
For your scenario, I think this will work well:
awk -F\| '{ vtAll[$1";"$3]++; if($5 ~ /user logged in|user logged off|user changed password/) vt[$1";"$3]++; } END { for (i in vt) if(vt[i] == 3 && vtAll[i] == 3) print i }' inputFile
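Sharing my logic:
- I created two arrays, both indexed by time and user
- In vtAll, I store how many actions the user performed at that exact time
- In vt, I check whether the action is a login, a logoff, or a password change. If it is, I increment that array as well
- After reading the whole file, I check whether both arrays hold exactly three actions. If they do, the user logged in, changed the password, and logged off at the same time, and did nothing else.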
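The counting logic above can be tried on a few lines from the question. This is a sketch, not a drop-in replacement: the function name count_bots and the array names all and hit are made up, the key is split again so that only the user name is printed, and the output is piped through sort because awk does not guarantee the order of a for-in loop. Because it counts rather than tracks order, it skips a user with any fourth action in that second (such as the repeated login of user_4), and it does not check that the three counted actions are three different ones.

```shell
#!/bin/sh
# count_bots is a hypothetical name; it reads the log on stdin.
count_bots() {
  awk -F'|' '
  {
    key = $1 ";" $3          # time and user, as in the answer above
    all[key]++               # every action in that second
    if ($5 ~ /user logged in|user logged off|user changed password/)
      hit[key]++             # only the three actions of interest
  }
  END {
    for (k in hit)
      if (hit[k] == 3 && all[k] == 3) {
        split(k, part, ";")  # part[1] is the time, part[2] the user
        print part[2]
      }
  }' | sort
}

# Lines for user_1, user_4 and user_6 taken from the question
count_bots <<'EOF'
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_1| - |user logged in| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_1| - |user changed password| -
Mon, 22 Mar 2020 13:15:39 +0200|185.34.66.225|user_1| - |user logged off| -
Mon, 22 Mar 2020 13:15:59 +0200|185.34.66.205|user_4| - |user logged in| -
Mon, 22 Mar 2020 13:15:59 +0200|185.34.66.205|user_4| - |user logged in| -
Mon, 22 Mar 2020 13:15:59 +0200|185.34.66.205|user_4| - |user changed password| -
Mon, 22 Mar 2020 13:15:59 +0200|185.34.66.205|user_4| - |user logged off| -
Mon, 22 Mar 2020 15:19:19 +0200|178.56.66.225|user_6| - |user logged in| -
Mon, 22 Mar 2020 15:19:19 +0200|178.56.66.225|user_6| - |user changed password| -
Mon, 22 Mar 2020 15:19:19 +0200|178.56.66.225|user_6| - |user logged off| -
EOF
```

On these lines it prints user_1 and user_6, so whether user_4 should count as a bot decides which of the two approaches fits.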