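Count unique IPs from access.log by day and minute
I would like to know whether it is possible to count unique IPs per minute for a specific day on Ubuntu (Apache access.log).
I have already found this useful command, which gives the number of requests per day/minute. Unfortunately, I could not get it to count IPs instead of request lines: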
grep "06/Sep/2021" access.log | cut -d[ -f2 | cut -d] -f1 |
  awk -F: '{print $2":"$3}' | sort -nk1 -nk2 | uniq -c
My own attempt is not very good:
grep "06/Sep/2021" access.log | awk '{print substr($4,14,5)}' | sort | uniq |
  while read p; do
    count=`grep $p access.log | awk '{print $1}' | sort | uniq | wc -l`
    echo $count $p
  done
Apache Access.log:
11.111.111.111 - - [06/Sep/2021:01:51:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:52:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:53:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:54:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:55:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:56:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:57:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:58:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.112 - - [06/Sep/2021:01:58:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:01:59:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:01:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:02:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:03:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:04:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:05:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:06:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:07:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:08:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:09:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [06/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.112 - - [06/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
Expected output:
1 01:51
1 01:52
1 01:53
1 01:54
1 01:55
1 01:56
1 01:57
2 01:58
1 01:59
1 02:01
1 02:02
1 02:03
1 02:04
1 02:05
1 02:06
1 02:07
1 02:08
1 02:09
2 02:10
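Assumptions:
- the IP address does not matter; although 'unique ip' is mentioned in the title and in the body of the question, the expected output does not mention IP addresses, and the expected counts do not appear to be broken out by IP
Adding a few lines with different dates: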
$ cat access.log
11.111.111.111 - - [03/Sep/2021:01:51:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [01/Sep/2021:01:52:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
... all of the lines from OP's sample input ...
11.111.111.112 - - [07/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.112 - - [10/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
One awk idea to replace all of the grep/awk/while/cut/sort/uniq code (it should be faster, too):
awk -v dt='06/Sep/2021' '
$0 ~ dt { split($0,timestamp,"[][]")        # timestamp[2] = text between [ and ]
          split(timestamp[2],hrmin,":")     # hrmin[2] = HH, hrmin[3] = MM
          count[hrmin[2]":"hrmin[3]]++
        }
END     { for (i in count)
              print count[i],i
        }
' access.log | sort -k2V
This generates:
1 01:51
1 01:52
1 01:53
1 01:54
1 01:55
1 01:56
1 01:57
2 01:58 # ip addresses 11.111.111.11{1,2}
1 01:59
1 02:01
1 02:02
1 02:03
1 02:04
1 02:05
1 02:06
1 02:07
1 02:08
1 02:09
2 02:10 # ip addresses 11.111.111.11{1,2}
NOTE: if using GNU awk, the | sort -k2V can be removed by making the following change to the awk END block:
END     { PROCINFO["sorted_in"]="@ind_str_asc"
          for (i in count)
              print count[i],i
        }
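The answer above counts request lines per minute, under its stated assumption that the IP address does not matter. If distinct client IPs per minute are wanted instead, as the title suggests, the same idea can be adjusted. The following is only a sketch: the function name unique_ips_per_minute is made up for illustration, and it assumes the log layout shown in the question, where field 1 is the client IP and field 4 is "[dd/Mon/yyyy:HH:MM:SS". It uses only POSIX awk features, so it should not need GNU awk.

```shell
# Hypothetical helper (not from the original answers): count distinct client
# IPs per minute for one day.
# Assumption: field 1 is the client IP and field 4 looks like
# "[06/Sep/2021:01:51:43", as in the sample log from the question.
unique_ips_per_minute() {
    day=$1    # e.g. 06/Sep/2021
    log=$2    # path to the access log
    awk -v dt="$day" '
    index($4, "[" dt ":") == 1 {      # literal match against the timestamp field only
        minute = substr($4, 14, 5)    # HH:MM part of the timestamp
        if (!seen[minute, $1]++)      # true only the first time this IP is seen in this minute
            count[minute]++
    }
    END { for (m in count) print count[m], m }
    ' "$log" | sort -k2,2
}
```

It could be called as unique_ips_per_minute 06/Sep/2021 access.log. On the sample input from the question the counts are the same as the expected output, because no IP appears twice within the same minute; the two approaches only differ once a single IP makes several requests in one minute.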
Here is a gnu-awk solution that does this in a single command:
awk -v dt="06/Sep/2021" '
$0 ~ dt && gsub(/^[^:]+:|:[0-9]+$/, "", $4) { ++fq[$4] }
END {
   PROCINFO["sorted_in"]="@ind_str_asc"
   for (i in fq)
      print i, fq[i]
}' file.log
01:51 1
01:52 1
01:53 1
01:54 1
01:55 1
01:56 1
01:57 1
01:58 2
01:59 1
02:01 1
02:02 1
02:03 1
02:04 1
02:05 1
02:06 1
02:07 1
02:08 1
02:09 1
02:10 2
PROCINFO["sorted_in"]="@ind_str_asc" is used to traverse the keys in ascending string order.
With your shown samples, please try the following awk program.
awk -v dt="06/Sep/2021" '
$0 ~ dt && match($0,/\[[^ ]*/){
  arr[substr($0,RSTART+13,RLENGTH-16)]++
}
END{
  for(key in arr){
    print key,arr[key]
  }
}
' Input_file | sort -k1
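Explanation: Run the awk program on Input_file. Create an awk variable named dt with the value 06/Sep/2021. In the main block, check whether the line contains the value of dt, and use the match function with a regex that matches from [ up to the next space (this basically gets [06/Sep/2021:02:01:43). Increment the array arr, indexed by the HH:MM substring that substr cuts out of the matched text. In the END block, loop over the elements of arr and print each key with its value. Pipe the output to sort to get it in sorted order.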
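To make the offset arithmetic in that substr call easier to follow, here is a small sketch that runs the same match on a single log line and prints both the matched text and the key derived from it. The function name show_match is made up for illustration; the regex and the +13 / -16 offsets are the ones used in the program above.

```shell
# Hypothetical helper for illustration: show what match() captures on one log
# line and how the HH:MM key is cut out of the match.
show_match() {
    printf '%s\n' "$1" | awk '
    match($0, /\[[^ ]*/) {
        # RSTART is the position of "[", RLENGTH the length of "[dd/Mon/yyyy:HH:MM:SS"
        print "matched: " substr($0, RSTART, RLENGTH)
        # skip the 13 characters of "[dd/Mon/yyyy:" and drop the 3 characters of ":SS"
        print "key: " substr($0, RSTART + 13, RLENGTH - 16)
    }'
}

show_match '11.111.111.111 - - [06/Sep/2021:02:01:43 +0200] "GET / HTTP/1.1" 200 30783'
# matched: [06/Sep/2021:02:01:43
# key: 02:01
```

The matched text is 21 characters long, so RLENGTH-16 leaves exactly the 5 characters of HH:MM. These offsets rely on the fixed-width dd/Mon/yyyy date format shown in the sample log.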