Count unique IPs from access.log by day and minute

I would like to know whether it is possible to count unique IPs per minute for a specific day in an Apache access.log on Ubuntu.

I already found this useful command, which gives the number of requests per day/minute. Unfortunately, I could not get it to count IPs instead of request lines:

grep "06/Sep/2021" access.log | cut -d[ -f2 | cut -d] -f1 |
awk -F: '{print $2":"$3}' | sort -nk1 -nk2 | uniq -c

My own attempt did not work very well:

grep "06/Sep/2021" access.log | awk '{print substr($4,14,5)}' | sort | uniq | while read p; do
  count=`grep $p access.log | awk '{print $1}' | sort | uniq | wc -l`
  echo $count $p
done

Apache Access.log:

    11.111.111.111 - - [06/Sep/2021:01:51:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:01:52:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:01:53:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:01:54:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:01:55:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:01:56:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:01:57:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:01:58:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.112 - - [06/Sep/2021:01:58:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:01:59:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:01:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:02:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:03:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:04:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:05:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:06:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:07:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:08:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:09:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.111 - - [06/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
    11.111.111.112 - - [06/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146

Expected output:

1 01:51
1 01:52
1 01:53
1 01:54
1 01:55
1 01:56
1 01:57
2 01:58
1 01:59
1 02:01
1 02:02
1 02:03
1 02:04
1 02:05
1 02:06
1 02:07
1 02:08
1 02:09
2 02:10

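Assumptions:

  • the IP address is of no concern; although 'unique ip' is mentioned in the title and in the body of the question, the expected output makes no mention of IP addresses, and the expected counts do not appear to be broken out by IP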

Adding a few lines with different dates:

$ cat access.log
11.111.111.111 - - [03/Sep/2021:01:51:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.111 - - [01/Sep/2021:01:52:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
... all of the lines from OP's sample input ...
11.111.111.112 - - [07/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146
11.111.111.112 - - [10/Sep/2021:02:10:43 +0200] "GET / HTTP/1.1" 200 30783 "https://website.de/" "Mozilla/5.0 (compatible; website; +https://www.website.de/robot.html)" 2584 32146

One awk idea to replace all of the grep/awk/while/cut/sort/uniq coding (it should also run faster):

awk -v dt='06/Sep/2021' '
$0 ~ dt { split($0,timestamp,"[][]")
          split(timestamp[2],hrmin,":")
          count[hrmin[2]":"hrmin[3]]++
        }
END     { for (i in count) 
              print count[i],i
        }
' access.log | sort -k2V

This generates:

1 01:51
1 01:52
1 01:53
1 01:54
1 01:55
1 01:56
1 01:57
2 01:58    # ip addresses 11.111.111.11{1,2}
1 01:59
1 02:01
1 02:02
1 02:03
1 02:04
1 02:05
1 02:06
1 02:07
1 02:08
1 02:09
2 02:10    # ip addresses 11.111.111.11{1,2}

NOTE: if using GNU awk, you can remove the | sort -k2V and make the following change to the awk/END block:

END     { PROCINFO["sorted_in"]="@ind_str_asc"
          for (i in count) 
              print count[i],i
        }
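
The commands above count request lines per minute. In the sample that equals the number of distinct IPs, because no address appears twice within the same minute. If the same IP can hit the server several times in one minute, a variant along the following lines may be closer to what the title asks for. It is only a sketch: the function name count_unique_ips_per_minute is made up for illustration, and it assumes the default combined log format, where the client IP is field 1 and the timestamp ("[dd/Mon/yyyy:HH:MM:SS") is field 4.

```shell
# Sketch (hypothetical helper, not from the original answers):
# count distinct client IPs per minute for one day.
# Assumes field 1 is the client IP and field 4 is "[dd/Mon/yyyy:HH:MM:SS".
count_unique_ips_per_minute() {
    # $1 = date as written in the log (e.g. 06/Sep/2021), $2 = log file
    awk -v dt="$1" '
    index($4, "[" dt ":") == 1 {                  # timestamp starts with "[<date>:"
        minute = substr($4, length(dt) + 3, 5)    # the "HH:MM" right after "[<date>:"
        if (!seen[minute, $1]++)                  # first time this IP shows up in this minute
            count[minute]++
    }
    END {
        for (m in count)
            print count[m], m
    }
    ' "$2" | sort -k2,2
}
```

Called as count_unique_ips_per_minute '06/Sep/2021' access.log, this prints the same counts as above for the sample data; the results would only differ once an IP repeats within a minute.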

Here is a gnu-awk solution that does this in a single command:

awk -v dt="06/Sep/2021" '
$0 ~ dt && gsub(/^[^:]+:|:[0-9]+$/, "", $4) { ++fq[$4] }
END {
   PROCINFO["sorted_in"]="@ind_str_asc"
   for (i in fq)
      print i, fq[i]
}' file.log

01:51 1
01:52 1
01:53 1
01:54 1
01:55 1
01:56 1
01:57 1
01:58 2
01:59 1
02:01 1
02:02 1
02:03 1
02:04 1
02:05 1
02:06 1
02:07 1
02:08 1
02:09 1
02:10 2

PROCINFO["sorted_in"]="@ind_str_asc" is used to traverse the array keys in ascending string order.

With your shown samples, please try the following awk program.

awk -v dt="06/Sep/2021" '
$0 ~ dt && match($0,/\[[^ ]*/){
  arr[substr($0,RSTART+13,RLENGTH-16)]++
}
END{
  for(key in arr){
    print key,arr[key]
  }
}
'  Input_file | sort -k1

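Explanation: Run an awk program over Input_file. Create an awk variable named dt with the value 06/Sep/2021. In the main block, check whether the line contains the value of dt, and use the match function with a regex that matches from [ up to the next space (this captures, for example, [06/Sep/2021:02:01:43). Build the array arr, indexed by the HH:MM substring of the matched text. In awk's END block, loop over the elements of arr and print each key with its value. Pipe the output to sort to get it in sorted order.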
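
To make the offsets in the substr call easier to follow, here is a small sketch that prints both the text captured by match and the part cut out of it. The function name extract_minute is only for illustration, and the arithmetic assumes a two-digit day and a four-digit year in the timestamp.

```shell
# Sketch (hypothetical helper): show what match() captures and how the
# substr() offsets reduce it to HH:MM. Reads log lines from stdin.
extract_minute() {
    awk '
    match($0, /\[[^ ]*/) {
        matched = substr($0, RSTART, RLENGTH)   # e.g. "[06/Sep/2021:01:51:43"
        # RSTART+13 skips the 13 characters of "[dd/Mon/yyyy:".
        # RLENGTH-16 drops those 13 plus the 3 of ":SS", leaving 5 characters.
        print matched, substr($0, RSTART + 13, RLENGTH - 16)
    }'
}
```

Feeding it the first sample line, for instance with head -n 1 access.log | extract_minute, prints the bracketed timestamp followed by 01:51.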