Adding another column to awk output
I have an HAProxy log file whose content looks like this:
Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:01.220] frontend backend_srvs/srv1 9063/0/0/39/9102 200 694 - - --VN 9984/5492/191/44/0 0/0 {Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location1} "GET /location1 HTTP/1.1"
Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.322] frontend backend_srvs/srv1 513/0/0/124/637 200 14381 - - --VN 9970/5491/223/55/0 0/0 {Mozilla/5.0 AppleWebKit/537.36 Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location2} "GET /location2 HTTP/1.1"
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {another user agent with fewer columns|http://subdomain.domain.com/location3} "GET /location3 HTTP/1.1"
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36|} "GET /another_location HTTP/1.1"
I want to extract some fields to get the following output:
Field 1     Field 2            Field 3       Field 4   Field 5        Field 6
Date/time   HTTP status code   HTTP Method   Request   HTTP version   Referer URL
Basically, in this particular case, the output should be:
Feb 28 11:16:10 200 GET /location1 HTTP/1.1 http://subdomain.domain.com/location1
Feb 28 11:16:10 200 GET /location2 HTTP/1.1 http://subdomain.domain.com/location2
Feb 28 11:16:13 200 GET /location3 HTTP/1.1 http://subdomain.domain.com/location3
Feb 28 11:16:13 200 GET /another_location HTTP/1.1
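The only problem here is extracting the Referer URL, which sits between curly braces together with the user agent; the two are separated by a pipe. In addition, the user agent has a variable number of fields.
The only solution I could think of was to extract the referer URL separately and then paste the columns together: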
requests_temp=`grep -F " 88.88.88.88:" /root/file.log | tr -d '"'`
requests=`echo "${requests_temp}" | awk '{print $1" "$2" "$3" "$11, $(NF-2), $(NF-1), $NF}' > /tmp/requests_tmp`
referer_url=`echo "${requests_temp}" | awk 'NR > 1 {print $1}' RS='{' FS='}' | awk -F'|' '{ print $2 }' > /tmp/referer_url_tmp`
paste /tmp/requests_tmp /tmp/referer_url_tmp
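But I don't really like this method. Is there another way to do it using only one awk line? Maybe assign the referer URL column to a variable in awk and then use it to produce the same output?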
You can do it all in one pass with awk:
awk '$6 ~ /88\.88\.88\.88:[0-9]+/{
         split($0,a,/[{}]/)
         $0=a[1] OFS a[3]
         split(a[2],b,"|")
         print $1,$2,$3,$11,substr($18,2),$19,substr($20,1,length($20)-1),b[2]
     }' file.log
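The first split breaks the line around its variable part (the text between {...}) and stores the pieces in the array a.
The line is then rebuilt with $0=a[1] OFS a[3] so that it has a fixed number of fields.
The second split extracts the URL from a[2], splitting on the | character.
Finally, print outputs all the required elements. Note that substr is used here to remove the " characters.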
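To see why rebuilding the line matters, here is a small sketch that prints the field count before and after the rebuild for two of the sample records. The shell variables `log` and `field_counts` and the awk variable `before` are names I made up for the illustration; the awk calls themselves are the ones used in the answer above.

```shell
# Two records copied from the sample log; their user agents differ in length.
log='Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.322] frontend backend_srvs/srv1 513/0/0/124/637 200 14381 - - --VN 9970/5491/223/55/0 0/0 {Mozilla/5.0 AppleWebKit/537.36 Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location2} "GET /location2 HTTP/1.1"
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {another user agent with fewer columns|http://subdomain.domain.com/location3} "GET /location3 HTTP/1.1"'

field_counts=$(printf '%s\n' "$log" | awk '{
    before = NF                 # field count with the {...} section still in place
    split($0, a, /[{}]/)        # a[1]: before "{", a[2]: inside, a[3]: after "}"
    $0 = a[1] OFS a[3]          # assigning to $0 makes awk re-split the record
    print before, NF            # original count, then count after the rebuild
}')
printf '%s\n' "$field_counts"
```

The first number differs between the two records (24 and 26), while the second is 20 for both, which is what makes the hard-coded $18, $19 and $20 safe to use after the rebuild.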
Try the following solution:
awk '/88\.88\.88\.88/ {gsub(/"/,"",$0);split($(NF-3),a,"|"); {print $1,$2,$3,$11, $(NF-2), $(NF-1), $NF, substr(a[2],1,(length(a[2])-1))}}' a
Feb 28 11:16:10 200 GET /location1 HTTP/1.1 http://subdomain.domain.com/location1
Feb 28 11:16:10 200 GET /location2 HTTP/1.1 http://subdomain.domain.com/location2
Feb 28 11:16:13 200 GET /location3 HTTP/1.1 http://subdomain.domain.com/location3
Feb 28 11:16:13 200 GET /another_location HTTP/1.1
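Since the question asks about keeping the referer in an awk variable, here is a minimal sketch along those lines. It is a variation, not either answerer's exact code, and it assumes the referer itself contains no spaces, so that it is always the last whitespace-separated token inside the braces. The names `log`, `report`, `ip`, `referer` and `line` are mine.

```shell
# Two records copied from the sample log: one with a referer, one without.
log='Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:01.220] frontend backend_srvs/srv1 9063/0/0/39/9102 200 694 - - --VN 9984/5492/191/44/0 0/0 {Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location1} "GET /location1 HTTP/1.1"
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36|} "GET /another_location HTTP/1.1"'

report=$(printf '%s\n' "$log" | awk -v ip='88.88.88.88' '
index($6, ip ":") == 1 {          # literal prefix match on the client field
    gsub(/"/, "")                 # strip the quotes around the request
    referer = $(NF-3)             # last token of the {...} section
    sub(/^.*[|]/, "", referer)    # keep only what follows the last "|"
    sub(/[}]$/, "", referer)      # drop the closing brace
    line = $1 OFS $2 OFS $3 OFS $11 OFS $(NF-2) OFS $(NF-1) OFS $NF
    if (referer != "") line = line OFS referer   # no trailing space when empty
    print line
}')
printf '%s\n' "$report"
```

Using index() on $6 instead of a regex avoids having to escape the dots in the address, and the address can be changed from the command line through -v without touching the awk program.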