Removing spaces from a line after a certain number of spaces have occurred
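I am trying to do some Tomcat access log analysis by loading the logs into MySQL. Most of it works, but the last entry in the combined access log is giving me trouble: it doesn't always contain the same number of spaces, and the file is space-delimited. I need the spaces in the last string on each line either removed or replaced with a comma or some other placeholder.
I already run the file through sed to remove all the " characters, so it would be great if I could add something to my sed command to do this; if I need to run the file through something else after the sed command, that would work too.
Here is the file before the sed command: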
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
Here is the sed command:
sed 's/\"//g' filename > newfilename
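Here are sample lines from the file after that command has been run against it. Because the file is loaded into MySQL as space-delimited, MySQL tries to create several more columns and fails. So if I could get rid of all the spaces in the last section, that would be great.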
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
An example of a line where Mozilla is not present:
24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] GET /irep/images/integra.png HTTP/1.1 304 - - MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0
Here is my expected output; sorry, I had a few distractions this morning.
IPAddress, ClientUsername, AuthUserName, DateTime, Request/File, Protocol, Status, SizeBytes, Reference address, UserAgent/Browser
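I would post a screenshot of the table in MySQL Workbench, but I am not allowed to yet.
Basically, from "Mozilla" to the end of the line I want the spaces replaced or gone; I think a comma or a : placeholder would be ideal. Any suggestions?
Ed, here is the error I ran into today.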
awk: irep-istor_access_log.2015-02-10.txt:4: 166.173.58.240 - - [10/Feb/2015:00:04:07 -0600] "GET /istore/js/cart.js HTTP/1.1" 200 7042 "https://istore.salonservicegroup.com/istore/loginpage.jsp" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
awk: irep-istor_access_log.2015-02-10.txt:4: ^ syntax error
You can do the rest like this:
$ awk 'match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
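But you should really just use one small, simple awk script to do the whole thing, whatever it is.
I see you just added some sample input (but still no expected output), so: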
$ cat file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
$
$ awk '{gsub(/"/,"")} match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Cart/Controller/TempController.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
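The commands above key on the literal string "Mozilla", so a line like the MobileSafari example in the question would pass through unchanged. A possible variant, offered only as a sketch, joins everything from a fixed field position onward instead. It assumes the quotes have already been removed, that the request is always exactly three tokens (method, path, protocol), and that the referer contains no spaces, which makes the user agent start at field 12. The names `sample.log`, `join_agent`, and `first` are made up for this illustration.

```shell
# Hypothetical sample: one Mozilla line and one non-Mozilla line, quotes already removed.
cat > sample.log <<'EOF'
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] GET /irep/images/integra.png HTTP/1.1 304 - - MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0
EOF

# Join fields first..NF with commas; leave fields 1..first-1 space-separated.
join_agent() {
    awk -v first=12 '
    NF > first {
        head = $1
        for (i = 2; i < first; i++) head = head " " $i       # fixed leading columns
        tail = $first
        for (i = first + 1; i <= NF; i++) tail = tail "," $i  # user agent tokens
        $0 = head " " tail
    }
    1' "$1"
}

join_agent sample.log
```

If a request path or referer ever contains a space, the field count shifts and this sketch would split the line in the wrong place, so parsing the original quoted line is the more robust route.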
A different approach: here is how to convert the input file to a CSV file:
$ cat tst.awk
BEGIN{
    OFS=","
    print "ipAddr", "dash1", "dash2", "dateTime", "getCmd", "number", "info", "browser"
}
{
    # replace commas in the data so they cannot clash with the output separator
    gsub(OFS,";")

    ip = $1
    dash1 = $2
    dash2 = $3

    # the timestamp is the text between [ and ]
    match($0,/\[[^]]+\]/)
    dt = substr($0,RSTART+1,RLENGTH-2)

    # first quoted string: the request; then drop everything up to and including it
    match($0,/"[^"]+"/)
    get = substr($0,RSTART+1,RLENGTH-2)
    $0 = substr($0,RSTART+RLENGTH)

    # the record now starts at the status code
    num = $1
    dash3 = $2

    match($0,/"[^"]+"/)
    info = substr($0,RSTART+1,RLENGTH-2)
    $0 = substr($0,RSTART+RLENGTH)

    match($0,/"[^"]+"/)
    browser = substr($0,RSTART+1,RLENGTH-2)

    print ip, dash1, dash2, dt, get, num, info, browser
}
$ awk -f tst.awk file
ipAddr,dash1,dash2,dateTime,getCmd,number,info,browser
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Cart/Controller/TempController.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
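The question lists ten columns, while the CSV above has eight. As a rough sketch of one way to get closer to that layout, the quoted log line can be split on the double-quote character itself, so the request, referer, and user agent each arrive as a whole field. This assumes every line has all three quoted strings and that none of them contains an embedded quote; `quoted.log` and `to_csv` are hypothetical names, and the split of the request into "Request/File" and "Protocol" is a guess at what the asker intended.

```shell
# Hypothetical sample: one line in the original, still-quoted format.
cat > quoted.log <<'EOF'
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
EOF

to_csv() {
    awk -F'"' -v OFS=',' '
    BEGIN {
        print "IPAddress", "ClientUsername", "AuthUserName", "DateTime", "Request/File", "Protocol", "Status", "SizeBytes", "Reference address", "UserAgent/Browser"
    }
    {
        gsub(/,/, ";")        # keep data commas from clashing with the CSV separator
        split($1, a, " ")     # a: ip, client user, auth user, [date, tz]
        split($2, r, " ")     # r: method, path, protocol
        split($3, s, " ")     # s: status, size
        # drop the leading [ and the trailing ] around the timestamp
        dt = substr(a[4], 2) " " substr(a[5], 1, length(a[5]) - 1)
        print a[1], a[2], a[3], dt, r[1] " " r[2], r[3], s[1], s[2], $4, $6
    }' "$1"
}

to_csv quoted.log
```

With either script, the program file goes after `-f` and the log file comes last, as in `awk -f tst.awk file`. The error quoted in the question has the shape awk prints when it parses a file as program text, so it may be that the log itself was passed via `-f`; that is only a guess from the message format.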