Removing spaces from a line after a certain number of spaces has occurred

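So I'm trying to do some Tomcat access log analysis by loading the logs into MySQL. I have most of it working, but the last entry in the combined access log is giving me trouble: it doesn't always contain the same number of spaces, and the file is space-delimited. I need the spaces in the last string of each line either removed or replaced with a comma or some other placeholder.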

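I'm already running the file through sed to remove all the " characters, so it would be great if I could add something to my sed command to do this. If I need to run something else against the file after the sed command, that works too.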

Here's the file before the sed command:

24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"

Here's the sed command:

sed 's/\"//g' filename > newfilename

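Here's a sample of the file after that command has been run against it. Because the file is space-delimited, MySQL tries to create several more columns when loading it, and it can't. So if I could get all the spaces out of the last section, that would be great.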

24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4

An example line where the Mozilla string isn't present:

24.240.97.38 - - [09/Feb/2015:07:38:21 -0600] GET /irep/images/integra.png HTTP/1.1 304 - - MobileSafari/600.1.4 CFNetwork/711.1.16 Darwin/14.0.0

Here's my expected output; sorry, I had a few distractions this morning.

IPAddress, ClientUsername, AuthUserName, DateTime, Request/File, Protocol, Status, SizeBytes, Reference address, UserAgent/Browser

I would post a screenshot of the table in MySQL Workbench, but I'm not allowed to yet.

Basically, from "Mozilla" to the end of the line I want the spaces replaced or gone; I think a comma or : placeholder would be ideal. Any suggestions?

Ed, here's the error I ran into today:

awk: irep-istor_access_log.2015-02-10.txt:4: 166.173.58.240 - - [10/Feb/2015:00:04:07 -0600] "GET /istore/js/cart.js HTTP/1.1" 200 7042 "https://istore.salonservicegroup.com/istore/loginpage.jsp" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
awk: irep-istor_access_log.2015-02-10.txt:4:                                ^ syntax error

You can do the rest of it like this:

$ awk 'match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Content/css/jquery.mobile.datebox.css HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/Bookmark.js HTTP/1.1 304 - webaddress Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4

But you should just use one small, simple awk script to do the whole thing, whatever it is.

I see you've just added some sample input (but still no expected output), so:

$ cat file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Cart/Controller/TempController.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] "GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1" 304 - "webpage" "Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4"
$              
$ awk '{gsub(/"/,"")} match($0,/Mozilla.*/){ tgt=substr($0,RSTART); gsub(/[[:space:]]+/,",",tgt); $0 = substr($0,1,RSTART-1) tgt } 1' file
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Cart/Controller/TempController.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
24.240.97.38 - - [09/Feb/2015:07:38:23 -0600] GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1 304 - webpage Mozilla/5.0,(iPad;,CPU,OS,8_1_3,like,Mac,OS,X),AppleWebKit/600.1.4,(KHTML,,like,Gecko),Version/8.0,Mobile/12B466,Safari/600.1.4
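Matching on Mozilla leaves lines like the MobileSafari example from the question untouched. As a rough sketch of an alternative, you could go by field position instead. This assumes the quotes have already been stripped and that the user agent always starts at the 12th space-separated field, which holds for the sample lines shown but would break if a request path or referrer contained a space. The function name `join_ua` is made up for illustration.

```shell
# join_ua: hypothetical helper; reads quote-stripped log lines on stdin.
# Fields 1..12 are kept space-separated; every field after the 12th
# (assumed to be the rest of the user agent) is joined with a comma.
join_ua() {
    awk '{
        out = $1
        for (i = 2; i <= NF; i++)
            out = out (i <= 12 ? " " : ",") $i
        print out
    }'
}
```

It could be chained after the existing sed step, for example `sed 's/"//g' filename | join_ua > newfilename`.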

A different approach: here's how to convert your input file into a CSV file:

$ cat tst.awk        
BEGIN{
    OFS=","
    print "ipAddr", "dash1", "dash2", "dateTime", "getCmd", "number", "info", "browser"
}
{
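    # replace commas already in the line so they cannot split CSV fields
    gsub(OFS,";")

    ip = $1

    dash1 = $2
    dash2 = $3

    # timestamp: the text between [ and ]
    match($0,/\[[^]]+\]/)
    dt = substr($0,RSTART+1,RLENGTH-2)

    # request: first quoted string; then drop everything up to its end
    match($0,/"[^"]+"/)
    get = substr($0,RSTART+1,RLENGTH-2)
    $0 = substr($0,RSTART+RLENGTH)

    # $0 was reassigned above, so $1 and $2 are now the two fields after the request
    num = $1
    dash3 = $2

    match($0,/"[^"]+"/)
    info = substr($0,RSTART+1,RLENGTH-2)
    $0 = substr($0,RSTART+RLENGTH)

    match($0,/"[^"]+"/)
    browser = substr($0,RSTART+1,RLENGTH-2)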

    print ip, dash1, dash2, dt, get, num, info, browser
}


$ awk -f tst.awk file
ipAddr,dash1,dash2,dateTime,getCmd,number,info,browser
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Cart/Controller/TempController.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
24.240.97.38,-,-,09/Feb/2015:07:38:23 -0600,GET /irep/client/Libraries/jquery.mobile.datebox.js HTTP/1.1,304,webpage,Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML; like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4
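Because tst.awk turns every comma in the input into a semicolon before printing, each line of its output should have exactly 8 comma-separated fields. Before loading the CSV into MySQL it may be worth checking that, for instance with something like the sketch below. The function name `check_csv_fields` is made up for illustration, and the count of 8 matches the header printed above.

```shell
# check_csv_fields: hypothetical helper; reads CSV on stdin and prints the
# line number and field count of every line that does not have 8 fields.
check_csv_fields() {
    awk -F, 'NF != 8 { print NR ": " NF " fields" }'
}
```

Running `awk -f tst.awk file | check_csv_fields` prints nothing when every line has the expected number of fields.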