How can I improve my sed command to extract data from a ping log file?

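On the details: the site requires me to include some text because the post is mostly code, so I am adding this sentence, although I think the question is self-explanatory.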

Sample log file:

jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
jue 08 abr 2021 13:37:21 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.058/47.762/64.169 ms

My command so far:

cat sample.log | sed -r -e '/^... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}/{s/(... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}).*$/\1/g;n;d}' -e '/^--- google.*$/d' -e 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/' -e '/round-trip/d'

Result obtained:

jue 08 abr 2021 13:33
0%
jue 08 abr 2021 13:35
1%
jue 08 abr 2021 13:37
0%

Desired result:

jue 08 abr 2021 13:33, 0%
jue 08 abr 2021 13:35, 1%
jue 08 abr 2021 13:37, 0%

First solution: This is a job for awk. With the samples you have shown, please try the following awk code.

awk -v OFS=", " '
match($0,/^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2}/){
  val=substr($0,RSTART,RLENGTH-3)
  next
}
/packets transmitted/{
  print val,$(NF-2)
  val=""
}
'  Input_file

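Explanation: In short, the match function is used with the regex ^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2} (explained further below). If a match is found, the variable val is set to the matched text minus its last three characters (the seconds). next then skips all remaining statements for that line. The second rule checks whether the line contains packets transmitted; if so, it prints val together with the third-to-last field of that line, and then resets val.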
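
To see what match, RSTART and RLENGTH do on a single line, here is a minimal sketch. It is not the answer's exact regex: the {n} intervals are replaced by + as an assumption, so that the snippet also runs on awk implementations that lack interval support. The variable names line and stamp are only for illustration.

```shell
line='jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes'

stamp=$(printf '%s\n' "$line" | awk '
  # match() sets RSTART (start position) and RLENGTH (match length)
  match($0, /^[a-zA-Z]+ [0-9]+ [a-zA-Z]+ [0-9]+ [0-9]+:[0-9]+:[0-9]+/) {
    # RLENGTH - 3 drops the trailing ":49", keeping hours and minutes only
    print substr($0, RSTART, RLENGTH - 3)
  }')

echo "$stamp"
```

This should print jue 08 abr 2021 13:33, the same value that the solution stores in val.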

Explanation of the regex:

^[a-zA-Z]+               ##Matches 1 or more lowercase/uppercase letters at the start of the line.
 [0-9]{2}                ##Matches a space followed by exactly 2 digits.
 [a-zA-Z]+               ##Matches a space followed by 1 or more lowercase/uppercase letters.
 [0-9]{4}                ##Matches a space followed by exactly 4 digits.
 ([0-9]{2}:){2}[0-9]{2}  ##Matches a space, then 2 digits plus a colon (this group occurs twice), then 2 more digits.
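
One way to check the regex against the sample lines is grep -E, which accepts the same extended syntax. This is only a sketch: the variable names re, header, stats, matched and nomatch are made up for the example, and it assumes a grep that supports -o.

```shell
re='^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2}'
header='jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes'
stats='100 packets transmitted, 100 packets received, 0% packet loss'

# -o prints only the part of the line that the regex matched
matched=$(printf '%s\n' "$header" | grep -oE "$re")

# -c counts matching lines; grep exits non-zero on zero matches, hence || true
nomatch=$(printf '%s\n' "$stats" | grep -cE "$re" || true)

echo "$matched"
echo "$nomatch"
```

The header line yields jue 08 abr 2021 13:33:49 and the statistics line yields a count of 0, so only the date lines set val.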


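Second solution: With GNU awk, we can use almost the same regex in the RS variable (RT then holds the text that matched RS) and get the desired result as follows: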

awk -v RS='[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} [0-9]{2}:[0-9]{2}|[0-9]{1,3}%' -v OFS=", " '
RT{
  val=(val?val (++count%2==0?ORS:OFS):"") RT
}
END{
  print val
}
'  Input_file

To get the desired format, you can pipe the output to:

sed 'N;s/\n/, /'

The final command becomes (note that you don't need to cat into sed, since it accepts a file name as an argument):

sed -r -e '/^... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}/{s/(... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}).*$/\1/g;n;d}' -e '/^--- google.*$/d' -e 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/' -e '/round-trip/d'  sample.log | sed 'N;s/\n/, /'

Output:

jue 08 abr 2021 13:33, 0%
jue 08 abr 2021 13:35, 1%
jue 08 abr 2021 13:37, 0%
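
The joining step N;s/\n/, / can be tried in isolation. A small sketch, feeding it four lines that mimic the intermediate output (the variable name joined is only for illustration):

```shell
# N appends the next input line to the pattern space (separated by a newline);
# s/\n/, / then replaces that embedded newline, so every two lines become one.
joined=$(printf '%s\n' 'jue 08 abr 2021 13:33' '0%' 'jue 08 abr 2021 13:35' '1%' \
  | sed 'N;s/\n/, /')

echo "$joined"
```

This prints two lines, jue 08 abr 2021 13:33, 0% and jue 08 abr 2021 13:35, 1%. The approach relies on the input strictly alternating between a date line and a percentage line.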

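Let's assume:

  • The packet-loss percentage is always found on a line ending in NUM% packet loss.
  • The date and time are always found on a line ending in data bytes.

Then, with GNU sed (tested on the two complete records you showed):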

$ sed -nE '/packet loss$/{s/.*\s([0-9]+%) packet loss$/\1/;h}
  /data bytes$/{s/(.{24}).*/\1/;G;s/\n/, /;p}' sample.log
jue 08 abr 2021 13:35:35, 0%
jue 08 abr 2021 13:37:21, 1%
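
In the sample log as shown in the question, each date line comes before the statistics of the same run, so appending whatever is already in the hold space pairs a date with the previous run's percentage. A minimal sketch of the reverse order, assuming that layout: hold the date, then print when the packet-loss line arrives. The temporary file and the names log and result are made up for the example, and [[:space:]] stands in for \s purely for portability.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
EOF

# 1st rule: on a date line, keep the first 24 characters and store them (h).
# 2nd rule: on a loss line, reduce it to "N%", append it to the held date (H),
#           swap hold and pattern space (x), join with ", " and print.
result=$(sed -nE '/data bytes$/{s/^(.{24}).*/\1/;h;}
  /packet loss$/{s/.*[[:space:]]([0-9]+%) packet loss$/\1/;H;x;s/\n/, /;p;}' "$log")

echo "$result"
rm -f "$log"
```

On these two records the output is jue 08 abr 2021 13:33:49, 0% followed by jue 08 abr 2021 13:35:35, 1%, so each timestamp stays with the loss figure of its own run.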