How can I improve my sed command to extract data from a ping log file?

Question

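The site requires me to include some text because the post is mostly code, so I am adding this sentence, but I think the question is self-explanatory.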

Sample log file:

jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
jue 08 abr 2021 13:37:21 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.058/47.762/64.169 ms

My command so far:

cat sample.log | sed -r -e '/^... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}/{s/(... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}).*$/\1/g;n;d}' -e '/^--- google.*$/d' -e 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/' -e '/round-trip/d'

Result obtained:

jue 08 abr 2021 13:33
0%
jue 08 abr 2021 13:35
1%
jue 08 abr 2021 13:37
0%

Desired result:

jue 08 abr 2021 13:33, 0%
jue 08 abr 2021 13:35, 1%
jue 08 abr 2021 13:37, 0%

Answer 1

First solution: this is a job for awk. With the samples you have shown, please try the following awk code.

awk -v OFS=", " '
match($0,/^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2}/){
  val=substr($0,RSTART,RLENGTH-3)
  next
}
/packets transmitted/{
  print val,$(NF-2)
  val=""
}
'  Input_file

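Explanation: the match function tests each line against the regex ^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2} (broken down below). When a line matches, val is set with substr, RSTART and RLENGTH to the matched text minus its last 3 characters (the seconds), and next skips the remaining rules for that line. The second rule fires on lines containing packets transmitted: it prints val together with the third-from-last field of the line (the loss percentage), then resets val to an empty string.

Explanation of the regex: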

^[a-zA-Z]+               ##Matching 1 or more small/capital letters from the start of the line.
 [0-9]{2}                ##Matching a space followed by 2 digits.
 [a-zA-Z]+               ##Matching a space followed by 1 or more small/capital letters.
 [0-9]{4}                ##Matching a space followed by 4 digits.
 ([0-9]{2}:){2}[0-9]{2}  ##Matching a space, then the group "2 digits followed by a colon" 2 times, followed by 2 digits.
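
As a quick check of the breakdown above, the same regex can be run against one header line from the sample log. This is only a sketch: it assumes a grep that supports the -o option (GNU and BSD grep do), and the variable names line and matched are made up for the illustration.

```shell
# Header line copied from the sample log in the question.
line='jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes'

# -E enables extended regular expressions, -o prints only the matched text.
matched=$(printf '%s\n' "$line" | grep -oE '^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2}')

printf '%s\n' "$matched"
# prints: jue 08 abr 2021 13:33:49
```

The match still includes the seconds, which is why the awk code trims 3 characters with RLENGTH-3.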

Second solution: with GNU awk, we can put almost the same regex in the RS variable and get the desired result as follows:

awk -v RS='[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} [0-9]{2}:[0-9]{2}|[0-9]{1,3}%' -v OFS=", " '
RT{
  val=(val?val (++count%2==0?ORS:OFS):"") RT
}
END{
  print val
}
'  Input_file

Answer 2

To get the desired format, you can pipe the output to:

sed 'N;s/\n/, /'
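
To see that step in isolation: N appends the next input line to the pattern space, and the substitution then replaces the embedded newline with a comma and a space. A minimal sketch, feeding it the first four lines of the "result obtained" listing from the question (the variable name joined is made up for the illustration):

```shell
# Two date/percentage pairs taken from the question's current output.
joined=$(printf '%s\n' 'jue 08 abr 2021 13:33' '0%' 'jue 08 abr 2021 13:35' '1%' | sed 'N;s/\n/, /')

printf '%s\n' "$joined"
# prints:
# jue 08 abr 2021 13:33, 0%
# jue 08 abr 2021 13:35, 1%
```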

The final command becomes (note that you don't need to cat into sed, since it accepts file names as arguments):

sed -r -e '/^... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}/{s/(... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}).*$/\1/g;n;d}' -e '/^--- google.*$/d' -e 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/' -e '/round-trip/d'  sample.log | sed 'N;s/\n/, /'

Output:

jue 08 abr 2021 13:33, 0%
jue 08 abr 2021 13:35, 1%
jue 08 abr 2021 13:37, 0%
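
One caveat with the pattern .*([0-9]+%): the greedy .* consumes as many characters as it can, so the capture group is left with only the last digit. This does not show with the sample data, where every loss figure has a single digit, but a two-digit value would be truncated. The sketch below uses a hypothetical statistics line (it is not from the log) and a variant pattern with a literal space before the group; both the line and the variant are assumptions for illustration.

```shell
# Hypothetical statistics line with a two-digit loss figure (not from the log).
line='100 packets transmitted, 90 packets received, 10% packet loss'

# Pattern from the question: the greedy .* leaves only the last digit for the group.
greedy=$(printf '%s\n' "$line" | sed -E 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/')

# Variant: a literal space before the group forces it to capture the whole number.
anchored=$(printf '%s\n' "$line" | sed -E 's/.* ([0-9]+%) packet loss$/\1/')

printf '%s\n' "$greedy" "$anchored"
# prints 0% and then 10%
```

The variant also drops the hard-coded 100 packets transmitted prefix, so it would keep working if the packet count changed.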

Answer 3

Let's assume:

The packet loss percentage is always found on a line ending with NUM% packet loss.
The date and time are always found on a line ending with data bytes.

Then, with GNU sed (tested on the two complete records you showed):

$ sed -nE '/packet loss$/{s/.*\s([0-9]+%) packet loss$/\1/;h}
  /data bytes$/{s/(.{24}).*/\1/;G;s/\n/, /;p}' sample.log
jue 08 abr 2021 13:35:35, 0%
jue 08 abr 2021 13:37:21, 1%
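
Note that the script above saves the loss percentage in the hold space and prints when it reaches a data bytes line, so each date is paired with the statistics that come before it in the file. For a log laid out like the sample in the question, where the date line comes first and its statistics follow, a variant that holds the date and prints on the packet loss line may be a better fit. This is a minimal sketch under that assumption, not the answer's original script; the temporary file and the variable names log and result are made up for the illustration, and [[:space:]] is used in place of \s so that it does not depend on GNU extensions.

```shell
# Recreate the sample log from the question in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
jue 08 abr 2021 13:37:21 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.058/47.762/64.169 ms
EOF

# First rule: keep the first 24 characters (date and time) and store them in the hold space.
# Second rule: reduce the line to NUM%, append it to the held date, swap, join, print.
result=$(sed -nE '
  /data bytes$/{s/^(.{24}).*/\1/;h;}
  /packet loss$/{s/.*[[:space:]]([0-9]+%) packet loss$/\1/;H;x;s/\n/, /;p;}
' "$log")

printf '%s\n' "$result"
rm -f "$log"
# prints:
# jue 08 abr 2021 13:33:49, 0%
# jue 08 abr 2021 13:35:35, 1%
# jue 08 abr 2021 13:37:21, 0%
```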


bash

sed