Sed regex string substitution from the terminal

Question

I have a log file in a standard format, for example:

31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3

The substitution I want is 31*31 to 31, so that I end up with a log containing only the last line. In this example it would look like:

31 Mar - Lorem Ipsom3

I want to do this on a custom Linux machine that has no perl. I tried using sed like this:

sed -i -- 's/31*31/31/g' /var/log/prog/logFile

But it did nothing. Any alternative involving ninja bash commands is also welcome.

Answer 1

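Unlike in the shell, * is not a wildcard here; it is a quantifier that applies to the preceding item. You need to quantify . (any character), so the regex is: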

sed ':a;N;$!ba;s/31.*31/31/g'

(I removed the -i flag so that you can safely test against your file first.)
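To see the difference between the two patterns on a single line, here is a small sketch. The sample strings and the `sample` helper are made up for illustration and are not part of the log in the question.

```shell
# Hypothetical sample lines, chosen only to show what each pattern matches.
sample() { printf '%s\n' '331' '3131' '31 Mar 31'; }

# 31*31 means: "3", then zero or more "1", then "31".
sample | sed 's/31*31/X/'
# prints:
# X
# X
# 31 Mar 31

# 31.*31 means: "31", then any run of characters, then "31".
sample | sed 's/31.*31/X/'
# prints:
# 331
# X
# X
```

The first pattern never matches a line such as "31 Mar 31", which is why the original command left the log untouched.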

The :a;N;$!ba; part reads the whole file into the pattern space, so the match can extend across newlines.
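As a rough illustration of what that loop does, the sketch below joins all input lines with commas. It spells the loop with one -e per command, which should also work on sed implementations that do not accept a semicolon after a label. The `join_lines` name and the input are assumptions for the demo.

```shell
# :a    define label "a"
# N     append the next input line to the pattern space
# $!ba  if the current line is not the last one, branch back to "a"
# After the loop the pattern space holds the whole input, newlines included.
join_lines() {
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g'
}

printf 'one\ntwo\nthree\n' | join_lines   # prints: one,two,three
```

Because the whole file ends up in memory at once, this idiom is best suited to files of modest size.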

But note:

The regex matches any 31, not only one at the start of a line, so:

31 Mar - Lorem Ipsom1
31 Mar - Lorem 31 Ipsom2

would result in

31 Ipsom2

The match is greedy, so if the log looks like this:

31 Mar - Lorem Ipsom1
30 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3

the second line is deleted.
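Both pitfalls can be reproduced without touching a real file. This is a minimal sketch that feeds the two sample logs above through the same substitution; the `slurp_sub` function name is made up for the demo.

```shell
# Same substitution as above, written with one -e per command.
slurp_sub() {
  sed -e ':a' -e 'N' -e '$!ba' -e 's/31.*31/31/g'
}

# Pitfall 1: a 31 in the middle of a line is matched too.
printf '%s\n' '31 Mar - Lorem Ipsom1' '31 Mar - Lorem 31 Ipsom2' | slurp_sub
# prints: 31 Ipsom2

# Pitfall 2: the greedy .* swallows the 30 Mar line in between.
printf '%s\n' '31 Mar - Lorem Ipsom1' '30 Mar - Lorem Ipsom2' '31 Mar - Lorem Ipsom3' | slurp_sub
# prints: 31 Mar - Lorem Ipsom3
```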

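You can fix the first problem by writing:

sed -E ':a;N;$!ba;s/(^|\n)31.*\n(31)/\1\2/g'

This forces the second 31 to be at the start of a line. The -E flag enables extended regular expressions so that ( | ) act as grouping and alternation, and \1 puts back the newline (if any) that the first group consumed.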
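A line-anchored variant can be checked against the first problem case as follows. This sketch assumes a sed that supports -E (extended regular expressions) and \n in the pattern, as GNU sed does. The `anchored_sub` name is hypothetical, and the second capture group is only there to keep the replacement easy to read.

```shell
# (^|\n)31  a 31 at the start of the input or right after a newline
# .*\n(31)  everything up to the last 31 that starts a line
# \1\2      keep the leading newline (if any) and that final 31
anchored_sub() {
  sed -E -e ':a' -e 'N' -e '$!ba' -e 's/(^|\n)31.*\n(31)/\1\2/g'
}

printf '%s\n' '31 Mar - Lorem Ipsom1' '31 Mar - Lorem 31 Ipsom2' | anchored_sub
# prints: 31 Mar - Lorem 31 Ipsom2
```

The match is still greedy, so lines for other days that sit between two 31 lines would still be removed.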

Answer 2

One way to keep only the last line of each run of consecutive lines that match a pattern is

sed -n '/^31/ { :a $!{ h; n; //ba; x; G } }; p' filename

It works as follows:

/^31/ {    # if a line begins with 31
  :a       # jump label for looping

  $!{      # if the end of input has not been reached (otherwise the current
           # line is the last line of the block by virtue of being the last
           # line)

    h      # hold the current line
    n      # fetch the next line. (note that this doesn't print the line
           # because of -n)

    //ba   # if that line also begins with 31, go to :a. // attempts the
           # most recently attempted regex again, which was ^31

    x      # swap hold buffer, pattern space
    G      # append hold buffer to pattern space. The PS now contains
           # the last line of the block followed by the first line that 
           # comes after it
  }
}
p          # in the end, print the result

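This avoids some of the problems of multi-line regexes, such as matches that begin or end in the middle of a line. It also does not discard the lines between two blocks of matching lines, and it keeps the last line of each block.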
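To try this on input with more than one block, the same script can be wrapped in a function and written one command per line. This is a sketch only; the `keep_last_of_runs` name and the sample lines are invented for the demo.

```shell
# Keep only the last line of each run of consecutive lines starting with 31.
keep_last_of_runs() {
  sed -n '
/^31/ {
  :a
  $!{
    h
    n
    //ba
    x
    G
  }
}
p' "$@"
}

printf '%s\n' '30 Mar A' '31 Mar B' '31 Mar C' '1 Apr D' '31 Mar E' | keep_last_of_runs
# prints:
# 30 Mar A
# 31 Mar C
# 1 Apr D
# 31 Mar E
```

The 30 Mar and 1 Apr lines pass through unchanged, and each run of 31 lines is reduced to its last line.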

Answer 3

I think you may be looking for "tail" to get the last line of the file, for example

tail -1 /path/file

Or, if you want the last entry for each day, "sort" may be your solution

sort -ur -k 1,2 /path/file | sort

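The -u flag specifies that only one line is returned for each distinct key.
-k 1,2 specifies that the key is the first two fields, in this case the day and the month. By default, fields are separated by whitespace.
The -r flag reverses the sort order, with the intent that the last match for each date is returned. POSIX only says that -u keeps one line per key without specifying which one, so check the result on your system. The second sort puts the lines back in ascending order.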

If your log file holds more than one month of data and you want to preserve the order (for example, if you have 31 March and 1 April in the same file), you could try:

cat -n tmp2 | sort -nr | sort -u -k 2,3 | sort -n | cut -f 2-

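cat -n prefixes each line of the log file with its line number before sorting.
The sorts work as before, but use fields 2 and 3, because field 1 is now the original line number.
The final sort restores the original order by line number.
cut removes the line numbers, restoring the original line content.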

For example

 $ cat tmp2
 30 Mar - Lorem Ipsom2
 30 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom2
 31 Mar - Lorem Ipsom3
 1 Apr - Lorem Ipsom1
 1 Apr - Lorem Ipsom2

 $ cat -n tmp2 | sort -r | sort -u -k 2,3 | sort | cut -f 2-
 30 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom3
 1 Apr - Lorem Ipsom2
