sed (awk?) to remove nearly duplicate lines
I have a file that alternates HTML-style comments with their real text:
<!-- Here's a first line -->
Here's a first line
<!-- Here's a second line -->
Here's a third line
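If a comment is identical to the line that follows it, apart from the comment markers themselves, I want to remove it; otherwise it should be kept: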
Here's a first line
<!-- Here's a second line -->
Here's a third line
I have read similar questions here, but could not work out a solution that fits my case.
You can use this awk:
awk '/<!--.*?-->/{h=$0; gsub(/ *(<!--|-->) */, ""); s=$0; next}
$0!=s{$0=h ORS $0} 1' file.html
Here's a first line
<!-- Here's a second line -->
Here's a third line
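The one-liner above is dense, and as far as I can tell it relies on the file strictly alternating comment and text: a text line with no comment in front of it is still compared against the last comment seen. Here is a longer sketch of the same idea that clears the held comment once it has been used. The function name `dedupe_comments` and the variables `raw`, `text` and `held` are my own, not part of the answer above.

```shell
# Sketch: drop an HTML comment line when the next line repeats its text.
# dedupe_comments, raw, text and held are illustrative names (assumptions).
dedupe_comments() {
  awk '
    # Comment line: flush any comment still pending, then hold this one back.
    /^[ \t]*<!--.*-->[ \t]*$/ {
      if (held) print raw
      raw = $0
      text = $0
      sub(/^[ \t]*<!--[ \t]*/, "", text)   # strip the opening marker
      sub(/[ \t]*-->[ \t]*$/, "", text)    # strip the closing marker
      held = 1
      next
    }
    # Text line: print the held comment first unless it repeats this line.
    {
      if (held && $0 != text) print raw
      held = 0
      print
    }
    # A comment on the last line has nothing to compare against, so keep it.
    END { if (held) print raw }
  ' "$@"
}
```

Call it as `dedupe_comments file.html`, or pipe text into it. Plain text lines that repeat each other are left alone, because only a held comment is ever dropped.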
sed '/^<!-- \(.*\) -->$/N;s/^<!-- \(.*\) -->\n\1$/\1/'
#
# /^<!-- \(.*\) -->$/        match an HTML comment on its own line, in which case
# N;                         add the next line to the pattern space and keep going
#
# s/^<!-- \(.*\) -->\n\1$/   detect a comment followed by its own text, as you described,
# \1/                        and replace the pair with just the text
For example:
$ sed '/^<!-- \(.*\) -->$/N;s/^<!-- \(.*\) -->\n\1$/\1/' <<EOF
> <!-- Foo -->
> Foo
> <!-- Bar -->
> Baz
> <!-- Quux -->
> Quux
>
> Something
> Something
> Another something
> EOF
gives:
Foo
<!-- Bar -->
Baz
Quux

Something
Something
Another something
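You may need to tweak this to handle indentation, but that shouldn't be too surprising. You may also want to switch to sed -r, in which case the parentheses must not be escaped.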
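To make the two suggestions above concrete, here is one possible indentation-tolerant variant. It is a sketch under a few assumptions: it uses `-E`, which I believe both GNU and BSD sed accept as the equivalent of `-r`, it relies on back-references working in extended regular expressions (true for GNU sed, not guaranteed everywhere), and the function name `strip_echoed_comments` is made up for the example.

```shell
# Sketch: same approach as above, but tolerant of leading/trailing blanks.
# strip_echoed_comments is an illustrative name (assumption).
strip_echoed_comments() {
  sed -E '
    # On a line holding only an HTML comment (optionally indented) ...
    /^[[:blank:]]*<!-- .* -->[[:blank:]]*$/ {
      # ... pull in the next line, unless this is already the last one.
      $!N
      # If the next line repeats the comment text, keep only that line,
      # with its own indentation untouched.
      s/^[[:blank:]]*<!-- (.*) -->[[:blank:]]*\n([[:blank:]]*\1[[:blank:]]*)$/\2/
    }
  ' "$@"
}
```

The `$!N` guard is there so that a comment on the very last line is still printed; some sed implementations discard the pattern space when `N` finds no next line.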
This might work for you (GNU sed):
sed -r '$!N;/<!-- (.*) -->\n\1$/!P;D' file
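This compares every pair of consecutive lines in the file against the requested condition and, when a pair matches, does not print the first line of the pair.
N.B. This also caters for consecutive comment lines.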
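Spelled out command by command, the sliding two-line window looks roughly like this. It is the same regular expression, written with `-E` (which I believe GNU sed treats like `-r`), and wrapped in a function whose name, `drop_echoed_comments`, is only for illustration.

```shell
# Sketch: the N/P/D one-liner expanded with comments.
# drop_echoed_comments is an illustrative name (assumption).
drop_echoed_comments() {
  sed -E '
    # Append the next input line (except on the last line), so the
    # pattern space holds two consecutive lines.
    $!N
    # Print the first line of the pair unless it is a comment whose
    # text equals the second line.
    /<!-- (.*) -->\n\1$/!P
    # Delete the first line and restart with the second line still in
    # the pattern space, so it becomes the first line of the next pair.
    D
  ' "$@"
}
```

Because every line gets a turn as the first line of a pair, a comment that follows another comment is still checked against the line after it:

```shell
printf '%s\n' '<!-- A -->' '<!-- B -->' 'B' 'C' | drop_echoed_comments
# prints:
# <!-- A -->
# B
# C
```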