How to print a greedy range of lines using awk

Question

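I ran into the following problem and could not find a solution, nor an explanation of why awk behaves in this odd way.

Suppose I have the following text in a file: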

startcue
This shouldn't be found.

startcue
This is the text I want to find.
endcue

startcue
This shouldn't be found either.

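I want to find the lines "startcue", "This is the text I want to find." and "endcue".

I naively assumed that a simple range search with awk '/startcue/,/endcue/' would do it, but this prints the whole file. My guess is that awk does find the first range, but when the third startcue triggers printing, it prints every line up to the end of the file (which still seems a bit odd to me).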

Now the question: how can I get awk to print only the lines I want? And perhaps as a bonus question: can anyone explain awk's behavior?

Thanks

Answer 1

$ awk '/startcue/{f=1; buf=""} f{buf = buf $0 RS} /endcue/{printf "%s",buf; f=0}' file
startcue
This is the text I want to find.
endcue
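
The one-liner above comes without commentary, so here is an annotated sketch of what it appears to do: each startcue resets a buffer, lines are collected while a flag is set, and the buffer is printed only once an endcue shows up. The temporary file and the variable names sample and result are assumptions made for this illustration.

```shell
# Recreate the sample file from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
startcue
This shouldn't be found.

startcue
This is the text I want to find.
endcue

startcue
This shouldn't be found either.
EOF

result=$(awk '
    /startcue/ { f = 1; buf = "" }          # start cue seen: drop anything buffered so far
    f          { buf = buf $0 RS }          # while the flag is set, append the current line
    /endcue/   { printf "%s", buf; f = 0 }  # end cue seen: print the buffer, stop collecting
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because a later startcue empties the buffer, a block that never reaches an endcue is silently discarded.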

Answer 2

To summarize the question: you want to print the lines from startcue through endcue, but not when the endcue is missing. Ed Morton's approach is good. Here is another approach:

$ tac file | awk '/endcue/,/startcue/' | tac
startcue
This is the text I want to find.
endcue

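tac file

This prints the lines in reverse order. tac is like cat, except that the lines come out in reverse order.

awk '/endcue/,/startcue/'

This prints all lines from endcue through startcue. Done this way, passages that lack an endcue are not printed.

tac

This reverses the lines again, so that they come back in the correct order.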
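
To see why the reversal helps, the intermediate stage can be inspected on its own. This is a small sketch, assuming GNU tac is installed; the temporary file and the variable names reversed_match and final are made up for the illustration.

```shell
# Recreate the sample file from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
startcue
This shouldn't be found.

startcue
This is the text I want to find.
endcue

startcue
This shouldn't be found either.
EOF

# Stages 1 and 2: in the reversed file the range opens at endcue and closes
# at the first startcue after it, which is the nearest one before endcue
# in the original line order.
reversed_match=$(tac "$sample" | awk '/endcue/,/startcue/')
printf '%s\n' "$reversed_match"

# Stage 3: reverse again to restore the original line order.
final=$(printf '%s\n' "$reversed_match" | tac)
printf '%s\n' "$final"
rm -f "$sample"
```

The first printf shows the match still upside down (endcue first); the second shows it in file order.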

Consider:

 awk '/startcue/,/endcue/' file

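This tells awk to start printing when it finds startcue and to keep printing until it finds endcue. That is exactly what it did on your file.

There is no implicit rule that the range /startcue/,/endcue/ cannot itself contain multiple instances of startcue. awk simply starts printing when it sees the first occurrence of startcue and keeps printing until it finds endcue. Once that range has closed, the third startcue opens a new range, and because no endcue follows, that range runs to the end of the file.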
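
The range behavior can be made visible by printing line numbers instead of the lines themselves. A minimal sketch; the temporary file and the variable names sample, selected and sep are assumptions for the illustration.

```shell
# Recreate the sample file from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
startcue
This shouldn't be found.

startcue
This is the text I want to find.
endcue

startcue
This shouldn't be found either.
EOF

# Print the number of every line that the range pattern selects,
# separated by single spaces.
selected=$(awk '/startcue/,/endcue/ { printf "%s%d", sep, NR; sep = " " }' "$sample")
printf '%s\n' "$selected"
rm -f "$sample"
```

On the sample file this gives 1 2 3 4 5 6 8 9: the first range runs from line 1 to the endcue on line 6, and a second range opens on line 8 and never closes. Only the blank line 7 falls outside both ranges.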

Answer 3

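Here is a simple way.
Since the data is separated by blank lines, I set RS to the empty string.
This lets awk process the data in blocks.
Then find all blocks that start with startcue and end with endcue: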

awk -v RS="" '/^startcue/ && /endcue$/' file
startcue
This is the text I want to find.
endcue
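
To check how awk splits the file once RS is empty, the records can be summarized one per line. This is only a sketch: the -F'\n' option (so that each line of a block becomes one field), the temporary file, and the variable names are additions for the illustration and are not part of the answer's command.

```shell
# Recreate the sample file from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
startcue
This shouldn't be found.

startcue
This is the text I want to find.
endcue

startcue
This shouldn't be found either.
EOF

# RS="" selects paragraph mode: records are separated by blank lines.
# -F'\n' makes each line of a record one field, so NF is the line count
# and $NF is the last line of the block.
summary=$(awk -v RS="" -F'\n' \
    '{ printf "record %d: %d lines, last line: %s\n", NR, NF, $NF }' "$sample")
printf '%s\n' "$summary"
rm -f "$sample"
```

Only the second record ends in endcue, which is why the /endcue$/ test selects that block alone.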

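If startcue and endcue are always the first and last lines and occur only once per block, this should do: (PS: testing shows that it does not matter whether the block contains more or fewer hits. If both startcue and endcue are found, this always prints the block.)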

awk -v RS="" '/startcue/ && /endcue/' file
startcue
This is the text I want to find.
endcue

This should also work:

awk -v RS="" '/startcue.*endcue/' file
startcue
This is the text I want to find.
endcue
