How to: multiple multi-line replacements using text from the pattern match?

Question

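I am implementing an annotations feature in Bash and am looking for an awk or sed solution for some text manipulation.

I want to convert the text in a file from: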

^version 10.2 tag1 tag2
^audit arg1 arg2
f()
{
...
}
g()
{
...
}
^version 10.2
h() { ... }
^version 10.2

i() { ... } # Not annotated: doesn't immediately follow an annotation

to:

annotate f^1 version 10.2 tag1 tag2
annotate f^1 audit arg1 arg2
f^1()
{
...
}
g()
{
...
}
annotate h^2 version 10.2
h^2() { ... }

i() { ... } # Not annotated: doesn't immediately follow an annotation

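The substitution works as follows:

Each line beginning with ^ is replaced by: annotate, a space, the name of the function that follows the annotation lines, ^, an index, and the rest of the line.
The function name is suffixed with ^ and the index (after which the index is incremented).

Function names begin in column 1, and Bash function names do not need to be POSIX compliant (see builtins/declare.def in the Bash source code: shell function names don't have to be valid identifiers; also, in parse.y, a function is a WORD). An acceptable, imperfect regex for the function part of the pattern is shown below (I'll upvote solutions that work out a better regex even if they don't answer the bigger question; it is hard to figure out from reading the source code):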

^[^'"()]\+\s*(\s*)
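For anyone trying this pattern in awk rather than sed: it is written in GNU sed's basic-regex style, where `\+` means "one or more" and a bare `(` is a literal parenthesis. awk uses extended regexes, where `+` is the repetition operator and literal parentheses must be escaped. A minimal sketch of the same check in awk follows. The helper name `is_function_line` is hypothetical, `[ \t]` stands in for `\s` so that it also runs on awks without that GNU extension, and the pattern is built as a string only so the single quote can be passed in from the shell.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the line matches the pattern from the
# question, rewritten as an extended regex for awk.
is_function_line() {
    printf '%s\n' "$1" | awk -v q="'" '
        BEGIN { re = "^[^" q "\"()]+[ \t]*\\([ \t]*\\)" }
        $0 ~ re { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Try a few candidate lines against the pattern.
for line in 'f()' 'my-func () { :; }' '{' 'echo "x" ()'; do
    if is_function_line "$line"; then
        printf 'match:    %s\n' "$line"
    else
        printf 'no match: %s\n' "$line"
    fi
done
```

This is still the imperfect pattern: it accepts anything up to `()` that contains no quotes or parentheses, so a line such as `x=$()` would also be treated as a function header.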

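Note that annotations only apply to the function immediately following the match. If a function doesn't immediately follow the annotation lines, the annotations should not be emitted at all.

The solution should be generic and must not include strings found in the example above (version, audit, f, g, h, etc.).

The solution must not require utilities or packages that are not found in CentOS 7 Minimal. So, unfortunately, Perl can't be considered. I would prefer an awk solution.

Your answer will be used to improve the code of an open source Bash project: Eggsh.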

Answer 1

Try something like this:

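# Buffer annotation lines (without the leading ^) until the next line is seen.
/^\^/ { acc[++ann] = substr($0, 2); next; }

# Function header directly after annotations: emit them, then the indexed name.
ann && /^[a-zA-Z0-9_]+\s*\(\s*\)/ {
    count++;
    ind = index($0, "(");
    fname = substr($0, 1, ind - 1);
    sub(/\s+$/, "", fname);    # drop whitespace between the name and "("
    for (i = 1; i <= ann; i++) {
        print "annotate " fname "^" count " " acc[i];
    }
    print fname "^" count substr($0, ind);
    ann = 0;
    next;
}

# Any other line discards pending annotations and is printed unchanged.
{ ann = 0; print; }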

Note that I didn't bother to do the necessary research to find a better regex for function names.
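For function names that are not plain identifiers, the character class from the question can be swapped in for `[a-zA-Z0-9_]`. A minimal sketch, under these assumptions: the wrapper name `annotate_functions` is hypothetical, the pattern is built as a string so the single quote can be passed in from the shell, `[ \t]` stands in for `\s`, and the index is only incremented when a function is actually annotated. It is still the question's imperfect pattern, not a verified grammar for Bash function names.

```shell
#!/bin/sh
# Hypothetical wrapper: reads text on stdin, writes the annotated text to stdout.
annotate_functions() {
    awk -v q="'" '
    BEGIN {
        # Question pattern in ERE form: name chars exclude quotes and parentheses.
        func_re = "^[^" q "\"()]+[ \t]*\\([ \t]*\\)"
    }
    # Buffer annotation lines (leading ^ stripped) until the next line is known.
    /^\^/ { acc[++ann] = substr($0, 2); next }
    # Function header directly after annotations: emit them with the indexed name.
    ann && $0 ~ func_re {
        count++
        ind = index($0, "(")
        fname = substr($0, 1, ind - 1)
        sub(/[ \t]+$/, "", fname)
        for (i = 1; i <= ann; i++)
            print "annotate " fname "^" count " " acc[i]
        print fname "^" count substr($0, ind)
        ann = 0
        next
    }
    # Anything else drops the pending annotations and passes through unchanged.
    { ann = 0; print }
    '
}

# Example run: one annotated function, then an annotation that gets dropped.
printf '%s\n' '^version 10.2' 'my-func () { ... }' '^audit x' '' 'i() { ... }' |
    annotate_functions
```

In the example run, `my-func` becomes `my-func^1` with one `annotate` line in front of it, while the `^audit x` line is discarded because a blank line, not a function, follows it.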

