How to format text with multiple separators

Question

I'd like to extract/format some text with awk.

The source text looks like this:

Section 1:
  main_command some_command1 some_subcommand1      # comment 1
  main_command some_command1 some_subcommand2      # comment 2

Section 2:
  main_command some_command2 some_subcommand3      # comment 3
  main_command some_command2 some_subcommand4      # comment 4

Section 3:
  main_command some_command3 some_subcommand5      # comment 5
  main_command some_command3 some_subcommand6      # comment 6

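I'd like to know how to:

1. filter to the indented lines under Section 2;
2. specify which column I want (2 or 3); and
3. extract the comment (the text after the #).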

For example, if I chose column 2, the output would be:

some_command2<tab>'comment 3'
some_command2<tab>'comment 4'

I've achieved 1 and 2 with awk:

  awk -v RS='\n\n' '/^Section 2:/' "$path" | awk '/^  main_command/ {print $2}'

...but I suspect there's a better way to do it all without a pipe. I'm open to using other tools (e.g. sed).
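Not part of the original answers: a minimal sketch of doing all three steps in a single awk process, assuming a POSIX awk. Setting RS to the empty string switches awk to paragraph mode, where each blank-line-separated block is one record; this avoids the multi-character RS='\n\n' used above, whose behavior POSIX does not guarantee. The shell function name extract_section is hypothetical.

```shell
# Hypothetical helper (not from the question or the answers):
#   $1 = exact section header, e.g. 'Section 2:'
#   $2 = column number to print (2 or 3)
#   $3 = input file
extract_section() {
    awk -v sect="$1" -v col="$2" -v sq="'" '
    BEGIN { RS = ""; FS = "\n" }          # paragraph mode; each field is one line of the block
    $1 == sect {                           # field 1 is the block header line
        for (i = 2; i <= NF; i++) {        # remaining fields are the body lines
            body = $i
            if (body !~ /^[[:blank:]]/)    # keep only indented lines
                continue
            sub(/^[[:blank:]]+/, "", body) # drop the indentation
            split(body, f, /[[:blank:]]+/) # f[1] = main_command, f[2], f[3], ...
            c = body
            if (!sub(/^[^#]*#[[:blank:]]*/, "", c))  # keep only the text after the first "#"
                c = ""                                # line has no comment
            print f[col] "\t" sq c sq
        }
    }' "$3"
}
```

For example, extract_section 'Section 2:' 2 "$path" should print the two some_command2 lines shown in the question, each followed by a tab and the quoted comment.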

Answer 1

You can use this awk solution, which works with any version of awk:

awk -v sq="'" -v OFS='\t' -v n=1 '
$1 == "Section" {
   p = ($2 == "2:")
   next
}
NF && p {
   s = $0
   sub(/^[^#]*#[[:blank:]]*/, "", s)
   print $n, sq s sq
}' file

main_command    'comment 3'
main_command    'comment 4'

Use n=2 to print column 2:

awk -v sq="'" -v OFS='\t' -v n=2 '$1 == "Section" {p = ($2 == "2:"); next} NF && p {split($0, a, /#[[:blank:]]*/); print $n, sq a[2] sq}' file

some_command2   'comment 3'
some_command2   'comment 4'

Answer 2

$ cat tst.awk
BEGIN { OFS="\t" }
/^[^[:space:]]/ {
    this_sect = $0
    next
}
NF && (this_sect == sect) {
    val = $col
    sub(/[^#]*#[[:space:]]*/,"")
    print val, "\047" $0 "\047"
}

$ awk -v sect='Section 2:' -v col=2 -f tst.awk file
some_command2   'comment 3'
some_command2   'comment 4'
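tst.awk copies $col into val before calling sub() because sub() without a target modifies $0, and assigning to $0 makes awk re-split the record, so $col afterwards would refer to the words of the comment. A small illustration; the function name show_resplit is hypothetical.

```shell
# Hypothetical illustration: print $1 before and after sub() rewrites $0.
# The shell argument $1 is a single input line.
show_resplit() {
    printf '%s\n' "$1" | awk '{
        before = $1                      # field taken from the original record
        sub(/[^#]*#[[:space:]]*/, "")    # same sub() call as in tst.awk (target is $0)
        print before, $1                 # $1 is now re-split from the modified $0
    }'
}
```

Running show_resplit on the first Section 2 line prints "main_command comment": the second word is the first field of what is left after the sub().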

Answer 3

One awk idea:

awk -v sec=1 -v col=3 '                                 # define section and column to process
/^Section/      { process= ($2 == sec":") ? 1 : 0
                  next
                }
process && NF>0 { split($0,arr,"#")
                  sub(/^[[:space:]]+/,"",arr[2])
                  print $(col) "\t\047" arr[2] "\047"
                }
' "${path}"

For sec=1 and col=3 this generates:

some_subcommand1        'comment 1'
some_subcommand2        'comment 2'

For sec=2 and col=2 this generates:

some_command2   'comment 3'
some_command2   'comment 4'
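split($0, arr, "#") assumes a line contains only one "#"; a comment such as "# see issue #12" would be cut at the second "#". A minimal sketch, assuming you want everything after the first "#", uses index() and substr() instead. It takes the same inputs as the script above; the function name extract_comment is hypothetical.

```shell
# Hypothetical helper: $1 = section number, $2 = column, $3 = input file.
# The comment is everything after the FIRST "#", so comments containing "#" survive.
extract_comment() {
    awk -v sec="$1" -v col="$2" '
    /^Section/ { process = ($2 == (sec ":")); next }
    process && NF > 0 {
        i = index($0, "#")                   # position of the first "#", 0 if none
        c = i ? substr($0, i + 1) : ""       # text after it, or empty
        sub(/^[[:space:]]+/, "", c)          # drop leading blanks
        print $col "\t\047" c "\047"
    }' "$3"
}
```

A line without a "#" prints its column followed by an empty quoted comment ('').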
