sed or awk script to restructure a text file

I would like to create a sed or awk script that, when run as awk -f script.awk oldfile > newfile, turns a given text file oldfile with the content

Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item

into a new text file newfile with the content:

{Some Heading:} {example text}

{Another Heading:} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}

Further description to rule out possible ambiguities:

How can this be done with sed or awk? (I am using zsh, in case it matters.)


Addendum: I just found out that I really need to know beforehand whether the block is a list:

heading
1. foo
2. bar

{list: heading}{
 [item] foo
 [item] bar
}

So I need to put in "list: " if the block is a list. Is that possible as well?

With awk, you could do something like this:

awk '/^$/ { print block (list ? "\n}" : "}"); block = ""; next } block == "" { block = "{" $0 ":} {"; list = 0; next } /^[0-9]+\. / { list = 1; sub(/^[0-9]+\. /, ""); block = block "\n  [item] " $0; next } { block = block (list ? " " : "") $0 } END { print block (list ? "\n}" : "}") }' filename

The code works as follows:

#!/usr/bin/awk -f

/^$/ {                               # empty line: print converted block
  print block (list ? "\n}" : "}")   # Whether there's a newline before the
  block = ""                         # closing } depends on whether this is
  next                               # a list. Reset block buffer.
}
block == "" {                        # in the first line of a block:
  block = "{" $0 ":} {"             # format header
  list = 0                           # reset list flag
  next
}
/^[0-9]+\. / {                       # if a data line opens a list
  list = 1                           # set list flag
  sub(/^[0-9]+\. /, "")              # remove number
  block = block "\n  [item] " $0     # format line
  next
}
{                                    # if it doesn't, just append it. Space
  block = block (list ? " " : "") $0 # inside a list to not fuse words.
}
END {                                # and at the very end, print the last
  print block (list ? "\n}" : "}")   # block
}
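
The script above does not yet emit the "list: " marker requested in the addendum. Here is a minimal sketch of one way to add it, assuming the "{list: heading} {" layout with a space before the opening brace. The names convert and emit are made up for illustration. It keeps the heading and the body in separate buffers, so the header can be chosen once the whole block has been read.

```shell
# Hypothetical helper: reads the old format on stdin (or from file arguments)
# and writes the new format on stdout.
convert() {
  awk '
  function emit() {                    # print the buffered block, if there is one
    if (heading == "") return
    if (list) print "{list: " heading "} {" body "\n}"
    else      print "{" heading ":} {" body "}"
    print ""                           # blank line between blocks
    heading = ""; body = ""; list = 0
  }
  /^$/          { emit(); next }       # empty line ends the current block
  heading == "" { heading = $0; next } # first line of a block is the heading
  /^[0-9]+\. /  {                      # a numbered line opens a list item
    list = 1
    sub(/^[0-9]+\. /, "")
    body = body "\n  [item] " $0
    next
  }
  { body = body (body == "" ? "" : " ") $0 }  # join continuation lines with a space
  END { emit() }
  ' "$@"
}

printf '%s\n' 'Some Heading' 'example text' '' 'heading' '1. foo' '2. bar' | convert
```

Unlike the script above, this sketch also joins continuation lines outside a list with a space and ignores repeated blank lines.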

It is also possible with sed, but that is harder to read:

#!/bin/sed -nf

/^$/ {                       # empty line: print converted block
  x                          # fetch it from the hold buffer
  s/$/}/                     # append closing }
  /\n  \[item\]/ s/}$/\n}/   # in a list, put in a newline before it
  p                          # print
  d                          # and we're done here. Hold buffer is now empty.
}
x                            # otherwise: inspect the hold buffer
// {                         # if it is empty (reusing last regex)
  x                          # get back the pattern space
  s/.*/{&:} {/               # Format header
  h                          # hold it.
  d                          # we're done here.
}
x                            # otherwise, get back the pattern space
/^[0-9]\+\. / {              # if the line opens a list
  s///                       # remove the number (reusing regex)
  s/.*/  [item] &/           # format the line
  H                          # append it to the hold buffer.
  ${                         # if it is the last line
    s/.*/}/                  # append a closing bracket
    H                        # to the hold buffer
    x                        # swap it with the hold buffer
    p                        # and print that.
  }
  d                          # we're done.
}
                             # otherwise (not opening a list item)
H                            # append line to the hold buffer
x                            # fetch back the hold buffer to work on it

/\n  \[item\]/ {             # if we're in a list
  s/\(.*\)\n/\1 /            # replace the last newline (that we just put there)
                             # with a space
  ${
    s/$/\n}/                 # if this is the last line, append \n}
    p                        # and print
  }
  x                          # put the half-assembled block in the hold buffer
  d                          # and we're done
}
s/\(.*\)\n/\1/               # otherwise (not in a list): just remove the newline
${
  s/$/}/                     # if this is the last line, append closing bracket
  p                          # print
}
x                            # put half-assembled block in the hold buffer.

sed is line-oriented, so it is best suited to simple substitutions on a single line.

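Just use awk in paragraph mode (RS=""), so that each block of text separated by blank lines is treated as one record, and treat each line within a paragraph as a field of that record (FS="\n"):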

$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    printf "{" (/\n[0-9]+\./ ? "list: %s" : "%s:") "} {", $1
    inList = 0
    for (i=2; i<=NF; i++) {
        if ( sub(/^[0-9]+\./,"  [item]",$i) ) {
            printf "\n"
            inList = 1
        }
        else if (inList) {
            printf " "
        }
        printf "%s", $i
    }
    print (inList ? "\n" : "") "}"
}
$
$ awk -f tst.awk file
{Some Heading:} {example text}

{list: Another Heading} {
  [item] example list item, but it spans over multiple lines
  [item] list item
}
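
To see what paragraph mode does with the sample input, here is a small sketch that only reports how awk splits it. The variable names input and out are made up for illustration.

```shell
# The sample text from the question.
input='Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item'

out=$(printf '%s\n' "$input" | awk '
BEGIN { RS = ""; FS = "\n" }   # blank-line-separated records, one field per line
{ printf "record %d: %d fields, heading=%s\n", NR, NF, $1 }')

printf '%s\n' "$out"
```

Each paragraph arrives as one record, with the heading in $1 and the remaining lines in $2 through $NF, which is why the loop in tst.awk starts at field 2.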

Another awk version (similar to Ed's)

BEGIN{RS="";FS="\n"}
{
    printf "%s", "{"(/\n[0-9]+\./?"List: ":"")$1":} {"
    for(i=2;i<=NF;i++)
    printf "%s",sub(/^[0-9]+\./,"  [item]",$i)&&++x?"\n"$i:$i
    print x?"\n}":"}""\n"
    x=0
}

Output

$ awk -f test.awk file

{Some Heading:} {example text}

{Another Heading:} {
  [item] example list item, but itspans over multiple lines
  [item] list item
}

How it works

BEGIN{RS="";FS="\n"}

Reads records as blocks separated by blank lines.
Reads fields as lines.

printf "%s", "{"(/\n[0-9]+\./?"List: ":"")$1":} {"

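Prints the first field (line) in the required format; printf is used so that no newline is appended. It also checks whether any part of the record contains a newline followed by a number and a period and, if so, adds "List: ".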

for(i=2;i<=NF;i++)

Loops from the second field to the last field. NF is the number of fields.

I'll split the next part up.

printf "%s"

Prints a string, again using printf to control the newlines.

sub(/^[0-9]+\./,"  [item]",$i)&&++x?"\n"$i:$i

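This is effectively an if/else statement written with the ternary operator a?b:c. If no substitution can be made, sub returns 0 and x is not incremented, so the line is printed as is.
If sub succeeds, it replaces the number at the start of the line with [item], increments x, and prints the new line preceded by a newline.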

print x?"\n}":"}""\n"

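Uses the ternary operator again to check whether x has been incremented. If it has, a newline is printed before the }; otherwise just } is printed. The trailing "\n" supplies the extra newline that separates records with a blank line; because concatenation binds more tightly than ?:, it is attached only to the non-list branch.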
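
One detail of that last statement is easy to miss: in awk, string concatenation binds more tightly than ?:, so a suffix written after the false branch belongs to that branch only. The small sketch below, with made-up variable names a and b, illustrates the parsing. Wrapping the conditional in parentheses is one way to apply the suffix to both branches.

```shell
out=$(awk 'BEGIN {
  x = 1
  a = x ? "A" : "B" "C"     # parsed as x ? "A" : ("B" "C")
  b = (x ? "A" : "B") "C"   # parentheses attach "C" to either branch
  print a
  print b
}')
printf '%s\n' "$out"
```

With x set to 1 this prints A and then AC.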