sed or awk script to substitute the structure of a text file
I would like to create a sed or awk script that, when run as awk -f script.awk oldfile > newfile, turns a given text file oldfile with the content
Some Heading
example text

Another Heading
1. example list item, but it
spans over multiple lines
2. list item
into a new text file newfile with the content:
{Some Heading:} {example text}
{Another Heading:} {
[item] example list item, but it spans over multiple lines
[item] list item
}
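Further description to remove possible ambiguity:
- The script should substitute each block (i.e. the lines enclosed by blank lines) accordingly.
- A text file may contain many such blocks, and the order in which they appear is not known.
- The script should substitute conditionally, depending on whether the heading (i.e. the first line of a block) is followed by an item list (indicated by a line starting with "1.").
- Blocks are always separated by blank lines.
How can I do this with sed or awk? (I use zsh, in case it matters.)
Addition: I just found out that I really need to know beforehand whether the block is a list: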
heading
1. foo
2. bar
to
{list: heading}{
[item] foo
[item] bar
}
So I need to put in "list:" if it is a list. Is this also possible?
With awk, you could do something like this:
awk '/^$/ { print block (list ? "\n}" : "}"); block = ""; next } block == "" { block = "{" $0 ":} {"; list = 0; next } /^[0-9]+\. / { list = 1; sub(/^[0-9]+\. /, ""); block = block "\n [item] " $0; next } { block = block (list ? " " : "") $0 } END { print block (list ? "\n}" : "}") }' filename
where the code is:
#!/usr/bin/awk -f
/^$/ {                                    # empty line: print converted block
  print block (list ? "\n}" : "}")        # Whether there's a newline before the
  block = ""                              # closing } depends on whether this is
  next                                    # a list. Reset block buffer.
}
block == "" {                             # in the first line of a block:
  block = "{" $0 ":} {"                   # format header
  list = 0                                # reset list flag
  next
}
/^[0-9]+\. / {                            # if a data line opens a list
  list = 1                                # set list flag
  sub(/^[0-9]+\. /, "")                   # remove number
  block = block "\n [item] " $0           # format line
  next
}
{                                         # if it doesn't, just append it. Space
  block = block (list ? " " : "") $0      # inside a list to not fuse words.
}
END {                                     # and at the very end, print the last
  print block (list ? "\n}" : "}")        # block
}
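The script above does not yet cover the addition in the question (the "list:" prefix). Because the whole block is buffered before it is printed, the heading can be held back until the block is known to be a list. The following is only a minimal sketch of that idea: the function names convert_blocks and emit and the variables head and body are made up for illustration, and it assumes that plain continuation lines should be joined with a single space.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if none.
convert_blocks() {
    awk '
    function emit() {                     # print the buffered block, if there is one
        if (head == "") return
        if (list) print "{list: " head "} {" body "\n}"
        else      print "{" head ":} {" body "}"
        head = body = ""; list = 0        # reset the buffers for the next block
    }
    /^$/         { emit(); next }         # a blank line ends the current block
    head == ""   { head = $0; next }      # first line of a block is the heading
    /^[0-9]+\. / {                        # a numbered line opens a list item
        list = 1
        sub(/^[0-9]+\. /, "")
        body = body "\n [item] " $0
        next
    }
    { body = body (body == "" ? "" : " ") $0 }   # continuation line: join with a space
    END { emit() }                        # print the last block
    ' "$@"
}
```

It would be called as convert_blocks oldfile > newfile. Since emit returns early when nothing is buffered, repeated or trailing blank lines do not produce empty blocks.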
It is also possible with sed, but harder to read:
#!/bin/sed -nf
/^$/ {                        # empty line: print converted block
  x                           # fetch it from the hold buffer
  s/$/}/                      # append closing }
  /\n \[item\]/ s/}$/\n}/     # in a list, put in a newline before it
  p                           # print
  d                           # and we're done here. Hold buffer is now empty.
}
x                             # otherwise: inspect the hold buffer
// {                          # if it is empty (reusing last regex)
  x                           # get back the pattern space
  s/.*/{&:}{/                 # Format header
  h                           # hold it.
  d                           # we're done here.
}
x                             # otherwise, get back the pattern space
/^[0-9]\+\. / {               # if the line opens a list
  s///                        # remove the number (reusing regex)
  s/.*/ [item] &/             # format the line
  H                           # append it to the hold buffer.
  ${                          # if it is the last line
    s/.*/}/                   # append a closing bracket
    H                         # to the hold buffer
    x                         # swap it with the hold buffer
    p                         # and print that.
  }
  d                           # we're done.
}
                              # otherwise (not opening a list item)
H                             # append line to the hold buffer
x                             # fetch back the hold buffer to work on it
/\n \[item\]/ {               # if we're in a list
  s/\(.*\)\n/\1 /             # replace the last newline (that we just put there)
                              # with a space
  ${
    s/$/\n}/                  # if this is the last line, append \n}
    p                         # and print
  }
  x                           # put the half-assembled block in the hold buffer
  d                           # and we're done
}
s/\(.*\)\n/\1/                # otherwise (not in a list): just remove the newline
${
  s/$/}/                      # if this is the last line, append closing bracket
  p                           # print
}
x                             # put half-assembled block in the hold buffer.
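The script above leans on sed's hold buffer (the h, H and x commands) to collect a block across several input lines. As a much smaller illustration of that mechanism alone, here is a sketch that joins all input lines into one; the function name join_lines is made up for this example.

```shell
# Hypothetical helper: joins all input lines into one, separated by spaces.
join_lines() {
    sed -n '
        # first line: copy it into the hold buffer
        1h
        # later lines: append to the hold buffer, separated by a newline
        1!H
        # last line: swap the hold buffer in, turn newlines into spaces, print
        ${
            x
            s/\n/ /g
            p
        }
    ' "$@"
}
```

For example, printf 'a\nb\nc\n' | join_lines prints a b c on a single line.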
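sed is line-oriented, so it is best suited to simple substitutions on a single line.
Just use awk in paragraph mode (RS=""), so that every blank-line-separated block of text is treated as a record, and treat each line within a paragraph as a field of that record (FS="\n"):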
$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    printf "{" (/\n[0-9]+\./ ? "list: %s" : "%s:") "} {", $1
    inList = 0
    for (i=2; i<=NF; i++) {
        if ( sub(/^[0-9]+\./," [item]",$i) ) {
            printf "\n"
            inList = 1
        }
        else if (inList) {
            printf " "
        }
        printf "%s", $i
    }
    print (inList ? "\n" : "") "}"
}
$
$ awk -f tst.awk file
{Some Heading:} {example text}
{list: Another Heading} {
[item] example list item, but it spans over multiple lines
[item] list item
}
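To see what paragraph mode does to the input before any formatting happens, a small sketch like the following can help. The function name show_records is made up for illustration; it only reports how awk splits the text into records and fields.

```shell
# Hypothetical helper: report how awk splits the input in paragraph mode.
show_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }    # records = blank-line-separated blocks, fields = lines
         { printf "record %d: %d line(s), heading=\"%s\"\n", NR, NF, $1 }' "$@"
}
```

With the sample input from the question, the first block is reported as a record with 2 lines and the second as a record with 4 lines, with $1 holding the heading in both cases.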
Another awk version (similar to Ed's):
BEGIN{RS="";FS="\n"}
{
    printf "%s", "{"(/\n[0-9]+\./?"List: ":"")$1":} {"
    for(i=2;i<=NF;i++)
        printf "%s",sub(/^[0-9]+\./," [item]",$i)&&++x?"\n"$i:$i
    print x?"\n}":"}""\n"
    x=0
}
Output
$ awk -f test.awk file
{Some Heading:} {example text}
{Another Heading:} {
[item] example list item, but itspans over multiple lines
[item] list item
}
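How it works
BEGIN{RS="";FS="\n"}
Reads records as blocks separated by blank lines.
Reads fields as lines.
printf "%s", "{"(/\n[0-9]+\./?"List: ":"")$1":} {"
Prints the first field (line) in the specified format; printf is used so that no trailing newline is emitted.
It checks whether any part of the record contains a newline followed by digits and a period, and adds "List: " if it does.
for(i=2;i<=NF;i++)
Loops from the second field to the last field. NF is the number of fields.
I'll split up the next part.
printf "%s"
Prints a string, again using printf to control newlines.
sub(/^[0-9]+\./," [item]",$i)&&++x?"\n"$i:$i
This is effectively an if/else statement written with the ternary operator a?b:c.
If the substitution cannot be made, sub returns 0 and x is not incremented, so the line is printed as is.
If sub succeeds, it replaces the number at the start of the line with [item], increments x, and prints the new line preceded by a newline.
print x?"\n}":"}""\n"
Again uses the ternary operator, this time to check whether x was incremented. If it was, a newline is printed before the }; otherwise just the } is printed. The trailing "\n" is there to put a double newline between records.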
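The two building blocks used here, the return value of sub() and the ternary operator, can be tried in isolation. The sketch below is only an illustration with a made-up function name, demo_sub. The parentheses around the ternary make the grouping explicit, because string concatenation binds more tightly than ?: in awk.

```shell
# Hypothetical helper: label each line by whether sub() replaced a leading number.
demo_sub() {
    awk '{
        changed = sub(/^[0-9]+\./, "[item]")          # 1 if a leading number was replaced, else 0
        print (changed ? "item: " : "plain: ") $0     # parentheses keep the ternary separate from $0
    }' "$@"
}
```

For example, printf '1. foo\nbar\n' | demo_sub labels the first line as an item (with its number replaced by [item]) and the second as plain.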