Using awk to trim away parts of a text file outside 2 patterns
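I'd like an elegant awk solution to edit lines in a file. So far I have only managed to get the job done with 2 sed commands and 1 awk command.
Each file consists of a header of indeterminate length, followed by the data I want to capture, then a footer that always begins with the same string (WATER). The data consists of several 3-line blocks that I want to join into single lines; each 3-line block begins with the same string (GROUPS).
Whenever I find GROUPS, join the following lines until the next occurrence of GROUPS, and repeat until WATER is found. Then delete the WATER line and all subsequent lines to the end of the file.
Input: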
header stuff
more header stuff
even more header stuff
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
.......
last line of data
WATER footer stuff footer stuff
footer stuff
more footer stuff
even more footer stuff
Output:
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
........
GROUPS data data data data mo data mo data even more last line of data
Any help would be much appreciated!
EDIT:
Here is my (probably flaky) solution!
1: Trim header
sed -n '/GROUPS/,$p' originalfile > outputfile1
2: Trim footer
sed '/WATER/,$d' outputfile1 > outputfile2
3: Join lines
awk 'NF { $1 = RS " " $1; print }' RS="GROUPS" outputfile2 > finaloutputfile
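Here is a GNU awk solution (GNU-specific because the record separator has more than one character and RT is a gawk extension):
awk -v RS="GROUPS|WATER" -F"\n" 'p=="WATER"{exit} {$1=p $1}NR>1; {p=RT}' file
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more ....... last line of data
Setting RS to GROUPS or WATER and recreating the record with $1=p $1 puts everything on one line.
If the previous separator was WATER, exit. This prevents printing anything from WATER onward.
p is set to the previous RT (the separator that was actually used).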
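The "recreating the record" step relies on a general awk behaviour: assigning to any field makes awk rebuild $0, joining the fields with OFS (a single space by default). Below is a minimal sketch of just that mechanism. It uses POSIX paragraph mode (RS="") instead of the gawk-only RT, so it should run in any POSIX awk. The function name join_lines is made up for illustration and is not part of the answer above.

```shell
# join_lines: hypothetical helper for illustration only.
# Reads one block (no blank lines) on stdin and prints it as a single line.
join_lines() {
    # RS="" reads the whole block as one record; -F"\n" makes each line a field.
    # $1 = $1 leaves the data unchanged but forces awk to rebuild $0 with OFS (" ").
    awk -v RS="" -F"\n" '{ $1 = $1; print }'
}

printf 'GROUPS data data data data\nmo data mo data mo data\neven more even more\n' | join_lines
```

The same rebuild happens in the one-liner above when $1 is assigned; there the fields are the lines of each GROUPS record.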
Let's do it the hard way:
awk '/^GROUPS/ {if (string) print string; f=1; string=$0; next}
/^WATER/ {print string; f=0}
f {string=string" "$0}' file
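This starts "recording" lines in the variable string when GROUPS is found and stops when WATER is found. When GROUPS is seen, it also prints the stored string (if there is one) and resets it for the next iteration.
Test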
$ awk '/^GROUPS/ {if (string) print string; f=1; string=$0; next} /^WATER/ {print string; f=0} f {string=string" "$0}' a
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more ....... last line of data
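One thing to watch with the flag approach: if a file has no WATER line, the last block is never printed. A possible variant, sketched under the assumption that stopping at the first WATER line is always what is wanted, exits there and flushes the buffer in END. The names trim_and_join and buf are invented for this example.

```shell
# trim_and_join: hypothetical wrapper, not from the original answers.
# Reads the named files, or stdin when no file is given.
trim_and_join() {
    awk '
        /^WATER/  { exit }                                    # stop reading; END still runs
        /^GROUPS/ { if (buf != "") print buf; buf = $0; next } # flush previous block, start a new one
        buf != "" { buf = buf " " $0 }                         # append only once a block has started
        END       { if (buf != "") print buf }                 # flush the last block
    ' "$@"
}

printf 'header\nGROUPS a\nb\nc\nGROUPS d\ne\nWATER x\nfooter\n' | trim_and_join
```

Because header lines arrive while buf is still empty, they are skipped without needing a separate flag, and exiting at WATER avoids reading the footer at all.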