Remove string based on start and end pattern and remove newline in the process
I have a file containing the output of some commands. Unfortunately, some of it has been corrupted by console errors:
path="/a/b/c" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="132" host="cluster.local:1095" fstpath="/data/00019507/3dcd7e00" size="4574568" statsize="45745error: unable to retrieve file meta data from cluster.local:1095 [ status=down ]
nrep="00" fsid="37" host="cluster.local:1095" fstpath="/data/000021ca/0527e888" size="12550144" statsize="12550144" checksum="bb2a2ea700000000000000000000000000000000" diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"
path="/a/b/b98d6d3a-5c77-4223-9601-9294c73e00f9.bin" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="36" host="cluster.local:1095" fstpath="/data/00002196/05200f4d" size="12550144" statsize="12550144" checksum="045a6aa400000000000000000000000000000000" diskchecksum="045a6aa400000000000000000000000000000000" error_label="nonerror: unable to retrieve file meta data from cluster.local:1095 [ status=(down) ]
e"
path="/a/b/c/.mb6589013703229118680.txt" fxid="0524071a" size="0" nrep="2" checksumtype="adler" checksum="0000000100000000000000000000000000000000"
nrep="00" fsid="196" host="cluster.local:1095" fstpath="/dataerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
/000021b0/0524071a" size="0" statsize="0" checksum="0000000100000000000000000000000000000000" diskchecksum="0000000000000000000000000000000000000000" error_label="none"
Basically, I want to completely remove the string that starts with error: unable and ends with the ] character, so that instead of:
diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"
I would have:
diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="none"
I tried the following:
sed -e 's/error:.*]$//g'
However, this gives me:
diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="n
one"
How can I make it remove the newline as well when it removes the error string?
Thanks
With GNU sed, you can do this:
sed '/error: unable.*/ {s///;N;s/\n//;}' file
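The one-liner above removes the error text, appends the next line with N, and deletes the newline between the two halves. Note that the sample contains two corrupted lines in a row (the statsize line is followed by the error_label line), which a single pass does not fully clean. A minimal sketch of the same idea with a loop is shown here; the function name strip_console_errors and the label again are illustrative, not part of the original answer, and the script assumes the error always ends the line with ] and that the file uses \n line endings.

```shell
# Hypothetical helper: remove console errors and re-join the broken lines.
strip_console_errors() {
  sed '
    :again
    # Act only while the pattern space ends with the console error.
    /error: unable[^]]*]$/ {
      # Delete the error text, leaving the truncated start of the line.
      s/error: unable[^]]*]$//
      # Append the next input line, unless this is already the last line.
      $!N
      # Remove the newline that N placed between the two halves.
      s/\n//
      # The joined line may end with another error, so check again.
      b again
    }
  ' "$@"
}
```

It can be called as strip_console_errors file, or used at the end of a pipeline, since sed reads standard input when no file is given.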
Or with awk:
awk 'sub(/error: unable.*/, "") {s = $0; getline; print s $0; next} 1' file
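For readers who prefer awk, the join-with-the-next-line approach can be spelled out as a small loop. This is only a sketch under the same assumptions as the question (the error runs from error: unable to a ] at the end of the line); the function name join_after_error and the variables line and rest are made up for illustration. Unlike a single getline, the loop keeps going when the continuation line is itself cut off by another error.

```shell
# Hypothetical helper: awk version of "remove the error, then glue on the next line".
join_after_error() {
  awk '
    {
      line = $0
      # sub() returns 1 when it removed a trailing error string from line.
      while (sub(/error: unable[^]]*\]$/, "", line)) {
        # Read the continuation line; getline returns 1 on success.
        if ((getline rest) > 0)
          line = line rest
        else
          break   # the error was on the last line, nothing left to join
      }
      print line
    }
  ' "$@"
}
```

Lines without an error fall straight through the while condition and are printed unchanged.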
Using sed:
$ sed '/nerror:/{s/\(error_label=\)"nerror: unable[^]]*]/\1"none"/g;n;d}' input_file
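Using GNU sed with -E (enables ERE) and -z (uses NUL as the line separator, so an ordinary text file is read as a single record and the regex can match newlines):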
$ sed -Ez 's/error: unable[^]]+](\r?\n)?//g' file
path="/a/b/c" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="132" host="cluster.local:1095" fstpath="/data/00019507/3dcd7e00" size="4574568" statsize="45745nrep="00" fsid="37" host="cluster.local:1095" fstpath="/data/000021ca/0527e888" size="12550144" statsize="12550144" checksum="bb2a2ea700000000000000000000000000000000" diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="none"
path="/a/b/b98d6d3a-5c77-4223-9601-9294c73e00f9.bin" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="36" host="cluster.local:1095" fstpath="/data/00002196/05200f4d" size="12550144" statsize="12550144" checksum="045a6aa400000000000000000000000000000000" diskchecksum="045a6aa400000000000000000000000000000000" error_label="none"
path="/a/b/c/.mb6589013703229118680.txt" fxid="0524071a" size="0" nrep="2" checksumtype="adler" checksum="0000000100000000000000000000000000000000"
nrep="00" fsid="196" host="cluster.local:1095" fstpath="/data/000021b0/0524071a" size="0" statsize="0" checksum="0000000100000000000000000000000000000000" diskchecksum="0000000000000000000000000000000000000000" error_label="none"
The above works whether the newlines at the end of the matched text are \n or \r\n.
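Because -z is a GNU extension, the same whole-file substitution can be approximated where it is unavailable by collecting the input in awk and substituting once at the end. The following is a rough sketch rather than an equivalent of the command above: the function name strip_errors_slurp and the variable buf are illustrative, the whole file is held in memory, and the output always ends with a newline.

```shell
# Hypothetical helper: slurp all input, then delete each error plus its line break.
strip_errors_slurp() {
  awk '
    # Rebuild the input in one string; awk strips the "\n" from each record.
    { buf = buf $0 "\n" }
    END {
      # Remove every error string together with an optional \n or \r\n after it.
      gsub(/error: unable[^]]+\](\r?\n)?/, "", buf)
      printf "%s", buf
    }
  ' "$@"
}
```

Since the substitution sees the newlines directly, consecutive corrupted lines are joined in one pass, matching the output shown above.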
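With your shown samples, please try the following awk, which sets RS to the empty string (paragraph mode, so each block of text separated by blank lines is read as one record). Written and tested with GNU awk.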
awk -v RS="" '{gsub(/error: unable[^]]+]\n*/,"")} 1' Input_file
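Explanation: a global substitution replaces everything from error: unable up to the next ], plus any newlines that follow (zero or more), with an empty string, and then the record is printed.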