Remove string based on start and end pattern and remove newline in the process

I have a file containing the output of some commands. Unfortunately, some of it has been corrupted by console errors:

path="/a/b/c" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="132" host="cluster.local:1095" fstpath="/data/00019507/3dcd7e00" size="4574568" statsize="45745error: unable to retrieve file meta data from cluster.local:1095 [ status=down ]
nrep="00" fsid="37" host="cluster.local:1095" fstpath="/data/000021ca/0527e888" size="12550144" statsize="12550144" checksum="bb2a2ea700000000000000000000000000000000" diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"
path="/a/b/b98d6d3a-5c77-4223-9601-9294c73e00f9.bin" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="36" host="cluster.local:1095" fstpath="/data/00002196/05200f4d" size="12550144" statsize="12550144" checksum="045a6aa400000000000000000000000000000000" diskchecksum="045a6aa400000000000000000000000000000000" error_label="nonerror: unable to retrieve file meta data from cluster.local:1095 [ status=(down) ]
e"
path="/a/b/c/.mb6589013703229118680.txt" fxid="0524071a" size="0" nrep="2" checksumtype="adler" checksum="0000000100000000000000000000000000000000"
nrep="00" fsid="196" host="cluster.local:1095" fstpath="/dataerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
/000021b0/0524071a" size="0" statsize="0" checksum="0000000100000000000000000000000000000000" diskchecksum="0000000000000000000000000000000000000000" error_label="none"

Basically, I want to completely remove the string that starts with error: unable and ends with the ] character, so that instead of:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"

I would have:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="none"

I tried the following:

sed -e 's/error:.*]$//g'

However, this gives me:

diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="n
one"

How can I make it also remove the newline when it removes the error string?

Thanks

With GNU sed you can do this:

sed '/error: unable.*/ {s///;N;s/\n//;}' file
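
In case it helps to see what each step does, here is a minimal sketch that runs the same sed script on one broken pair of lines from the question. The variable names sample and joined are only for illustration. The address selects lines containing the error text, the empty regex in s/// reuses that address regex to delete the message, N appends the next input line to the pattern space, and s/\n// removes the newline left between the two halves.

```shell
# One broken record from the question: the error message split "none" across two lines.
sample='error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"'

# 1. /error: unable.*/  selects lines that contain the error text
# 2. s///               empty regex reuses the address regex and deletes the match
# 3. N                  appends the next input line (pattern space now holds: n\none")
# 4. s/\n//             removes the embedded newline, joining the two halves
joined=$(printf '%s\n' "$sample" | sed '/error: unable.*/ {s///;N;s/\n//;}')

printf '%s\n' "$joined"
```

Note that this relies on the error message always running to the end of its line, which is the case in all the samples shown.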

Or with awk:

awk 'sub(/error: unable.*/, "") {s = $0; getline; print s $0; next} 1' file
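
As a rough check of the awk approach, the sketch below applies it to a shortened sample built from the question's data. The temporary file and the variable names tmp and fixed are hypothetical. It assumes the form of the command that prints non-matching lines unchanged (the trailing next} 1), so that intact lines survive.

```shell
# Build a small sample resembling the broken output (hypothetical temp file).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
path="/a/b/c" fxid="05200f4d"
error_label="nerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
one"
fstpath="/dataerror: unable to retrieve file meta data from cluster.local:1095 [ status=(null) ]
/000021b0/0524071a" size="0"
EOF

# On a line containing the error text: strip it, read the next line with getline,
# and print both halves joined. Every other line is printed unchanged by the final 1.
fixed=$(awk 'sub(/error: unable.*/, "") { s = $0; getline; print s $0; next } 1' "$tmp")

printf '%s\n' "$fixed"
rm -f "$tmp"
```

Both kinds of damage are repaired here: the message that lands inside error_label="none" and the one that lands in the middle of fstpath.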

Using sed (this only repairs lines where the error text was injected right after error_label="n):

$ sed '/nerror:/{s/\(error_label=\)"nerror: unable[^]]*]/\1"none"/g;n;d}' input_file

Using GNU sed with -E (enables ERE) and -z (reads the whole file at once, which lets the regex match newlines):

$ sed -Ez 's/error: unable[^]]+](\r?\n)?//g' file
path="/a/b/c" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="132" host="cluster.local:1095" fstpath="/data/00019507/3dcd7e00" size="4574568" statsize="45745nrep="00" fsid="37" host="cluster.local:1095" fstpath="/data/000021ca/0527e888" size="12550144" statsize="12550144" checksum="bb2a2ea700000000000000000000000000000000" diskchecksum="bb2a2ea700000000000000000000000000000000" error_label="none"
path="/a/b/b98d6d3a-5c77-4223-9601-9294c73e00f9.bin" fxid="05200f4d" size="12550144" nrep="2" checksumtype="adler" checksum="045a6aa400000000000000000000000000000000"
nrep="01" fsid="36" host="cluster.local:1095" fstpath="/data/00002196/05200f4d" size="12550144" statsize="12550144" checksum="045a6aa400000000000000000000000000000000" diskchecksum="045a6aa400000000000000000000000000000000" error_label="none"
path="/a/b/c/.mb6589013703229118680.txt" fxid="0524071a" size="0" nrep="2" checksumtype="adler" checksum="0000000100000000000000000000000000000000"
nrep="00" fsid="196" host="cluster.local:1095" fstpath="/data/000021b0/0524071a" size="0" statsize="0" checksum="0000000100000000000000000000000000000000" diskchecksum="0000000000000000000000000000000000000000" error_label="none"

This works when the matched text is followed by a line ending, whether that is \n or \r\n.
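
To illustrate the line-ending point, here is a small sketch that feeds the same substitution one record with \n endings and one with \r\n endings. It assumes GNU sed, since -z is a GNU extension; the helper name clean and the shortened sample text are made up for the demo.

```shell
# Assumes GNU sed: -z treats the whole input as a single record, so \n can be matched.
clean() { sed -Ez 's/error: unable[^]]+](\r?\n)?//g'; }

# Unix line endings (LF).
lf_out=$(printf 'error_label="nerror: unable to retrieve [ status=(null) ]\none"\n' | clean)

# Windows line endings (CRLF); the remaining \r at the end of the line is
# stripped with tr only so that the two results are easy to compare.
crlf_out=$(printf 'error_label="nerror: unable to retrieve [ status=(null) ]\r\none"\r\n' | clean | tr -d '\r')

printf '%s\n%s\n' "$lf_out" "$crlf_out"
```

In both cases the message and the line break that follows it are removed together, so the two halves of the value end up on one line.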

With your shown samples, please try the following awk, which sets RS to the empty string (paragraph mode). Written and tested in GNU awk.

awk -v RS="" '{gsub(/error: unable[^]]+]\n*/,"")} 1' Input_file

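Explanation: Simply put, a global substitution replaces everything from error: unable up to the next ], plus any newlines that follow (zero or more), with an empty string; the trailing 1 then prints the record.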