Recursive replacement in a text file (regular expressions)
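I would like to replace a value recursively throughout a text file, possibly with regular expressions, so ideally with a very simple bash script.
This is my text file: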
###MS: 12/
###MSMS: 13/
BEGIN IONS
TITLE= Cmpd 1, +MSn(507.7145), 0.1 min
PEPMASS=507.71453 5708
###MS: 12/
###MSMS: 14/
BEGIN IONS
TITLE= Cmpd 2, +MSn(637.6461), 0.1 min
PEPMASS=637.64610 8328
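The text file consists of many repetitions of the same block structure (as shown). What I want to achieve is to replace the number after TITLE= Cmpd with the number after ###MSMS:. This should be done for every block while scrolling through the text file, so that each block's Cmpd value is set to that block's MSMS value.
I tried sed, using a script explained earlier on Stack Overflow, but Cmpd [0-9] only works for the second part (the replacement) and not for selecting the number after MSMS: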
while read line
do
    varA="###MSMS: "[0-9][0-9]
    varB="Cmpd "[0-9]
    line=$(echo "$line" | sed -e "s/$varA/$varB/")
    echo "$line" >> "outputfile.txt"
done < "inputfile.txt"
Thanks in advance; this would be a great opportunity to learn more about this.
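You can do this with the hold buffer. In sed, the h command copies the current pattern space into the hold buffer, and the G command later retrieves it, appending it to the pattern space after a newline.
The sed script below extracts the MSMS number and stores it in the hold buffer. Then, when it sees a TITLE line, it joins that line with the saved number and replaces the first sequence of digits it finds with the saved value (discarding the appended data).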
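#!/usr/bin/sed -f
/^###MSMS:/{
    # print the line unchanged, then keep only its digits in the hold buffer
    p
    s/[^0-9]//g
    h
    d
}
/^TITLE=/{
    # append a newline plus the held number, then swap it in for the first digit run
    G
    s/\([^0-9]*\)[0-9]*\(.*\)\n\(.*\)/\1\3\2/
}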
Running the script on the sample data provided produces:
###MS: 12/
###MSMS: 13/
BEGIN IONS
TITLE= Cmpd 13, +MSn(507.7145), 0.1 min
PEPMASS=507.71453 5708
###MS: 12/
###MSMS: 14/
BEGIN IONS
TITLE= Cmpd 14, +MSn(637.6461), 0.1 min
PEPMASS=637.64610 8328
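For readers more at home in Python, the same hold-and-substitute idea can be sketched with the re module. This is only an illustration, not part of the sed answer: the function name retitle and both patterns are made up here, and the variable held stands in for sed's hold buffer.

```python
import re

# Hypothetical patterns for this sketch, modelled on the sample data above.
MSMS_RE = re.compile(r"^###MSMS:\s*(\d+)")
TITLE_RE = re.compile(r"^(TITLE=\D*)\d+")


def retitle(lines):
    """Yield each line, swapping a TITLE's first number for the latest MSMS value."""
    held = None  # plays the role of sed's hold buffer
    for line in lines:
        match = MSMS_RE.match(line)
        if match:
            # Like `h`: remember the number, but pass the line through unchanged.
            held = match.group(1)
        elif held is not None and line.startswith("TITLE="):
            # Like `G` followed by `s///`: replace only the first digit run.
            # A function replacement avoids ambiguous backreferences such as \113.
            line = TITLE_RE.sub(lambda t: t.group(1) + held, line, count=1)
        yield line


sample = """###MS: 12/
###MSMS: 13/
BEGIN IONS
TITLE= Cmpd 1, +MSn(507.7145), 0.1 min
PEPMASS=507.71453 5708
###MS: 12/
###MSMS: 14/
BEGIN IONS
TITLE= Cmpd 2, +MSn(637.6461), 0.1 min
PEPMASS=637.64610 8328
""".splitlines(keepends=True)

print("".join(retitle(sample)), end="")
```

Because retitle is a generator, it can also be fed an open file object directly, so large files do not need to be loaded into memory first.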
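For anyone interested:
I gave up on the regular expression approach and wrote a Python script instead, which solved the problem. (mgf is the MGF file, opened for reading.)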
# Read the MGF lines (mgf is assumed to be an already-open file object)
mgfLines = mgf.readlines()
# Create the final list of lines
outputLines = []
# Last ###MSMS value seen (None until the first one is read)
msmsValueFinal = None
# Scroll the lines
for lines in mgfLines:
    if "###MSMS" in lines:
        # Keep the ###MSMS line unchanged
        outputLines.append(lines)
        # The MSMS number sits between the last space and the last "/"
        spaceIndex = lines.rindex(" ")
        slashIndex = lines.rindex("/")
        msmsValueFinal = int(lines[spaceIndex + 1 : slashIndex])
    elif "TITLE" in lines and msmsValueFinal is not None:
        # Split the line at the commas
        titleLineSplitted = lines.split(",")
        # The first piece gets its Cmpd number replaced by the MSMS value
        titleLineSplitted[0] = "TITLE= Cmpd " + str(msmsValueFinal)
        # Join the pieces back together with commas
        outputLines.append(",".join(titleLineSplitted))
    else:
        # Any other line is kept unchanged
        outputLines.append(lines)
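The script above stops once outputLines is built and does not show the file handling around it. A minimal sketch of that missing part follows, streaming line by line instead of building a list. The function name rewrite_mgf is hypothetical, and the sketch assumes every TITLE line contains at least one comma, as in the sample data.

```python
def rewrite_mgf(src, dst):
    """Copy src to dst, setting each TITLE's Cmpd number to the preceding ###MSMS value."""
    msms = None  # last ###MSMS value seen, kept as a string
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.startswith("###MSMS:"):
                # "###MSMS: 13/" -> "13"
                msms = line.split(":", 1)[1].strip().rstrip("/")
            elif line.startswith("TITLE=") and msms is not None:
                # Everything before the first comma is replaced; the rest is kept.
                head, sep, tail = line.partition(",")
                if sep:  # assumption: leave comma-less TITLE lines untouched
                    line = "TITLE= Cmpd " + msms + sep + tail
            fout.write(line)
```

With the file names from the question, it would be called as rewrite_mgf("inputfile.txt", "outputfile.txt").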