Recursive replacement in a text file (regular expressions)

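I want to substitute a value into a text file repeatedly, block by block, possibly using regular expressions, so ideally with a very simple bash script.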

This is my text file:

 ###MS: 12/
 ###MSMS: 13/
 BEGIN IONS
 TITLE= Cmpd 1, +MSn(507.7145), 0.1 min
 PEPMASS=507.71453  5708

 ###MS: 12/
 ###MSMS: 14/
 BEGIN IONS
 TITLE= Cmpd 2, +MSn(637.6461), 0.1 min
 PEPMASS=637.64610  8328

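The text file consists of many repetitions of the same block structure (as shown). What I want is to replace the number after TITLE= Cmpd with the number after ###MSMS. This should be done for every block: scan through the text file and, in each block, assign that block's MSMS value to its Cmpd value.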

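I tried using sed, with a script previously explained on Stack Overflow, but using Cmpd [0-9] only works for the second part (the replacement) and not for selecting the number after MSMS.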
while read line
 do
   varA="###MSMS: "[0-9][0-9]
   varB="Cmpd "[0-9]
   line=`echo $line | sed -e "s/$varA/$varB/"`
   echo $line >> "outputfile.txt"
done < "inputfile.txt"

Thank you in advance; this is a great chance for me to learn more about this.

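You can do this with the hold buffer. In sed, the h command copies the current pattern space into the hold buffer, and the G command later retrieves it, appending a newline followed by the hold buffer's contents to the pattern space.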

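The sed script below extracts the MSMS number and stores it in the hold buffer. When it then sees the TITLE line, it appends the saved number to that line and replaces the first sequence of digits it finds with the saved value (discarding the appended data).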

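#!/usr/bin/sed -f
/^###MSMS:/{
# print the line unchanged before it is modified
p
# strip everything except the digits, leaving only the MSMS number
s/[^0-9]//g
# save the number in the hold buffer and start the next cycle
h
d
}
/^TITLE=/{
# append a newline plus the saved number to the TITLE line
G
# \1 = text before the first number, \2 = rest of the line, \3 = saved number
s/\([^0-9]*\)[0-9]*\(.*\)\n\(.*\)/\1\3\2/
}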

Running the script against the provided sample data produces:

###MS: 12/
###MSMS: 13/
BEGIN IONS
TITLE= Cmpd 13, +MSn(507.7145), 0.1 min
PEPMASS=507.71453  5708

###MS: 12/
###MSMS: 14/
BEGIN IONS
TITLE= Cmpd 14, +MSn(637.6461), 0.1 min
PEPMASS=637.64610  8328
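
For readers less familiar with sed, the same hold-buffer idea can be sketched in Python. This is only an illustration under assumptions, not part of the sed answer: the function name `retitle` and the variables `hold` and `pattern` are made up here to mirror sed's hold buffer and pattern space, and the input is assumed to be a list of lines without trailing newlines.

```python
import re

def retitle(lines):
    """Mimic the sed script: remember the last MSMS number, reuse it on TITLE lines."""
    hold = ""                      # plays the role of sed's hold buffer
    out = []
    for pattern in lines:          # 'pattern' plays the role of the pattern space
        if pattern.startswith("###MSMS:"):
            out.append(pattern)                     # p: emit the line unchanged
            hold = re.sub(r"[^0-9]", "", pattern)   # s/[^0-9]//g followed by h
            continue                                # d: move on to the next line
        if pattern.startswith("TITLE="):
            pattern = pattern + "\n" + hold         # G: append newline + hold buffer
            # Same substitution as in the sed script: prefix, saved number, rest.
            pattern = re.sub(r"^([^0-9]*)[0-9]*(.*)\n(.*)$", r"\1\3\2", pattern, count=1)
        out.append(pattern)
    return out

if __name__ == "__main__":
    block = [
        "###MS: 12/",
        "###MSMS: 13/",
        "BEGIN IONS",
        "TITLE= Cmpd 1, +MSn(507.7145), 0.1 min",
        "PEPMASS=507.71453  5708",
    ]
    print("\n".join(retitle(block)))
```

As in the sed version, a TITLE line that appears before any ###MSMS line would have its number replaced by an empty string, so the sketch relies on every block listing ###MSMS first.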

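For anyone interested,

I gave up on the regular-expression approach and wrote a Python script instead, which solved the problem. (mgf is the MGF file, already opened for reading.)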

# Read the MGF lines
mgfLines = mgf.readlines ()
# Create the final list of lines
outputLines = []
# Scroll the lines
for lines in mgfLines:
    # If there is no ###MSMS value or TITLE
    if ("###MSMS" not in lines and "TITLE" not in lines):
        # Write the line to the output file
        outputLines.append (lines)
    # If there is the ###MSMS value
    if "###MSMS" in lines:
        # Write the line to the output file
        outputLines.append (lines)
        #### Store the value of the MSMS
        # Index the space and the /, and in between there is the MSMS number
        for i in range(len(lines)):
            if (lines[i] == " "):
                spaceIndex = i
        for i in range(len(lines)):
            if (lines[i] == "/"):
                slashIndex = i
        # The MSMS number is in between
        msmsValueFinal = int (lines[spaceIndex+1 : slashIndex])
    # If there is the TITLE value (the MSMS value has to be passed to TITLE)
    if "TITLE" in lines:
        # Split the line at the commas
        titleLineSplitted = str.split (lines, ",")
        # Three pieces are generated: the last two will be joined back together with a comma, the first will have the TITLE Cmpd number replaced by the MSMS value
        titleLineSplitted[0] = "TITLE= Cmpd " + str(msmsValueFinal)
        # Join back the pieces
        finalTitleLine = titleLineSplitted[0] + "," + titleLineSplitted[1] + "," + titleLineSplitted[2]
        # Write the final line to the file
        outputLines.append (finalTitleLine)
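
The script above stops once outputLines has been built and assumes mgf is already open. A more compact variant that also covers reading and writing the files might look like the sketch below. The names `rewrite_titles` and `convert` and the file paths are hypothetical, and it assumes every TITLE line has a comma right after the Cmpd number, as in the sample data.

```python
def rewrite_titles(lines):
    """Return a copy of lines with each TITLE's Cmpd number set to the last MSMS number."""
    msms = None
    out = []
    for line in lines:
        if line.startswith("###MSMS:"):
            # "###MSMS: 13/\n" -> "13"
            msms = line.split(":", 1)[1].strip().rstrip("/")
        elif line.startswith("TITLE=") and msms is not None:
            # Keep everything from the first comma onward, replace what precedes it.
            _head, sep, rest = line.partition(",")
            line = "TITLE= Cmpd " + msms + sep + rest
        out.append(line)
    return out

def convert(in_path, out_path):
    """Read in_path, rewrite the TITLE lines, and write the result to out_path."""
    with open(in_path) as src:
        lines = src.readlines()
    with open(out_path, "w") as dst:
        dst.writelines(rewrite_titles(lines))

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    convert("inputfile.txt", "outputfile.txt")
```

Unlike the original script, this leaves a TITLE line untouched when no ###MSMS line has been seen yet, rather than failing on an undefined value.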