Regex to only match a multi-line string with a keyword in its proper place
I have a virtual instrument file in which blocks may contain any combination of attributes, as shown below:
POINT=69
Name="M_Frequency Min" Type=ANALOG
Units="Hz"
Archive="AVERAGE" Priority=9999 Latch=0
HysEnable=0 HysVal=0.00000
Bit="0"
Category="Meter"
IsCustom=1
Interval=0
Accumulated=0
DisplayOrder=1
ENDPOINT
POINT=70
Name="M_Voltage Phase A-N Max" Type=ANALOG
Units="Volts"
Archive="AVERAGE" Priority=9999 Latch=0
HysEnable=0 HysVal=0.000000
CritHiEnable=0 CritHiLimit=0.000000
CritLoEnable=0 CritLoLimit=0.000000
CautHiEnable=0 CautHiLimit=0.000000
CautLoEnable=0 CautLoLimit=0.000000
Desc="Voltage Phase A-N Max"
RW=READ
Register="9000"
RegType="H"
DataType="F"
Accumulated=0
DisplayOrder=1
ENDPOINT
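Say I only want to match the second block (and not the first) using something like POINT=[0-9]*(?s)(.*?)(?!ENDPOINT)(\sMax)(.*?)ENDPOINT
My thinking was that if I made the dot-star also match newlines and then told it to match lazily, it would stop as soon as it looked ahead and saw something that disqualified the match. Evidently I'm not getting something here.
This of course doesn't work; instead it matches the entire text. I've also tried a negated character set, but no dice. What I want to match is a POINT-to-ENDPOINT block, provided it contains my target string "Max", and I want to disqualify any block that terminates with "ENDPOINT" before "Max" is found.
EDIT1: You can assume there will be more blocks like these before and after the snippet shown. I specifically want to grab the blocks that contain my target string (so that I can replace them with something else, or delete them). Other blocks may or may not contain the target string, but if they do, I want to match each block separately rather than as a single match.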
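To make the over-matching concrete, here is a minimal Python sketch of the attempt above, run against a shortened version of the two sample blocks. It is an illustration under assumptions: the names ATTEMPT and SAMPLE are hypothetical, and the inline (?s) is expressed as the re.DOTALL flag because recent versions of Python's re module reject a global inline flag in the middle of a pattern.

```python
import re

# The attempted pattern, with (?s) replaced by the equivalent DOTALL flag.
ATTEMPT = re.compile(
    r"POINT=[0-9]*(.*?)(?!ENDPOINT)(\sMax)(.*?)ENDPOINT",
    re.DOTALL,
)

# Shortened sample: the first block has no "Max", the second one does.
SAMPLE = """\
POINT=69
Name="M_Frequency Min" Type=ANALOG
Units="Hz"
ENDPOINT
POINT=70
Name="M_Voltage Phase A-N Max" Type=ANALOG
Units="Volts"
ENDPOINT
"""

match = ATTEMPT.search(SAMPLE)

# List every POINT header that ended up inside the single match.
spanned = re.findall(r"^POINT=\d+", match.group(0), re.MULTILINE)
print(spanned)  # ['POINT=69', 'POINT=70']
```

The lookahead (?!ENDPOINT) is only tested at the one position where " Max" is about to match, so it never stops the lazy (.*?) from walking past the first ENDPOINT on its way there.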
# Requires multiline mode (^ and $ match at every line) and free-spacing
# mode (whitespace and # comments inside the pattern are ignored).
^\s*POINT=\d+\s*$          # A line holding the word POINT, the character
                           # '=', and one or more decimal digits,
                           # optionally surrounded by whitespace.
(?:\r?\n)+                 # One or more line breaks, each an optional
                           # '\r' followed by '\n'.
(?:                        # Zero or more lines that are neither an
  ^(?!                     # ENDPOINT line nor a POINT line (the word
    \s*(?:POINT=\d+|       # POINT, the character '=', and one or more
    ENDPOINT)\s*$          # decimal digits), each optionally
  ).*$                     # surrounded by whitespace.
  (?:\r?\n)+
)*
                           # A line that starts with one or more
                           # characters other than '=', followed by
^[^=]+=.*Max.*$            # the '=' character, and then any text
                           # that contains the word Max.
(?:\r?\n)+
(?:                        # Again, zero or more lines that are neither
  ^(?!                     # a POINT line nor an ENDPOINT line.
    \s*(?:POINT=\d+|ENDPOINT)\s*$
  ).*$
  (?:\r?\n)+
)*
^\s*ENDPOINT\s*$           # A line holding the word ENDPOINT,
                           # optionally surrounded by whitespace.
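As a rough usage sketch, the pattern above can be compiled in Python with re.MULTILINE and re.VERBOSE. The constant and function names here (BLOCK_WITH_MAX, blocks_with_max, drop_blocks_with_max) and the shortened SAMPLE are assumptions for illustration, not part of the original answer.

```python
import re

# The answer's pattern, condensed; VERBOSE allows the layout and comments,
# MULTILINE makes ^ and $ match at every line.
BLOCK_WITH_MAX = re.compile(
    r"""
    ^\s*POINT=\d+\s*$ (?:\r?\n)+                               # opening line
    (?: ^(?!\s*(?:POINT=\d+|ENDPOINT)\s*$) .*$ (?:\r?\n)+ )*   # body lines
    ^[^=]+=.*Max.*$ (?:\r?\n)+                                 # line with Max
    (?: ^(?!\s*(?:POINT=\d+|ENDPOINT)\s*$) .*$ (?:\r?\n)+ )*   # body lines
    ^\s*ENDPOINT\s*$                                           # closing line
    """,
    re.MULTILINE | re.VERBOSE,
)

# Shortened sample: only the second block contains "Max".
SAMPLE = """\
POINT=69
Name="M_Frequency Min" Type=ANALOG
Units="Hz"
ENDPOINT
POINT=70
Name="M_Voltage Phase A-N Max" Type=ANALOG
Units="Volts"
Desc="Voltage Phase A-N Max"
ENDPOINT
"""


def blocks_with_max(text):
    """Return each POINT..ENDPOINT block that has a line containing Max."""
    return [m.group(0) for m in BLOCK_WITH_MAX.finditer(text)]


def drop_blocks_with_max(text):
    """Return the text with every matching block deleted."""
    return BLOCK_WITH_MAX.sub("", text)


found = blocks_with_max(SAMPLE)
remaining = drop_blocks_with_max(SAMPLE)
print(len(found))  # 1
```

Because the body lines are never allowed to be a POINT or ENDPOINT line, a match cannot run across a block boundary, so each qualifying block is returned as its own match.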