What's the correct syntax for escapes and captures in a perl one-liner?

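I'm trying to use pandoc to convert LaTeX files (auto-generated by doxygen) to .docx format. I've run into a bug, probably in doxygen, where some characters that should be escaped (_ and %) are left unescaped inside the DoxyCode LaTeX environment. Some underscores also appear in file names, inside curly braces. Those should not be escaped.

I wrote a perl one-liner to find any underscore or percent sign that is not between braces and replace it with a backslash followed by the same character: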

perl -i -pe 's/(?<!\\)([_%])(?![^{]+})/\\$1/g' test.tex

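This worked as expected. However, I then found that some files contain, inside the DoxyCode environment, things such as initializer lists in curly braces, and some variables that contain underscores. So I need a perl script that recognizes when an underscore or percent sign is between \begin{DoxyCode} and \end{DoxyCode}, and inserts a backslash before it if there is none.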

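The regex for this command works; see https://regex101.com/r/gsQm2L/2

However, it only grabs the first match. I was hoping perl would grab the other matches too, but I may be mistaken.

My command is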

perl -i -pe 's/(?<=begin\{DoxyCode})([\s\S]+?[^\\])([_%])([\s\S]+?)(?=end\{DoxyCode})/$1\\$2$3/g' test.tex

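But it fails to make any changes. (I tried not escaping the left brace, but got an error: Unescaped left brace in regex is deprecated, passed through in regex; etc.) I can't tell whether it is failing to find a match or failing to replace it because my capture syntax is incorrect.

For both the first and the second example, the original content of test.tex is as follows: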

\begin{DoxyCode}                                                                                                     
17 This is some code that contains an_undersc_ore and                                                                
18 an escaped\_underscore. Plus another unescaped_unders_core                                                        
19 for good measure.                                                                                                 
20 As if that was not "bad" enough, it also contains a %percent sign                                                 
21 that is unescaped.                                                                                                
\end{DoxyCode}                                                                                                       

Here is some other stuff that may contain \index{things_not_to_be_escaped}.                                          

\begin{DoxyCode}                                                                                                     
17 This is some code that contains an_underscore and                                                                 
18 an escaped\_underscore. Plus another unescaped_underscore                                                         
19 for good measure.                                                                                                 
20 As if that was not "bad" enough, it also contains a \%percent sign                                                
21 that is escaped.                                                                                                  
\end{DoxyCode}     

The desired content of test.tex, after running the perl command, would be the following:

\begin{DoxyCode}                                                                                                     
17 This is some code that contains an\_undersc\_ore and                                                                
18 an escaped\_underscore. Plus another unescaped\_unders\_core                                                        
19 for good measure.                                                                                                 
20 As if that was not "bad" enough, it also contains a \%percent sign                                                 
21 that is unescaped.                                                                                                
\end{DoxyCode}                                                                                                       

Here is some other stuff that may contain \index{things_not_to_be_escaped}.                                          

\begin{DoxyCode}                                                                                                     
17 This is some code that contains an\_underscore and                                                                 
18 an escaped\_underscore. Plus another unescaped\_underscore                                                         
19 for good measure.                                                                                                 
20 As if that was not "bad" enough, it also contains a \%percent sign                                                
21 that is escaped.                                                                                                  
\end{DoxyCode}     

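Why is my perl one-liner failing? How can I get the desired output? I'm by no means a perl or regex expert, so I welcome feedback on other mistakes.

In case it's relevant, I'm working on Debian stretch, and perl --version returns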

This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi
(with 85 registered patches, see perl -V for more detail)

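It's fairly simple. Although the "right" way is to write a regex-based parser, this task is still simple enough that you can do it with a one-liner. The key is to do a two-stage substitution. I added a use case for literal backslashes (\\), which do not start an escape of _ or %. If there may be other embedded {}, they can be excluded using the same paradigm.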

$text = <<'EOF';
\begin{DoxyCode}
17 This is some code that contains an_undersc_ore and
18 an escaped\_underscore. Plus another unescaped_unders_core
19 for good measure. A literal \\ and a literal \\_.
20 As if that was not "bad" enough, it also contains a %percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an_underscore and
18 an escaped\_underscore. Plus another unescaped_underscore
19 for good measure. A literal \\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}
EOF

print "before:\n$text\n\n";
# Two stages: the outer s{}{} captures each DoxyCode body (\K leaves \begin{DoxyCode}
# untouched); the inner s{}{} skips a literal \\, keeps \_ and \% as they are,
# and prefixes a bare _ or % with a backslash.
$text =~ s{\Q\begin{DoxyCode}\E\K(.+?)(\Q\end{DoxyCode}\E)}{
    my($t,$e) = ($1,$2);
    $t =~ s{(\\\\ | \\?[_%])}{1==length $1 ? "\\$1" : $1}egsx; "$t$e";
}egs;
print "after:\n$text\n";

Output:

before:
\begin{DoxyCode}
17 This is some code that contains an_undersc_ore and
18 an escaped\_underscore. Plus another unescaped_unders_core
19 for good measure. A literal \\ and a literal \\_.
20 As if that was not "bad" enough, it also contains a %percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an_underscore and
18 an escaped\_underscore. Plus another unescaped_underscore
19 for good measure. A literal \\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}


after:
\begin{DoxyCode}
17 This is some code that contains an\_undersc\_ore and
18 an escaped\_underscore. Plus another unescaped\_unders\_core
19 for good measure. A literal \\ and a literal \\\_.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an\_underscore and
18 an escaped\_underscore. Plus another unescaped\_underscore
19 for good measure. A literal \\\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}

Also read http://perldoc.perl.org/perlre.html and http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators. Pay particular attention to the \G assertion and the /gc flags. That is how you would write a proper parser for this task.

HTH