How to ignore parts of the text and do search-and-replace in the remaining part?

Question

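When doing a regex find-and-replace in a text file, I want to skip and ignore certain parts of the text. That is, some parts of the text should be excluded from the search, and the search and replace should be done only in the remaining parts. The criteria are:

(1) Anything between START and END should be excluded from the search and replace. START may or may not be at the beginning of a line; END may or may not be at the end of a line; a START/END pair may span multiple lines;

(2) Anything inside an inline comment // should be ignored; // may or may not be at the beginning of a line;

(3) The first word after a . should be ignored; the . may or may not be at the beginning of a line; the word may follow the . immediately, or be separated from it by spaces, newlines, or tabs.

Example code: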

#!/usr/bin/env perl
use strict;
use warnings;

$/ = undef;    # slurp mode: read the whole input as one string

#iterate the DATA filehandle
while (<DATA>) {
    # This one replaces ALL occurrences of pattern.
    s/old/new/gs;

    # How do I skip the unwanted segments and do the replace?
    #print all
    print;
}

##inlined data filehandle for testing. 
__DATA__
xx START xx old xx END xx   --> ignore
xx old xx                   --> REPLACE !
START xx old                --> ignore
      xx old xx END         --> ignore
      xx old xx             --> REPLACE !
// xx old                   --> ignore
xx // xx old                --> ignore
xx . old old xx             --> ignore first one, replace second one
.
  old                       --> ignore
  (old) xx                  --> REPLACE !
xx old xx                   --> REPLACE !

The expected output is:

xx START xx old xx END xx   --> ignore
xx new xx                   --> REPLACE !
START xx old                --> ignore
      xx old xx END         --> ignore
      xx new xx             --> REPLACE !
// xx old                   --> ignore
xx // xx old                --> ignore
xx . old new xx             --> ignore first one, replace second one
.
  old                       --> ignore
  (new) xx                  --> REPLACE !
xx new xx                   --> REPLACE !

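Can anyone help me with this regex problem? I posted a similar question a few hours ago, but that post was full of ambiguities and did not allow for a clear answer. Hopefully this post is a "good" and "clear" question.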

Answer 1

You can use the (*SKIP)(*F) verbs to skip certain parts of the text.

(?:(?s:START.*?END)|\/\/.*|\.\s*\w+\b)(*SKIP)(*F)|old

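It works like this: (?:part 1 to skip|part 2 to skip|...)(*SKIP)(*F) | part to match. (*F) forces the match to fail, and (*SKIP) keeps the engine from retrying inside the text that was just consumed, so only the right-hand alternative can produce a match.

(?: opens a non-capturing group for the alternation
(?s: turns on the s flag, which makes the dot match newlines
\w matches a word character [A-Za-z0-9_]
\b matches a word boundary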

See demo at regex101
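
To see the idiom in isolation, here is a minimal sketch in Perl. The one-line sample string and the variable names are made up for illustration and are not part of the question's data; it only assumes a Perl new enough to support backtracking verbs (5.10 or later).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample: the first "old" sits inside START ... END.
my $text = "START old END old";

# Plain substitution touches every occurrence.
(my $plain = $text) =~ s/old/new/g;

# Left branch: consume the span to skip, then (*SKIP)(*F) throws it away.
# Right branch: the text that should actually be replaced.
(my $result = $text) =~ s/START.*?END(*SKIP)(*F)|old/new/g;

print "$plain\n";    # START new END new
print "$result\n";   # START old END new
```

The left branch has to match the whole span that is to be protected; anything it does not consume is still visible to the right branch.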

Answer 2

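You need to be more precise about the structure (i.e. when exactly old should be ignored), but for your examples the following regular expression will work (demo on regex101.com):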

~                                       # delimiter
    (?:                                 # group every branch that is to be skipped
        (?s:START.*?END)|               # look for START-END in single-line mode OR
        //.+|                           # everything after two forward slashes OR
        \.\sold|                        # the word old after a dot and one whitespace OR
        ^\s+old                         # old after whitespace at the beginning of a line
    )
    (*SKIP)(*FAIL)|                     # all these matches shall fail
    \b(old)\b                           # this one is to be kept
~xmg                                    # verbose, multiline and global modifiers

To learn more about this technique, have a look at the excellent site rexegg.com.
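
For use from Perl, the same idea can be written with the /x modifier. The sketch below rests on two assumptions: the skip branches are wrapped in one group so that (*SKIP)(*FAIL) applies to all of them rather than only the last, and /m is set so that ^ matches at the start of every line. The sample lines are hypothetical and only exercise each branch once.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample covering each branch of the pattern.
my $text = join "\n",
    'a START old END old',
    'b // old',
    'c . old old',
    '   old',
    'd (old) older',
    '';

(my $result = $text) =~ s~
    (?:                       # everything in this group is skipped
        (?s:START.*?END)      # START ... END, possibly across lines
      | //.+                  # from two slashes to the end of the line
      | \.\sold               # "old" right after a dot and one whitespace
      | ^\s+old               # "old" after leading whitespace (needs the m flag)
    )
    (*SKIP)(*FAIL)            # discard whatever the group consumed
  |
    \b(old)\b                 # a standalone "old" that is to be replaced
~new~xmg;

print $result;
```

With this input only three lines change: the second old on line a, the second old on line c, and (old) on line d. The word older is left alone because of the trailing \b.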

Answer 3

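Thanks to @bobblebubble and @Jan for their valuable contributions. Based on the code in their replies, I finally learned to use (*SKIP)(*F) to skip or ignore the unwanted parts. The final code is: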

#!/usr/bin/env perl
use strict;
use warnings;

$/ = undef;    # slurp mode: read the whole input as one string

#iterate the DATA filehandle
while (<DATA>) {
    # The plain version, which replaces ALL occurrences of the pattern:
    #s/old/new/gs;

    # Skip the unwanted segments and replace only in what remains.
    # Both variants work.
    #s/(?:(?:START.*?END)|\/\/.*?\n|\.\s*\w+\b)(*SKIP)(*F)|old/new/gs;
    s/(?:(?s:START.*?END)|\/\/.*|\.\s*\w+\b)(*SKIP)(*F)|old/new/g;
    #print all
    print;
}

##inlined data filehandle for testing. 
__DATA__
xx START xx old xx END xx   --> ignore
xx old xx                   --> REPLACE !
START xx old                --> ignore
      xx old xx END         --> ignore
      xx old xx             --> REPLACE !
// xx old                   --> ignore
xx // xx old                --> ignore
xx . old old xx             --> ignore first one, replace second one
.
  old                       --> ignore
  (old) xx                  --> REPLACE !
xx old xx                   --> REPLACE !

Thanks again to bobble bubble and Jan.
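
For regex engines without backtracking verbs, a different technique can give the same effect: capture the spans to keep in group 1, and let an evaluated replacement put them back unchanged. This is not the (*SKIP)(*F) approach from the answers above, just a rough sketch of an alternative; the sample string is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample with one case per rule from the question.
my $text = "xx START old END old // old\nxx . old old\n";

# Branch 1 captures a span to keep verbatim; branch 2 is the real target.
(my $result = $text) =~ s{
    ( (?s:START.*?END)    # START ... END, possibly across lines
    | //.*                # inline comment up to the end of the line
    | \.\s*\w+            # a dot plus the first word after it
    )
  | old
}{ defined($1) ? $1 : 'new' }gex;

print $result;
# xx START old END new // old
# xx . old new
```

The /e modifier evaluates the replacement as Perl code, so a protected span is substituted by itself and only a bare old becomes new.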

Tags: regex, perl, conditional, replace