How to avoid duplication in regex

Question

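Suppose I have the following regular expression: a+ a+

The a+ part has to be matched twice, but unfortunately it is duplicated, which means that every change to a+ actually has to be made in two places.

How can I rewrite such a regular expression without the duplication, so that it is easier to read and maintain?

PS: a+ is actually something more complicated.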

Answer 1

You can try using subroutines:

(a+) (?1)

Reference 1, Reference 2

Perl 5.10, PCRE 4.0, and Ruby 1.9 support regular expression subroutine calls. These are very similar to regular expression recursion. Instead of matching the entire regular expression again, a subroutine call only matches the regular expression inside a capturing group. You can make a subroutine call to any capturing group from anywhere in the regex. If you place a call inside the group that it calls, you'll have a recursive capturing group.

Answer 2

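You can simply keep the repeated part of the regular expression in a string variable in the language you are using, and then compose the complete regular expression from it, like this:

var complexPart = 'a+'; // the sub-pattern, defined once
var completeRegexp = new RegExp(complexPart + ' ' + complexPart); // equivalent to /a+ a+/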
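As a slightly more defensive variant of the same idea, the sketch below wraps the sub-pattern in a non-capturing group before reusing it, so that an alternation inside the part cannot leak into the surrounding pattern. The `^` and `$` anchors and the sample inputs are assumptions added for illustration; they are not part of the original question.

```javascript
// Hypothetical sub-pattern standing in for the "complex" a+ from the question.
// Defining it as a regex literal avoids double-escaping backslashes in a string.
const complexPart = /a+/;

// Wrap the source in a non-capturing group so it behaves as a single unit
// wherever it is reused.
const part = `(?:${complexPart.source})`;

// Compose the complete pattern; the anchors are an assumption for the demo.
const completeRegexp = new RegExp(`^${part} ${part}$`);

console.log(completeRegexp.test('aa aaa')); // true
console.log(completeRegexp.test('aa'));     // false
```

A change to `complexPart` now only has to be made once, at the cost of the pattern no longer being readable as a single literal.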

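If you only need to use the regular expression in some application that you do not control, subroutines are the way to go, provided the engine used by that application supports them (see http://www.rexegg.com/regex-disambiguation.html#subroutines):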

(a+) (?1)

Answer 3

I am not sure exactly what you want, but you can match a sequence N times like this:

(a+){N}

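At most N times, between N and M times, or at least N times (some flavors, JavaScript among them, treat {,N} as literal text, so {0,N} is the more portable spelling):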

(a+){,N}
(a+){N,M}
(a+){N,}
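The counted quantifiers above can be tried out in JavaScript with a short sketch. The sub-pattern `ab+`, the anchors, and the concrete counts are assumptions chosen so that the repetitions are easy to tell apart; with a bare `a+` every run of letters could be split in several ways.

```javascript
// Hypothetical sub-pattern; 'ab+' makes each repetition visibly distinct.
const part = 'ab+';

// One regex per quantifier form, all built from the same sub-pattern.
const exactlyTwo = new RegExp(`^(?:${part}){2}$`);
const atMostTwo  = new RegExp(`^(?:${part}){0,2}$`); // {0,N} instead of {,N}
const twoToThree = new RegExp(`^(?:${part}){2,3}$`);
const atLeastTwo = new RegExp(`^(?:${part}){2,}$`);

console.log(exactlyTwo.test('abbab'));    // true
console.log(exactlyTwo.test('ab'));       // false
console.log(atMostTwo.test('ababab'));    // false
console.log(twoToThree.test('ababab'));   // true
console.log(atLeastTwo.test('abababab')); // true

// Without the `u` flag, JavaScript reads {,2} as literal characters.
const literalBraces = new RegExp('^a{,2}$');
console.log(literalBraces.test('a{,2}')); // true
console.log(literalBraces.test('aa'));    // false
```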

Answer 4

You can do this:

(?:a+(?: |$)){2}

However, this also matches strings with a trailing space, so you may want to add a lookbehind to prevent that:

(?:a+(?: |$)){2}(?<! )

Note that you avoid duplicating the a+ pattern, but in exchange you duplicate the separating space.

Also note that this will not work if your pattern can end with a space, for example [a ]+ [a ]+.
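The effect of the lookbehind can be checked with a small JavaScript sketch. It assumes an engine with lookbehind support (ES2018 or later), and the `^` and `$` anchors and the sample strings are additions for the demonstration rather than part of the answer's pattern.

```javascript
// Hypothetical sub-pattern, written once and repeated by the {2} quantifier.
const part = 'a+';

// Each repetition must be followed by a space or by the end of the input.
const withoutGuard = new RegExp(`^(?:${part}(?: |$)){2}$`);

// Same pattern, plus a negative lookbehind that rejects a trailing space.
const withGuard = new RegExp(`^(?:${part}(?: |$)){2}(?<! )$`);

console.log(withoutGuard.test('aa aaa'));  // true
console.log(withoutGuard.test('aa aaa ')); // true (trailing space accepted)
console.log(withGuard.test('aa aaa'));     // true
console.log(withGuard.test('aa aaa '));    // false
```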
