What regex can capture 2 exact 'words' in a phrase?

Question

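I'm trying to capture a constant phrase in a string. The constant is:

1. A word
2. Followed by one separator (space, dot, dash, or underscore)
3. Another word
4. Then a separator (see #2), or the end of the line or string.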

For example, say I'm looking for "Bob 1" in the following strings:

Hello, I'm Bob 1 --> Should capture Bob 1
Hello, I'm Bob 11 --> Should capture nothing (Bob 1 is not at the end or followed by a separator)
Hey, it's Bob-1 over there --> Should capture Bob-1
Hey, it's Bob - 1 over there --> Should capture nothing (Bob should be followed only by one separator not 3 like here)
Bob.1 --> Should capture Bob.1
Bob_1 rules! --> Should capture Bob_1

I have a regex that mostly works:

/Bob[\s._-]1[\s._-]/ig

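In the second list I don't know how to add the end of the string to the possible characters. As a result, the last line in the live demo below, which should be a match, isn't captured.

See the live demo.

I'm using PCRE (in PHP).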

Answer 1

Which ends in only the last line in the live demo below that should be a match and that isn't captured.

You need a positive lookahead for that.

Regex: Bob[\s._-]1(?=[\s._-])

(?=[\s._-]) only checks that the next character belongs to the given character class; it does not match or capture it.
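
A minimal sketch of how this lookahead behaves, using Python's re module as a stand-in for PCRE (an assumption; the question targets PHP, but these constructs behave the same way in both engines). The helper name find_phrase is hypothetical. Note that the pattern as posted still requires a separator character after the 1, so it does not match when the phrase sits at the very end of the string; adding |$ inside the lookahead would cover that case.

```python
import re

# Answer 1's pattern, unchanged: the lookahead checks for a separator
# after "1" but leaves that separator out of the match.
LOOKAHEAD = re.compile(r"Bob[\s._-]1(?=[\s._-])", re.IGNORECASE)


def find_phrase(text):
    """Return the matched text, or None when the pattern does not match.

    Hypothetical helper, for illustration only.
    """
    m = LOOKAHEAD.search(text)
    return m.group(0) if m else None


# The space after "1" satisfies the lookahead but is not part of the match.
print(find_phrase("Bob_1 rules!"))
# "1" is followed by another "1", which is not a separator.
print(find_phrase("Hello, I'm Bob 11"))
# Nothing follows "1", so the lookahead has no character to check and fails.
print(find_phrase("Hello, I'm Bob 1"))
```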

Regex101 Demo

Answer 2

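I'm not using PHP, but I get matches with the following:

\bBob[\s.\-_]1\b

It uses \b to match a word boundary. I found I had to escape the dash inside the square brackets, which is not something you're doing, but that may be a difference between the regex engines we're using.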
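
As a rough illustration of what \b does and does not guarantee here, the sketch below runs the pattern through Python's re module (an assumption; the answerer's engine is not named). The helper name word_boundary_match is hypothetical. Two behaviours seem worth knowing: \b accepts any non-word character after the 1, not only the four separators, and because the underscore counts as a word character, a trailing underscore is not seen as a boundary.

```python
import re

# Answer 2's pattern, unchanged. The escaped dash is a literal "-".
WORD_BOUNDARY = re.compile(r"\bBob[\s.\-_]1\b")


def word_boundary_match(text):
    """Return the matched text, or None. Hypothetical helper."""
    m = WORD_BOUNDARY.search(text)
    return m.group(0) if m else None


# End of string counts as a word boundary after "1".
print(word_boundary_match("Hello, I'm Bob 1"))
# No boundary between "1" and "1".
print(word_boundary_match("Hello, I'm Bob 11"))
# "!" is not one of the four separators, yet \b still accepts it.
print(word_boundary_match("Bob 1!"))
# "_" is a word character, so there is no boundary after "1" here.
print(word_boundary_match("Bob 1_again"))
```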

Answer 3

In the second list I don't know how to add the end of the string in the possible characters.

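You can use this regex with the anchor $ to assert the end of the string (with the m flag, the end of each line):

/\bBob[\s._-]1(?:[\s._-]|$)/m

Or, if you don't want to match the character that follows the second word, use a lookahead:

/\bBob[\s._-]1(?=[\s._-]|$)/m

([\s._-]|$) accepts either one of the given characters (whitespace, dot, underscore, hyphen) or the end of the line ($).

It is safer to add \b before Bob so that only the exact word Bob matches and HelloBob does not.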
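
To compare the two variants, here is a small sketch that runs both against the six sample strings from the question. It uses Python's re module with re.MULTILINE in place of PCRE's m flag (an assumption made for testability); the names CONSUMING, LOOKAHEAD, and first_match are hypothetical.

```python
import re

# Variant 1: the trailing separator, when present, becomes part of the match.
CONSUMING = re.compile(r"\bBob[\s._-]1(?:[\s._-]|$)", re.MULTILINE)
# Variant 2: the trailing separator is only asserted, never consumed.
LOOKAHEAD = re.compile(r"\bBob[\s._-]1(?=[\s._-]|$)", re.MULTILINE)

# The six sample strings from the question.
SAMPLES = [
    "Hello, I'm Bob 1",
    "Hello, I'm Bob 11",
    "Hey, it's Bob-1 over there",
    "Hey, it's Bob - 1 over there",
    "Bob.1",
    "Bob_1 rules!",
]


def first_match(pattern, text):
    """Return the first matched text, or None. Hypothetical helper."""
    m = pattern.search(text)
    return m.group(0) if m else None


consumed = [first_match(CONSUMING, s) for s in SAMPLES]
asserted = [first_match(LOOKAHEAD, s) for s in SAMPLES]

for sample, a, b in zip(SAMPLES, consumed, asserted):
    print(f"{sample!r}: consuming={a!r} lookahead={b!r}")
```

With the consuming variant, matches in the middle of a string carry the trailing separator (for example "Bob-1 " with the space), which is why the lookahead form is usually the better fit when only the two words are wanted.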

RegEx Demo

Answer 4

This works:

https://regex101.com/r/ezikuP/2

(?<!\S)Bob[\s._-]1(?![^\s._-])

Formatted

 (?<! \S )               # Whitespace boundary
 Bob                     # Word 1
 [\s._-]                 # Special separator
 1                       # Word 2
 (?! [^\s._-] )          # Special separator boundary
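
The two negative assertions can be read as "not preceded by a non-whitespace character" and "not followed by a non-separator character", which also makes both hold at the start and end of the string. A minimal sketch, again using Python's re module in place of PCRE (an assumption) and a hypothetical helper named bounded_match:

```python
import re

# Answer 4's compact pattern, unchanged.
# (?<!\S)        : start of string, or preceded by whitespace
# (?![^\s._-])   : end of string, or followed by one of the separators
BOUNDED = re.compile(r"(?<!\S)Bob[\s._-]1(?![^\s._-])")


def bounded_match(text):
    """Return the matched text, or None. Hypothetical helper."""
    m = BOUNDED.search(text)
    return m.group(0) if m else None


for sample in [
    "Hello, I'm Bob 1",
    "Hello, I'm Bob 11",
    "Hey, it's Bob-1 over there",
    "Hey, it's Bob - 1 over there",
    "Bob.1",
    "Bob_1 rules!",
    "HelloBob 1",  # "Bob" is preceded by a non-whitespace character
]:
    print(f"{sample!r} -> {bounded_match(sample)!r}")
```

Unlike \b, the leading (?<!\S) also rejects punctuation directly before Bob, such as "(Bob 1", so this variant is stricter about what may precede the phrase.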

regex

pcre