How to match '+abc' but not '++abc' without lookbehind?

Question

In a sentence similar to:

Lorem ipsum +dolor ++sit amet.

I would like to match +dolor but not ++sit. I could do this with a lookbehind, but since JavaScript doesn't support it, I'm struggling to build a pattern for it.

So far I have tried:

(?:\+(.+?))(?=[\s\.!\!]) - but it matches both words
(?:\+{1}(.+?))(?=[\s\.!\!]) - the same here - both words are matched

To my surprise, a pattern like:

(?=\s)(?:\+(.+?))(?=[\s\.!\!])

doesn't match anything. I thought I could trick it by using \s before the + sign, or later also ^, but it doesn't seem to work that way.

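EDIT - background info:

This isn't necessarily part of the question, but sometimes it helps to know what it's all for, so to clarify some of your questions/comments, here is a short explanation: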

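Any word, in any order, can be marked with + or ++
Each word and its marker will later be replaced with a <span>
Cases like lorem+ipsum are considered invalid, because that would be like splitting a word (ro+om) or writing two words together as one (myroom), so it has to be corrected anyway (the pattern may match this, and that is not an error). It should, however, at least match normal cases like the example above
I use a lookahead like (?=[\s\.!\!]) so that I can match words in any language, not only characters covered by \w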

Answer 1

Just try the following regex:

(^|\s)\+\w+
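
A minimal sketch of how this pattern might be used in JavaScript. The sample sentence is the one from the question, and the variable names are only illustrative. Note that `(^|\s)` consumes the preceding whitespace, so each match is trimmed here, and that `\w` only covers ASCII word characters, which the question wanted to avoid.

```javascript
// Sample sentence from the question.
var str = 'Lorem ipsum +dolor ++sit amet.';
var re = /(^|\s)\+\w+/g;

// Each full match still contains the leading whitespace, so trim it off.
var matches = (str.match(re) || []).map(function (m) {
    return m.trim();
});

console.log(matches); // [ '+dolor' ]
```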

Answer 2

One way is to match one additional character and ignore it (by putting the relevant part of the match into a capturing group):

(?:^|[^+])(\+[^\s+.!]+)

However, this breaks down if potential matches can be directly adjacent to each other.

Test it live on regex101.com.

Explanation:

(?:         # Match (but don't capture)
 ^          # the position at the start of the string
|           # or
 [^+]       # any character except +.
)           # End of group
(           # Match (and capture in group 1)
 \+         # a + character
 [^\s+.!]+  # one or more characters except [+.!] or whitespace.
)           # End of group
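
As a rough sketch, the pattern could be applied in JavaScript as shown here. The helper name `findSinglePlusWords` is hypothetical, not part of the answer. The second call illustrates the adjacency limitation mentioned above: the character before `+bar` was already consumed by the previous match, so `+bar` is missed.

```javascript
// Hypothetical helper: collects group 1 of every match of the pattern above.
function findSinglePlusWords(str) {
    var re = /(?:^|[^+])(\+[^\s+.!]+)/g;
    var words = [];
    var m;
    while ((m = re.exec(str)) !== null) {
        words.push(m[1]); // group 1 excludes the extra leading character
    }
    return words;
}

console.log(findSinglePlusWords('Lorem ipsum +dolor ++sit amet.')); // [ '+dolor' ]
console.log(findSinglePlusWords('+foo+bar')); // [ '+foo' ]
```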

Answer 3

\+\+|(\+\S+)

Grab the content from capturing group 1. The regex uses the trick described in this answer.

Demo on regex101

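// First alternative swallows "++" without capturing; second captures "+word".
var re = /\+\+|(\+\S+)/g;
var str = 'Lorem ipsum +dolor ++sit ame';
var m;
var o = []; // collected "+word" matches

while ((m = re.exec(str)) != null) {
    // Guard against zero-length matches to avoid an infinite loop.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }

    // Group 1 is only set when the second alternative matched.
    if (m[1] != null) {
        o.push(m[1]);
    }
}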

If you have input like +++donor, use:

\+\++|(\+\S+)
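
To see why the extra `+` quantifier matters, here is a small comparison of the two patterns. The helper `collectGroup1` and the sample string are assumptions made for illustration. With `\+\+`, only the first two plus signs of `+++donor` are swallowed, so the remaining `+donor` is captured. With `\+\++`, the whole run of plus signs is swallowed.

```javascript
// Hypothetical helper: returns group 1 of every match where it participated.
function collectGroup1(re, str) {
    var out = [];
    var m;
    re.lastIndex = 0;
    while ((m = re.exec(str)) !== null) {
        if (m[1] !== undefined) {
            out.push(m[1]);
        }
    }
    return out;
}

var sample = 'Lorem +dolor ++sit +++donor';
var withTwo = collectGroup1(/\+\+|(\+\S+)/g, sample);
var withMany = collectGroup1(/\+\++|(\+\S+)/g, sample);

console.log(withTwo);  // [ '+dolor', '+donor' ]
console.log(withMany); // [ '+dolor' ]
```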

Answer 4

The following regex seems to work for me:

var re = / (\+[a-zA-Z0-9]+)/  // Note the space after the '/'

Demo

https://www.regex101.com/r/uQ3wE7/1

Answer 5

I think this is what you need.

(?:^|\s)(\+[^+\s.!]*)(?=[\s.!])
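
Since the question mentions replacing each marked word with a `<span>`, here is a sketch of how that replacement might look with this pattern. The exact markup (a bare `<span>` with the `+` marker dropped) is an assumption, as the question does not specify it. The full match includes the leading whitespace, so the callback puts it back.

```javascript
var re = /(?:^|\s)(\+[^+\s.!]*)(?=[\s.!])/g;
var str = 'Lorem ipsum +dolor ++sit amet.';

var html = str.replace(re, function (match, word) {
    // Whatever precedes the captured word (whitespace or nothing) is kept.
    var prefix = match.slice(0, match.length - word.length);
    // Assumed output format: drop the "+" marker and wrap the word.
    return prefix + '<span>' + word.slice(1) + '</span>';
});

console.log(html); // Lorem ipsum <span>dolor</span> ++sit amet.
```

Because the character class uses `*`, a lone `+` followed by whitespace would also match. Changing `*` to `+` would require at least one character after the marker.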


javascript

regex