How to improve this regex so it does not match when the strings are in comments

Question

Given this sample text:

<input type="text" value="<? print(variable); ?>">

<? /*<br><br><small>Data:</small>

<input type="text" value="<? print(data); ?>"> */ ?>

<textarea><? print(yuppy); ?></textarea>

To capture everything between <? and ?> (one tag at a time), I use:

/<\?\s*([\s\S]+?)\s*\?>/g

The problem with this regex is that it also matches <? and ?> when they appear inside /* */ or // (comments), which is not the desired behavior.

How can I improve the regex so that it matches these strings only when they are not inside comments?

To be clear, the correct matches should be:

1) print(variable);
2) /*<br><br><small>Data:</small>

<input type="text" value="<? print(data); ?>"> */
3) print(yuppy);

Instead, with my regex the second match is:

/*<br><br><small>Data:</small> <input type="text" value="<? print(data);
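A minimal JavaScript sketch that reproduces this, assuming the sample text above uses "\n" line breaks and a runtime with `String.prototype.matchAll` (ES2020). The helper name `captures` is hypothetical, only for illustration:

```javascript
// Sample text from the question (assumed "\n" line breaks).
const sample = [
  '<input type="text" value="<? print(variable); ?>">',
  "",
  "<? /*<br><br><small>Data:</small>",
  "",
  '<input type="text" value="<? print(data); ?>"> */ ?>',
  "",
  "<textarea><? print(yuppy); ?></textarea>",
].join("\n");

// The original regex: the lazy [\s\S]+? stops at the FIRST "?>" it meets,
// even when that "?>" sits inside a /* ... */ comment.
const original = /<\?\s*([\s\S]+?)\s*\?>/g;

// Hypothetical helper: collect capture group 1 of every match.
function captures(re, text) {
  return [...text.matchAll(re)].map((m) => m[1]);
}

captures(original, sample).forEach((c, i) =>
  console.log(`${i + 1}) ${JSON.stringify(c)}`)
);
```

The second capture stops right after `print(data);`, because the `?>` inside the comment is the first one the lazy quantifier reaches.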

Update:

Josh Crozier's answer is almost right, but not quite:

His regex <\?\s*((?:.*\/\*[\s\S]+\*\/.*)|(?:[\s\S]+?))\s*\?> matches incorrectly on https://regex101.com/r/oL5iV0/2:

<? /* hello */ ?> html <? /* world*/ ?>

or even on https://regex101.com/r/qW7mR7/1:

<input type="text" value="<? print(code); ?>"> <? /* */ ?>

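In the last example, it matches correctly only when there is a line break between the tags. In the first example, it does not match correctly even with line breaks.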
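Both failures can be reproduced with a short JavaScript sketch, assuming an ES2020 runtime for `matchAll`; `fullMatches` is a hypothetical helper. On a single line, the greedy `.*` and `[\s\S]+` in the first alternative stretch from the first `<?` to the last `*/`, so the two tags come back as one match:

```javascript
// Josh Crozier's regex, as quoted in the update.
const crozier = /<\?\s*((?:.*\/\*[\s\S]+\*\/.*)|(?:[\s\S]+?))\s*\?>/g;

// Hypothetical helper: the full text of every match.
function fullMatches(re, text) {
  return [...text.matchAll(re)].map((m) => m[0]);
}

// The two single-line inputs from the update; each contains two tags.
const first = "<? /* hello */ ?> html <? /* world*/ ?>";
const second = '<input type="text" value="<? print(code); ?>"> <? /* */ ?>';

// Each call logs ONE match that runs from the first "<?" to the last "?>".
console.log(fullMatches(crozier, first));
console.log(fullMatches(crozier, second));
```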

Answer 1

You can use the alternation ((?:.*\/\*[\s\S]+\*\/.*)|(?:[\s\S]+?)) to cover both cases.


/<\?\s*((?:.*\/\*[\s\S]+\*\/.*)|(?:[\s\S]+?))\s*\?>/g

It first tries to match everything between and around the comments with (.*\/\*[\s\S]+\*\/.*), and otherwise falls back to what you originally had, ([\s\S]+?).

Output:

1) print(variable);
2) /*<br><br><small>Data:</small>

<input type="text" value="<? print(data); ?>"> */
3) print(yuppy);
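A minimal JavaScript check of this output, assuming the question's sample text with "\n" line breaks and ES2020 `matchAll`. One detail the listing hides: the greedy trailing `.*` keeps the space before the final `?>`, so the second capture ends in `*/ ` rather than `*/`:

```javascript
// Sample text from the question (assumed "\n" line breaks).
const sample = [
  '<input type="text" value="<? print(variable); ?>">',
  "",
  "<? /*<br><br><small>Data:</small>",
  "",
  '<input type="text" value="<? print(data); ?>"> */ ?>',
  "",
  "<textarea><? print(yuppy); ?></textarea>",
].join("\n");

// The alternation from this answer: first try "text, a /* ... */ block, text",
// otherwise fall back to the original lazy match.
const answer1 = /<\?\s*((?:.*\/\*[\s\S]+\*\/.*)|(?:[\s\S]+?))\s*\?>/g;

// Capture group 1 of every match, printed in the same numbered style.
const groups = [...sample.matchAll(answer1)].map((m) => m[1]);
groups.forEach((g, i) => console.log(`${i + 1}) ${g}`));
```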

Answer 2

You can use this pattern (remove the whitespace and comments to make it work in JavaScript):

<\?  # opening tag
[^?\/]* # all that is not a ? or a /
(?:
    \/ # a slash:
    (?:
        (?![\/*]) [^?\/]*  # - not followed by a slash or a *
      |
        \/.*(?:\n[^?\/]*)? # - that starts a single line comment
      | 
        \*                 # - that starts a multiline comment
        [^*]* (?:\*+(?!\/)[^*]*)* # (comment content)
        (?:\*\/ [^?\/]* | $)      # */ is optional
    )
  |
  \?(?!>) [^?\/]* # a ? not followed by a >
)*
(?:\?>|$) # optional closing tag ?>
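JavaScript's `RegExp` has no free-spacing (`x`) flag, so the whitespace and `#` comments must be removed before use. A small sketch that does this programmatically, assuming the pattern contains no literal `#` and no literal whitespace (true here, where newline is written as the escape `\n`). `stripVerbose` is a hypothetical helper, not part of any library:

```javascript
// The pattern above, verbatim, in free-spacing style.
const verbose = String.raw`
<\?  # opening tag
[^?\/]* # all that is not a ? or a /
(?:
    \/ # a slash:
    (?:
        (?![\/*]) [^?\/]*  # - not followed by a slash or a *
      |
        \/.*(?:\n[^?\/]*)? # - that starts a single line comment
      |
        \*                 # - that starts a multiline comment
        [^*]* (?:\*+(?!\/)[^*]*)* # (comment content)
        (?:\*\/ [^?\/]* | $)      # */ is optional
    )
  |
  \?(?!>) [^?\/]* # a ? not followed by a >
)*
(?:\?>|$) # optional closing tag ?>
`;

// Hypothetical helper: drop "# ..." to the end of each line, then every
// whitespace character. Only safe because this pattern has no literal "#"
// and no literal whitespace.
function stripVerbose(src) {
  return src
    .split("\n")
    .map((line) => line.replace(/#.*$/, ""))
    .join("")
    .replace(/\s+/g, "");
}

const answer2 = new RegExp(stripVerbose(verbose), "g");
console.log(answer2.source);
```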


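Note that this pattern does not cause catastrophic backtracking, because everything after <\? is optional, in particular the closing tag ?> and the end of a multiline comment */.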
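To compare against the matches the question expects, here is a JavaScript sketch that runs the compacted pattern on the question's sample text (assumed "\n" line breaks) and on the two single-line inputs from the update. The pattern has no capture group, so the hypothetical `tagContents` helper strips `<?`, `?>` and the surrounding whitespace itself:

```javascript
// Answer 2's pattern with whitespace and comments removed. Without the m
// flag, `$` means "end of input", so an unclosed tag runs to the end.
const answer2 = /<\?[^?\/]*(?:\/(?:(?![\/*])[^?\/]*|\/.*(?:\n[^?\/]*)?|\*[^*]*(?:\*+(?!\/)[^*]*)*(?:\*\/[^?\/]*|$))|\?(?!>)[^?\/]*)*(?:\?>|$)/g;

// Hypothetical helper: whole matches with the <? ... ?> delimiters removed.
function tagContents(text) {
  return [...text.matchAll(answer2)].map((m) =>
    m[0].replace(/^<\?\s*/, "").replace(/\s*(?:\?>)?$/, "")
  );
}

// Sample text from the question (assumed "\n" line breaks).
const sample = [
  '<input type="text" value="<? print(variable); ?>">',
  "",
  "<? /*<br><br><small>Data:</small>",
  "",
  '<input type="text" value="<? print(data); ?>"> */ ?>',
  "",
  "<textarea><? print(yuppy); ?></textarea>",
].join("\n");

tagContents(sample).forEach((c, i) => console.log(`${i + 1}) ${c}`));
console.log(tagContents("<? /* hello */ ?> html <? /* world*/ ?>"));
console.log(tagContents('<input type="text" value="<? print(code); ?>"> <? /* */ ?>'));
```

Because `[^*]*` also matches newlines, the multiline comment is consumed in one step, so the `?>` inside it never ends the match.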
