RegEx: find all comments that are not inside quotation marks

Question

I came up with an example that I can use to learn regex properly. I designed a "script language" (as a string) that I want to parse and interpret.

PS: Yes, it looks a bit like LINQ, but that is just a coincidence.

The first thing I want to do is remove all comments, because they should not be interpreted.

I'm only looking for comments like these: /*...*/ and //...\n

But of course, these must not be matched when they appear inside quotes: "..." and '...'

So how can I use RegEx to find comments that are not inside quotes?

String:

  //get means only read, but not to mutate data
  Get(BooksWithAuthors)
    //default queries via mycel
    .Query()
      //junction table to pair books and authors
      .From(BookAuthor.As(BA))
      //main table for books
      .Join(left: Books.As(B) => B.Id == BA.BookId)
      //main table for authors
      .Join(left: Authors.As(A) => A.Id == BA.AuthorId)
      //groups by column, body allows to restore data (restructuring)
      .GroupBy(B.Id, => B.Authors.Add(A))
      //ignore still registered data objects for the response
      .SelectIgnore(BA)
      //or select only that fields or objects you want to response
      .Select(B)
      .Foo("//wrong-comment-inside-quotes")
      .Foo('//wrong-comment-inside-single-quotes')
      .Foo('some /*wrong-comment*/ inside')
  ;

  //get means only read, but not to mutate data
  Get(BooksWithAuthorsByMethod)
    //using individual backside methods (created by own)
    .GetBooksWithAuthors(id:6, filter:{key:'minAuthorAge', value:17})
  ;

  /*
    comments
    "over"
    'multiple
    lines' //with wrong comments inside
  */

Regex:

.*[^'"].*([\/]{2}.*[\r\n|\r|\n]).*[^'"].*

(https://regex101.com/r/zPzBFj/1)

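Yes, I've only tried it with // so far, but not every occurrence is found, and it also matches comments inside quotes. Maybe ?! is not the right approach. But how should I do it?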

I'm sure I'll have one or two more questions about this example. But as I said, I'm still learning RegEx, so one step at a time...

Answer 1

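This returns what you are looking for in your example; let me know if you find any edge cases. You will have to post-process each match according to whether it is a comment or a quoted string.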

(?:(?:(\/)(\*)|(["'])).*?(?:\2\1|\3))|(?:\/\/[^\n]+)

https://regex101.com/r/uqx1cJ/1
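The post-processing step can be sketched in JavaScript. This is a minimal sketch under stated assumptions: it uses its own adapted pattern rather than this answer's exact regex (block comments, then quoted strings closed by a backreference to their own opening quote, then line comments), and `classifyTokens` is a hypothetical helper name. Each match is classified by whether the quote group captured.

```javascript
// Adapted pattern (an assumption, not Answer 1's exact regex):
//   1. a block comment /* ... */
//   2. a quoted string, closed by a backreference (\1) to its own opening quote
//   3. a line comment // ... up to the end of the line
// Only the quote group is backreferenced: in JavaScript, a backreference to a
// group that did not take part in the match matches the empty string.
const TOKEN = /\/\*.*?\*\/|(["']).*?\1|\/\/[^\n]+/gs;

// Post-process every match: group 1 is set only for quoted strings.
function classifyTokens(source) {
  const tokens = [];
  for (const m of source.matchAll(TOKEN)) {
    const kind = m[1] !== undefined ? "string" : "comment";
    tokens.push({ kind, text: m[0] });
  }
  return tokens;
}

const sample = `.Foo("//not-a-comment") // real comment\n/* block */`;
for (const t of classifyTokens(sample)) {
  console.log(t.kind, t.text);
}
```

Matches that are classified as `"string"` are simply left alone when the comments are removed.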

Answer 2

If you match the string against the regex

/'.*?'|".*?"|(\/\/[^\r\n]*|\/\*.*?\*\/)/gs

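the comments are saved to capture group 1. The idea is to match, but not capture, what you don't want, and to match and capture what you do want. Disregard the matches in which nothing was captured.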
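A minimal JavaScript sketch of this idea, assuming a runtime with `String.prototype.matchAll` and the `s` flag (ES2020 or later): iterate over every match and keep only those where group 1 is defined. `findComments` and the shortened `script` are illustrative names, not part of the answer.

```javascript
// Answer 2's pattern: quoted strings are matched but not captured,
// comments are matched and captured in group 1.
const COMMENT_OR_STRING = /'.*?'|".*?"|(\/\/[^\r\n]*|\/\*.*?\*\/)/gs;

// Return only the comments, i.e. the matches where group 1 was captured.
function findComments(source) {
  const comments = [];
  for (const m of source.matchAll(COMMENT_OR_STRING)) {
    if (m[1] !== undefined) comments.push(m[1]);
  }
  return comments;
}

// A shortened version of the script from the question.
const script = [
  "//get means only read",
  "Get(BooksWithAuthors)",
  '  .Foo("//wrong-comment-inside-quotes")',
  "  .Foo('some /*wrong-comment*/ inside')",
  "/* block",
  "   comment */",
].join("\n");

// Only the line comment and the two-line block comment are reported;
// the comment-like text inside the quotes is skipped.
console.log(findComments(script));
```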

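Without the DOTALL flag (/s), a period matches every character other than line terminators; setting the flag makes a period match every character, including line terminators.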
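To see what the flag changes, here is a short check (the variable names are illustrative): a block comment that spans two lines is found only when the pattern carries `s`.

```javascript
// A block comment whose "*/" is on the second line.
const text = "/* first line\nsecond line */";

// Same pattern, with and without the DOTALL (s) flag.
const withoutDotAll = /\/\*.*?\*\//;
const withDotAll = /\/\*.*?\*\//s;

console.log(withoutDotAll.test(text)); // false: "." cannot cross "\n"
console.log(withDotAll.test(text));    // true
```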

Demo

In the demo link, uncaptured matches (not comments, so ignore them) are shown in blue, and captured matches (comments) are shown in green.

The regex can be broken down as follows.

'.*?'       # match a single-quote followed by >= 0 chars, lazily,
            # followed by a single-quote
|           # or
".*?"       # match a double-quote followed by >= 0 chars, lazily,
            # followed by a double-quote
|           # or
(           # begin capture group 1
  \/\/      # match '//'
  [^\r\n]*  # match >= 0 chars other than line terminators
  |         # or
  \/\*      # match '/*'
    .*?     # match >= 0 chars, lazily
    \*\/    # match '*/'
)           # end capture group 1
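The question's first goal was to delete the comments. One way to do that with the same pattern, sketched here on the assumption that JavaScript's `String.prototype.replace` with a callback fits your setup, is to return an empty string when group 1 captured and the unchanged match otherwise. `stripComments` is a hypothetical helper name.

```javascript
const COMMENT_OR_STRING = /'.*?'|".*?"|(\/\/[^\r\n]*|\/\*.*?\*\/)/gs;

// Remove comments but leave quoted strings untouched.
function stripComments(source) {
  // `comment` is capture group 1; it is undefined when a quoted string matched.
  return source.replace(COMMENT_OR_STRING, (match, comment) =>
    comment !== undefined ? "" : match
  );
}

const input = `.Select(B) //main table\n.Foo("//keep me") /* gone */`;
console.log(stripComments(input));
```

The whitespace around a removed comment is kept, so the interpreter still sees the same line structure.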

Here is an example of how it works. Suppose the string is as follows.

A dog "is // a\nman's" /* best */ 'friend /* so it */ is' // said

The regex engine performs the following steps.

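Fail to match A.
Fail to match the space after A, then fail to match d, o, g and the following space.
Match but do not capture "is // a\nman's".¹
Fail to match the space.
Match and capture the comment /* best */.
Fail to match the space.
Match but do not capture 'friend /* so it */ is'.
Fail to match the space.
Match and capture the comment // said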

¹ After this match, the regex engine's string pointer sits between the (last) double quote just matched and the space that follows it.
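The walk-through can be reproduced with a short script (an illustrative sketch; `example` and `steps` are made-up names). Here the `\n` in the example string is taken as a real line break, which is why the `s` flag matters for the first quoted string. Positions where no alternative matches, the "fail to match" steps above, never appear in `matchAll` output; only the four successful matches do.

```javascript
const COMMENT_OR_STRING = /'.*?'|".*?"|(\/\/[^\r\n]*|\/\*.*?\*\/)/gs;

// The example string from the walk-through; "\n" is a real line break here.
const example = `A dog "is // a\nman's" /* best */ 'friend /* so it */ is' // said`;

// Record every match and whether group 1 captured it.
const steps = [...example.matchAll(COMMENT_OR_STRING)].map((m) => ({
  index: m.index,
  text: m[0],
  captured: m[1] !== undefined,
}));

for (const s of steps) {
  console.log(s.captured ? "capture:" : "skip:", JSON.stringify(s.text));
}
```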


Tags: .net, javascript, c#, regex