Regex to isolate trailing nested quote tags

Question

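I'm working with some old PHP forum software that has been upgraded over the years. Somewhere along the way, some posts ended up with unnecessary [QUOTE] blocks at the bottom of the post.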

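I'm trying to work out a regular expression to run through PHP's preg_replace. I only want to remove QUOTE tags that appear below the post content (they may also contain nested quote tags).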

For example, a post that should be left as-is might look like this:

Here is the example post text

[QUOTE]
This is an appropriate quote
[/QUOTE]

Here is more post content

On a post that looks like the following, I'd like to remove the final quote block:

Here is the example post text

[QUOTE]
This is an appropriate quote
[/QUOTE]

Here is more post content

[QUOTE]
This is an unnecessary quote, as it's below all of the post text
   [QUOTE]
   Here's an unnecessary nested quote, just to confuse things.
   [/QUOTE]
[/QUOTE]

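I've spent hours trying to come up with a regex that captures this last kind of quote block, to no avail. I know the pattern needs to end with the following, since the closing quote tag is always at the very end of the post: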

\[\/QUOTE\]$

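Is there a way to capture the entire final QUOTE block in a regex, including any possible nested quotes? Everything I've tried so far ends up matching from the nested opening quote tag to the final closing tag (rather than matching pairs).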
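To make the problem concrete, here is a minimal sketch of a non-recursive attempt. The patterns actually tried are not shown in the question, so this pattern and the helper name `naive_strip` are assumptions chosen purely for illustration. Because the regex engine takes the leftmost possible match, the match begins at the first [QUOTE] in the post and swallows the legitimate content along with the trailing block.

```php
<?php
// Hypothetical helper for illustration only: a naive, non-recursive
// attempt at removing the trailing quote block.
function naive_strip(string $post): string
{
    // "s" lets "." match newlines. The leftmost match starts at the
    // FIRST [QUOTE] in the post, not at the trailing block.
    $result = preg_replace('~\[QUOTE\].*\[/QUOTE\]$~s', '', $post);

    // preg_replace() returns null on failure; keep the post unchanged then.
    return $result === null ? $post : $result;
}
```

Run against the second example post above, this removes everything from the first [QUOTE] onward, including "Here is more post content", which is why balanced (recursive) matching is needed.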

Answer 1

You can match nested BBCode at the end of the string like this.

(?is)\[quote\]((?&core)|)\[/quote\]$(?(DEFINE)(?<core>(?>(?&content)|\[quote\](?:(?&core)|)\[/quote\])+)(?<content>(?>(?!\[/?quote\]).)+))

Demo: https://regex101.com/r/uFPyXX/2

 (?is)

 \[quote\]                          # Start-Delimiter
 (                                  # (1), The CORE
      (?&core) 
   |  
 )
 \[/quote\]                         # End-Delimiter

 $                                  # End of string

 # ///////////////////////
 # // Subroutines
 # // ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                \[quote\]
                # recurse core
                (?:
                     (?&core)                           # Core
                  |                                   # or, nothing
                )
                \[/quote\]
           )+
      )

      # content 
      (?<content>
           (?>
                (?!
                     \[/?quote\]
                )
                . 
           )+
      )

 )

Note that if you need to qualify the existing quotes that come before this one, let me know and I'll give you a mod.
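As a rough sketch of how this pattern could be plugged into preg_replace: the helper name `strip_trailing_quote_define`, the `~` delimiters, and the final `rtrim()` are my own assumptions and not part of the answer. The pattern itself is copied unchanged.

```php
<?php
// Hypothetical helper (not part of the answer): removes one trailing,
// possibly nested [QUOTE]...[/QUOTE] block from a post body.
function strip_trailing_quote_define(string $post): string
{
    // The answer's pattern, unchanged, wrapped in ~ delimiters so the
    // "/" in [/quote] needs no escaping. Split across lines for readability.
    $pattern = '~(?is)\[quote\]((?&core)|)\[/quote\]$'
        . '(?(DEFINE)'
        . '(?<core>(?>(?&content)|\[quote\](?:(?&core)|)\[/quote\])+)'
        . '(?<content>(?>(?!\[/?quote\]).)+)'
        . ')~';

    $result = preg_replace($pattern, '', $post);

    // preg_replace() returns null on failure (for example when a PCRE
    // limit is hit); fall back to the untouched post in that case.
    // rtrim() drops the blank lines left where the block used to be.
    return $result === null ? $post : rtrim($result);
}
```

Since `$` is used without the `m` flag, the block is only removed when its closing tag sits at the very end of the post (optionally followed by a single newline). Posts with extra trailing whitespace after the tag would need trimming first.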

Answer 2

You probably want to use recursion, but with an anchored approach:

(\[QUOTE[^][]*\]
(?:[^][]++|(?1))++
\[/QUOTE\])
\Z

See a demo on regex101.com. Only the final quote block is matched here (\Z).
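A minimal sketch of using this pattern from PHP follows. The helper name `strip_trailing_quote_anchored` and the `rtrim()` call are assumptions of mine. The pattern is the one above with the line breaks removed, so no `x` flag is needed.

```php
<?php
// Hypothetical helper (not part of the answer): removes a trailing,
// possibly nested quote block using the anchored recursive pattern.
function strip_trailing_quote_anchored(string $post): string
{
    // Group 1 is the whole block; (?1) recurses into it for nested
    // quotes, and \Z pins the match to the end of the post.
    $pattern = '~(\[QUOTE[^][]*\](?:[^][]++|(?1))++\[/QUOTE\])\Z~';

    $result = preg_replace($pattern, '', $post);

    // preg_replace() returns null on failure; keep the post unchanged then.
    return $result === null ? $post : rtrim($result);
}
```

Two properties of the pattern as written are worth keeping in mind. `[^][]*` in the opening tag also accepts attributes such as `[QUOTE=name]`. On the other hand, the quote body may not contain any other square brackets (for instance `[b]`), an empty `[QUOTE][/QUOTE]` does not match, and the tags are matched case-sensitively unless the `i` flag is added.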

php

regex

preg-replace