How do I use preg_replace to insert a text string between the 3rd and 4th paragraph?

Question

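I'm trying to figure out how to create a common newspaper device called a 'pullquote' in Wordpress posts. (This isn't strictly a Wordpress question; it's more of a generic regex question.) I have a tag that surrounds the text in the post. I want to copy the text between the tags (which I know how to do) and insert it between the 3rd and 4th instances of the p tag in the post.

The function below finds the text and strips out the tags, but it simply prepends the matched text to the beginning. I need help targeting the 3rd/4th paragraph.

Or... maybe I'm overthinking this. Perhaps there is some way to target elements the way jQuery's nth-child does?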

Post:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<p>And here is a 4th paragraph.</p>

Desired result:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<blockquote class="pullquote">Tatort or Bukow & Konig</blockquote>
<p>And here is a 4th paragraph.</p>

Here is my code so far:

function jchwebdev_pullquote( $content ) {
    $newcontent = $content;
    $replacement = '$1'; // keep the quoted text, drop only the [callout] tags
    $matches = array();
    $pattern = "~\[callout\](.*?)\[/callout\]~s";
    // strip out 'shortcode'
    $newcontent = preg_replace($pattern, $replacement, $content);
    if( preg_match($pattern, $content, $matches)) {
      // now have formatted pullquote 
      $pullquote = '<blockquote class="pullquote">' .$matches[1] . '</blockquote>';
      // now how do I target and insert $pullquote
      // between 3rd and 4th paragraph?
      // ($pattern_3rd_4th and $replacement_3rd_4th are placeholders for the part I can't work out)
      preg_replace($pattern_3rd_4th, $replacement_3rd_4th, $newcontent);
      return $newcontent;
    }
    return $content;    
}
add_filter( 'the_content' , 'jchwebdev_pullquote');

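EDIT: I'd like to amend my question to be a bit more Wordpress-specific. Wordpress actually converts newline characters into <p> tags. Most Wordpress posts don't even use explicit 'p' tags because they aren't needed. The problem with the solutions so far is that they seem to strip out the newlines, so if the post (the source text) has newlines, the result looks weird.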

A typical real-world Wordpress post:

If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].

If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.

And here is a 3rd paragraph.


And here is a 5th paragraph.

Wordpress renders it like this:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<p></p>
<p>And here is a 5th paragraph.</p>

So in a perfect world, I'd like to take the 'typical real-world post' above and have preg_replace turn it into:

If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.

If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.

And here is a 3rd paragraph.

<blockquote class="callout">Tatort or Bukow & Konig</blockquote>

And here is a 5th paragraph.

...which Wordpress would then render as:

<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of Tatort or Bukow & Konig.</p>
<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>
<p>And here is a 3rd paragraph.</p>
<blockquote class="callout">Tatort or Bukow & Konig</blockquote>
<p>And here is a 5th paragraph.</p>

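Maybe this is drifting too far off-topic and I should repost in a Wordpress forum, but I -think- what I need is to change the preg_replace to use newline characters as the delimiter instead of <p>, and to figure out how -not- to strip those newlines from the returned string.

Thanks for all the help so far!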

Answer 1

You can do this in a single preg_replace call.

$re = "~^(?:(?!/p).)*<p>(?:(?!/p).)*\[callout\](.*?)\[/callout\].*?</p>(?:[^<>]*<p>.*?</p>){2}[^<]*\K~s";
$str = "<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>\n<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>\n<p>And here is a 3rd paragraph.</p>\n<p>And here is a 4th paragraph.</p>";
$subst = "<blockquote class=\"pullquote\">$1</blockquote>\n"; // $1 is the text captured between the [callout] tags
$result = preg_replace($re, $subst, $str);
echo $result;


Answer 2

Simply using (.*?</p>){3}\K together with the s modifier gets you what you want:

preg_replace("@(.*?</p>){3}\K@s", $pullquote, $content);

I made a few changes to your function to get it working:

function jchwebdev_pullquote( $content )
{
    $pattern = "~\[callout\](.*?)\[/callout\]~s";
    if(preg_match($pattern, $content, $matches))
    {
      $content = preg_replace($pattern, '$1', $content); // drop the tags, keep the quoted text
      $pullquote = '<blockquote class="pullquote">' .$matches[1] . '</blockquote>';
      $content = preg_replace("@(.*?</p>){3}\K@s", $pullquote, $content);
      return $content;
    }
    return $content;    
}
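Two details of the `\K` pattern above may be worth guarding against. Without a limit, preg_replace applies the pattern to every run of three `</p>` tags, so a longer post would receive the pullquote more than once. Also, a `$` or backslash inside the quote text is read as a backreference in the replacement string. The following is a minimal sketch of one way to handle both; the helper name `insert_after_third_paragraph` is made up for illustration and is not part of the original answer.

```php
<?php
// Hypothetical helper: insert $pullquote once, right after the third </p>.
function insert_after_third_paragraph(string $content, string $pullquote): string
{
    // Escape backslashes and dollar signs so text such as "$100" is inserted
    // literally instead of being treated as a backreference.
    $safe = strtr($pullquote, ['\\' => '\\\\', '$' => '\\$']);

    // The 4th argument (limit = 1) stops after the first match.
    $result = preg_replace('@(?:.*?</p>){3}\K@s', $safe, $content, 1);

    // preg_replace returns null on error; fall back to the untouched content.
    return $result ?? $content;
}

// Six paragraphs: the blockquote is inserted once, after the third one.
$html = str_repeat("<p>x</p>\n", 6);
echo insert_after_third_paragraph($html, '<blockquote>Costs $100</blockquote>');
```

If the post has fewer than three `</p>` tags, the pattern does not match and the content comes back unchanged.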


Update #1

Optimization: use a single preg_replace to avoid applying multiple patterns:

function jchwebdev_pullquote( $content )
{
    $pattern = "\[callout\](.*?)\[/callout\]";
    if(preg_match("@(?s)$pattern@", $content, $matches))
    {
      // $2 = the quote text, $3 = everything up to and including the 3rd </p> after it
      $content = preg_replace("@(?s)($pattern)((.*?</p>){3})@", '$2$3<blockquote class="pullquote">$2</blockquote>', $content);
      return $content;
    }
    return $content;
}


Answer 3

If you want to use PHP's HTML/XML parsing instead, see "How do you parse and process HTML/XML in PHP?".
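As a rough illustration of the parser route, here is a minimal sketch using PHP's built-in DOMDocument. It assumes the DOM extension is enabled and that the paragraphs are top-level <p> elements of the fragment. The helper name `insert_pullquote_dom` is hypothetical, and the quote text is passed in already extracted.

```php
<?php
// Sketch: insert a pullquote after the Nth top-level <p> using DOMDocument.
// insert_pullquote_dom() is a hypothetical helper, not part of WordPress.
function insert_pullquote_dom(string $html, string $quote, int $after = 3): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on loose HTML
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root; the XML declaration
    // is a common trick to make loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return $html;
    }

    $count = 0;
    foreach ($root->childNodes as $node) {
        if ($node->nodeName === 'p' && ++$count === $after) {
            $bq = $doc->createElement('blockquote');
            $bq->setAttribute('class', 'pullquote');
            $bq->appendChild($doc->createTextNode($quote)); // text is escaped on output
            // A null reference node makes insertBefore append at the end.
            $root->insertBefore($bq, $node->nextSibling);
            break;
        }
    }

    // Serialize only the wrapper's children, leaving out the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = "<p>One</p>\n<p>Two</p>\n<p>Three</p>\n<p>Four</p>";
echo insert_pullquote_dom($html, 'Tatort or Bukow & Konig');
```

The blockquote lands between the third and fourth paragraphs. The serializer may normalize the markup slightly, for example by writing `&` as `&amp;`.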

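For a regex approach, here is one solution:

Find: (?s)((?:<p>.*?<\/p>\s*){3})

This regex captures only the first 3 <p> elements as group 1, so the replacement can put them back and add a node after them.

Replace: $1<blockquote class="pullquote">Tatort or Bukow & Konig</blockquote>\n

Code: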

$re = "/(?s)((?:<p>.*?<\/p>\s*){3})/"; 
$str = "<p>If you wanna improve yer German, don't try to read Heine or some elevated crap... watch old episodes of [callout]Tatort or Bukow & Konig[/callout].</p>\n<p>If I were teaching a music appreciation I wouldn't teach Beethoven. I'd teach Stamitz and average composers.</p>\n<p>And here is a 3rd paragraph.</p>\n<p>And here is a 4th paragraph.</p>"; 
$subst = "$1<blockquote class=\"pullquote\">Tatort or Bukow & Konig</blockquote>\n"; // $1 puts the three matched paragraphs back
$result = preg_replace($re, $subst, $str, 1);
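For the raw, newline-separated post text described in the question's edit, the same idea can be sketched with blank lines as the paragraph delimiter instead of <p> tags. This is only one possible approach. The helper name `insert_pullquote_raw` is made up, and it assumes paragraphs are separated by two or more line breaks. Splitting with PREG_SPLIT_DELIM_CAPTURE keeps the original line breaks, so nothing is stripped from the returned string.

```php
<?php
// Hypothetical helper: works on raw post text whose paragraphs are separated
// by blank lines, and leaves the existing line breaks untouched.
function insert_pullquote_raw(string $text, int $after = 3): string
{
    $pattern = '~\[callout\](.*?)\[/callout\]~s';
    if (!preg_match($pattern, $text, $m)) {
        return $text; // no [callout] shortcode, nothing to do
    }

    // Drop the shortcode tags but keep the quoted words in place.
    $text = preg_replace($pattern, '$1', $text);
    $pullquote = '<blockquote class="pullquote">' . $m[1] . '</blockquote>';

    // Split on runs of two or more line breaks, keeping the separators.
    // Paragraphs sit at even indexes, separators at odd indexes.
    $parts = preg_split('~(\R{2,})~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Insert right after paragraph number $after; if the post is shorter,
    // array_splice simply appends the pullquote at the end.
    array_splice($parts, 2 * $after - 1, 0, ["\n\n", $pullquote]);

    return implode('', $parts);
}

$raw = "One [callout]Q[/callout].\n\nTwo.\n\nThree.\n\n\nFive.";
echo insert_pullquote_raw($raw);
```

In WordPress this would have to run before the newlines are turned into <p> tags. One untested assumption is that hooking it with a priority below the default, for example add_filter('the_content', 'insert_pullquote_raw', 9), would achieve that.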



regex

wordpress

preg-replace
