preg_replace headings with strong

Question

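I'm trying to replace all headings (h1, h2, h3, etc.) in a piece of text using a regular expression, but it only replaces the first opening tag and the last closing tag.

Here is my code: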

<?php
$regex = '/<h(?:[\d]{1})(?:[^>]*)>([^<].*)<\/h(?:[\d]{1})>/mi';
$str = '<h1 class="text-align-center" style="font-size:22px;margin-top:0px;margin-bottom:0px;color:rgb(0,0,0);font-family:IntroBold, sans-serif;line-height:1.5;letter-spacing:0px;font-weight:700;text-align:center;">You should be&nbsp;confident solving wicked problems in a hybrid role between strategy, research, design and business&nbsp;through a discovery driven approach.&nbsp;</h1><p></p><h2 style="margin-top:0px;margin-bottom:.5em;font-family:IntroBold, sans-serif;font-size:19px;line-height:1em;text-transform:uppercase;letter-spacing:1px;font-weight:700;"><strong>KEY RESPONSIBILITIES</strong></h2>';
echo preg_replace($regex, '<strong>$1</strong>', $str);

The result is <strong>[...]</h1><p></p><h2...>[...]</strong>, which is of course wrong.

Answer 1

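Regular expressions are obviously not a perfect solution for parsing HTML. If you want something more robust, you should find an HTML parser and go that route.

That said, this regex will do a half-decent job and works for the example provided: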

/<h\d.*?>(.*?)<\/h\d>/ims

Proof.
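As a rough sketch of how that pattern might be plugged into the code from the question: the shortened `$str` and the `<strong>$1</strong>` replacement below are assumptions for illustration, not the asker's exact data. The lazy `.*?` stops at the nearest closing tag, so each heading is matched on its own rather than one match running from the first opening tag to the last closing tag.

```php
<?php
// Shortened stand-in for the HTML in the question (an assumption for brevity).
$str = '<h1 class="a">First</h1><p></p><h2 style="b"><strong>Second</strong></h2>';

// Lazy quantifiers (.*?) stop at the nearest closing tag, so every heading
// produces its own match instead of one match spanning the whole string.
$regex = '/<h\d.*?>(.*?)<\/h\d>/ims';

// $1 is the heading content captured by the only capturing group.
$result = preg_replace($regex, '<strong>$1</strong>', $str);
echo $result;
```

Note that content which was already wrapped in `<strong>` ends up doubly wrapped, which is harmless but may be worth cleaning up afterwards.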

Answer 2

As an alternative, you can use simple_html_dom.

You can do a lot with it, including what you are asking about.

Here is how you could do it:

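$dom = new simple_html_dom();
$dom->load($str); // parse the HTML string from the question
foreach ($dom->find("h1,h2,h3,h4,h5") as $e) {
    $e->outertext = "<strong>" . $e->innertext . "</strong>";
}
echo $dom->save(); // print the modified HTML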

I am replacing all header tags with strong. You can also use inline CSS if you prefer.

Answer 3

There is a much more efficient way, performance-wise, to match headings:

<h(\d)[^>]*>([^<]*(<(?!\/h)[^<]*)*)<\/h\1>

Live demo

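* The engine finds the match in 61 steps, whereas with the regex from the accepted answer it needs far more steps (1193) to match the same part.

The right way:

Although regular expressions seem handy most of the time, it is good practice to use the right tool for the job: DOMDocument.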
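A minimal sketch of using this kind of pattern with preg_replace, assuming the closing tag is matched with a backreference (`\1`) to the captured heading level. The shortened `$str` and the `<strong>$2</strong>` replacement are assumptions for illustration. Because group 1 holds the heading level, the heading content is in group 2.

```php
<?php
// Shortened stand-in for the HTML in the question (an assumption for brevity).
$str = '<h1 class="a">First</h1><p></p><h2 style="b"><strong>Second</strong></h2>';

// Group 1 captures the heading level; \1 requires the same level in the closing tag.
// Group 2 captures the heading content: plain text, then any tags that are not
// a closing heading tag, without relying on lazy backtracking.
$regex = '/<h(\d)[^>]*>([^<]*(<(?!\/h)[^<]*)*)<\/h\1>/i';

$result = preg_replace($regex, '<strong>$2</strong>', $str);
echo $result;
```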

$dom = new DOMDocument();
// $html holds the markup to process (e.g. $str from the question).
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$headings = $xpath->query("//h1 | //h2 | //h3 | //h4 | //h5 | //h6");
foreach ($headings as $h) {
    // nodeValue is the heading's plain text, so nested tags are dropped.
    $s = $dom->createElement("strong", $h->nodeValue);
    $h->parentNode->replaceChild($s, $h);
}
echo $dom->saveHTML();

PHP live demo
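If the markup inside a heading should survive (nodeValue keeps only the text), one possible variation is to move the heading's child nodes into the new element instead. This is a sketch under a few assumptions: the shortened `$html`, the temporary `<div>` wrapper that gives the parser a single root element, and the final loop that serializes the wrapper's children are all illustrative choices, not part of the answer above.

```php
<?php
// Shortened stand-in for the HTML in the question (an assumption for brevity).
$html = '<h1 class="a">First</h1><p></p><h2 style="b"><em>Second</em></h2>';

$dom = new DOMDocument();
// Wrap the fragment in one <div> root (an assumption of this sketch) so the
// parser sees a single top-level element.
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6') as $h) {
    $strong = $dom->createElement('strong');
    // appendChild moves the node, so this loop empties the heading
    // while keeping any nested markup intact.
    while ($h->firstChild !== null) {
        $strong->appendChild($h->firstChild);
    }
    $h->parentNode->replaceChild($strong, $h);
}

// Serialize only the wrapper's children so the temporary <div> is dropped.
$result = '';
foreach ($dom->documentElement->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result;
```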
