Remove wrapping tags for images

Question

I have a CKEditor instance that outputs some tags around images. So far I have been using a regex to get rid of those wrapping tags.

Here are some test strings:

$example1 = '<p data-entity-type="" data-entity-uuid="" style="text-align: center;"><span><img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620" /><span title="Click and drag to resize">•</span></span></p>';
$example2 = '<p><img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620" /></p>';
$example3 = '<html>
<head></head>
<body>
some text here...
<p><img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620" />
</p>
</body>
</html>';
// Wanted result : <html><head></head><body>some text here...<img alt="julie-bishop.jpg" data-entity-type="" data-entity-uuid="" height="349" src="/sites/default/files/inline-images/julie-bishop.jpg" width="620" /></body></html>

The regex I tried is /(.*?)<p>\s*(<img[^<]+?)\s*<\/p>(.*)/, which works perfectly with example 2.

preg_replace("/(.*?)<p>\s*(<img[^<]+?)\s*<\/p>(.*)/", "", $string);

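The rule is: if you detect a <p> that has an <img> as one of its children, keep the <img> and remove the <p> along with its other children (which can be a span or something else...).

Any idea how to achieve what I need?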

Answer 1

You can use the following regex:

<p(?:[^>]*|\r\n|\n)>(?:.*|\r\n|\n)(<img(?:[^>]*|\r\n|\n)>)(?:.*|\r\n|\n)<\/p>

Here is a demo on regex101.com

And here is a working demo on eval.in (with your PHP code)
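
As a rough illustration of how this pattern could be plugged into preg_replace, here is a minimal sketch. The "~" delimiter, the '$1' replacement string and the shortened test string are assumptions made for this example; the answer itself only gives the bare pattern.

```php
<?php
// The pattern from the answer, wrapped in "~" delimiters (the delimiter is an assumption).
$pattern = '~<p(?:[^>]*|\r\n|\n)>(?:.*|\r\n|\n)(<img(?:[^>]*|\r\n|\n)>)(?:.*|\r\n|\n)<\/p>~';

// Shortened variant of $example2 from the question.
$example2 = '<p><img alt="image.jpg" height="349" src="image.jpg" width="620" /></p>';

// Replace the whole <p>...</p> match with capture group 1, which holds the <img> tag.
$unwrapped = preg_replace($pattern, '$1', $example2);
echo $unwrapped, PHP_EOL;
```

Note that `.*` is greedy, so when several paragraphs sit on the same line the match can start at the first <p> and swallow everything up to the last </p> on that line.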

Answer 2

This does not use regex, but if you use an XML parser such as DOM, it can be done in a more readable way.

A popular quote by Jamie Zawinski: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

You can use http://php.net/manual/en/domdocument.loadhtml.php to load the HTML fragment, then use http://php.net/manual/en/domdocument.getelementsbytagname.php to get all <p> tags. Once you have the node list of <p> tags, you can loop through each node.

On each <p> node, you can then use http://php.net/manual/en/domdocument.getelementsbytagname.php to find any <img> tag. If one is found, use $node->childNodes to get the children of that <p> node, loop through them, and use http://php.net/manual/en/domnode.removechild.php to remove every child node other than the <img> node. Once done, use http://php.net/manual/en/domdocument.savehtml.php to get the processed HTML.
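
The steps above could be sketched roughly as follows. The function name stripNonImageChildren and the sample input are hypothetical and exist only for this illustration; the sketch also assumes that suppressing libxml warnings is acceptable.

```php
<?php
// Hypothetical helper: removes every direct child of a <p> that is not an <img>,
// but only for paragraphs that contain an image.
function stripNonImageChildren(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);            // libxml wraps a fragment in <html><body>
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('p') as $p) {
        if ($p->getElementsByTagName('img')->length === 0) {
            continue; // no image in this paragraph, leave it alone
        }
        // childNodes is a live list, so copy it before removing nodes.
        foreach (iterator_to_array($p->childNodes) as $child) {
            if ($child->nodeName !== 'img') {
                $p->removeChild($child);
            }
        }
    }
    return $doc->saveHTML();
}

echo stripNonImageChildren('<p>caption <img src="a.jpg" alt="a"> trailing</p><p>plain text</p>');
```

As in the description, the <p> itself stays in place and only direct children are inspected, so an <img> nested inside a <span> (as in $example1) would be removed together with its wrapper. Dropping the <p> as well would need an extra step, for example moving the <img> in front of the paragraph and then removing the paragraph.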

Answer 3

The approach you are using is not a good one; you should use DOMDocument instead of regex. Here we use DOMDocument and DOMXPath. I hope my solution helps you; it should solve your problem.

<?php
ini_set('display_errors', 1);
$example1 = '<p data-entity-type="" data-entity-uuid="" style="text-align: center;"><span><img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620" /><span title="Click and drag to resize">•</span></span></p>';
$example2 = '<p><img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620" /></p>';
$example3 = '<html>
<head></head>
<body>
some text here...
<p><img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620" />
</p>
</body>
</html>';


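$domDocument = new DOMDocument();
// LIBXML_HTML_NOIMPLIED keeps libxml from wrapping a fragment in <html><body>.
$domDocument->loadHTML($example1, LIBXML_HTML_NOIMPLIED);
$domXPath = new DOMXPath($domDocument);

if ($domXPath->query("//html")->length)
{
    // Full document: replace every <p> that contains an <img>.
    foreach ($domXPath->query("//p") as $pelement)
    {
        // ".//img" is relative to $pelement; a bare "//img" would search the whole document.
        if ($domXPath->query(".//img", $pelement)->length)
        {
            $pelement->parentNode->replaceChild(getReplacement($pelement), $pelement);
        }
    }
    echo $domDocument->saveHTML();
}
else
{
    // Fragment: print the replacement for the first <p>, if there is one.
    $pelement = $domXPath->query("//p")->item(0);
    if ($pelement !== null)
    {
        echo getReplacement($pelement, true);
    }
}

// Returns the first child of the given <p> (the <img>, or the wrapper element holding it),
// either as a node or, when $string is true, as serialized HTML.
// This assumes the image or its wrapper is the first child of the <p>.
function getReplacement($pelement, $string = false)
{
    $firstChild = $pelement->childNodes->item(0);
    if ($string === true)
    {
        return $pelement->ownerDocument->saveHTML($firstChild);
    }
    return $firstChild;
}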

Output for string1:

<img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620">Ã¢Â€Â¢

Output for string2:

<img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620">

Output for string3:

<html> <head></head> <body> some text here... <img alt="image.jpg" data-entity-type="" data-entity-uuid="" height="349" src="image.jpg" width="620"> </body> </html>
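
For the stricter rule from the question (keep only the <img> and drop the <p> with all its other children), a variant along the following lines might work. This is a sketch, not the answer's code: the function name keepOnlyImages and the sample input are hypothetical, and it loads the markup without LIBXML_HTML_NOIMPLIED and serializes only the children of <body>, which is a design choice made for this example.

```php
<?php
// Hypothetical helper: replaces each <p> that contains an <img> with just its image(s).
function keepOnlyImages(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);            // libxml adds <html><body> around a fragment
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // The predicate [.//img] selects only paragraphs with an <img> descendant.
    foreach ($xpath->query('//p[.//img]') as $p) {
        foreach ($xpath->query('.//img', $p) as $img) {
            $p->parentNode->insertBefore($img, $p); // moves the image out of the <p>
        }
        $p->parentNode->removeChild($p);            // drops the <p> and what is left in it
    }

    // Serialize only the children of <body>, so a fragment comes back as a fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo keepOnlyImages('<p style="text-align: center;"><span><img src="image.jpg" alt="x"><span title="t">*</span></span></p>');
```

Because only the <body> content is returned, the <html> and <head> wrappers of a full document such as $example3 are not part of the result. Non-ASCII characters may also need an explicit encoding hint when loading; that is left out here.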

Answer 4

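Since this only involves TAGS, specifically the tags adjacent to <img../>, it can easily be done with a regex.

The catch is that all tags have to be matched, and skipped if they are not part of the sequence above.

The reason all tags have to be matched is that tags can be hidden inside invisible content and comments.

However, PHP gives you the power of the (*SKIP)(*FAIL) backtracking control verbs, which can match, and more importantly skip, the other tags and invisible content without them showing up as matches of the regex.

And, when combined with a few atomic groups, the speed is good.

This result shows 50 iterations over a 130K HTML source = 6.5 MB of HTML in about 2/3 of a second.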

Regex1:   (?><p\s*>\s*(<img\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+\s*/?>)\s*</p\s*>)|(?><(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>)(*SKIP)(*FAIL)
Completed iterations:   50  /  50     ( x 1 )
Matches found per iteration:   2
Elapsed Time:    0.68 s,   683.32 ms,   683318 µs

https://regex101.com/r/CCyNZ5/1
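
To show the (*SKIP)(*FAIL) idea in isolation, here is a much smaller sketch. It is a simplified illustration and not the regex above: the only things it skips are HTML comments and <script> blocks, and both the pattern and the sample input are assumptions made for this example.

```php
<?php
// Left alternative: consume a comment or script block, then force a failure.
// (*SKIP) makes the engine resume after the consumed text instead of retrying inside it.
// Right alternative: the <p> wrapper around a single <img>, captured in group 1.
$pattern = '~(?:<!--.*?-->|<script\b.*?</script>)(*SKIP)(*FAIL)'
         . '|<p\s*>\s*(<img\b[^>]*>)\s*</p\s*>~is';

$html = '<!-- <p><img src="old.jpg"></p> --><p><img src="new.jpg" /></p>';

// Only the real paragraph is unwrapped; the commented-out one is left untouched.
$cleaned = preg_replace($pattern, '$1', $html);
echo $cleaned, PHP_EOL;
// prints: <!-- <p><img src="old.jpg"></p> --><img src="new.jpg" />
```

Like Regex1, this sketch only matches a <p> without attributes that directly wraps the image.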

Find (string):
