Ignore a specific tag in regex - negative lookahead

Question

I have a scenario in my PHP code where I have the following string:

This is an outside Example <p href="https://example.com"> This is a para Example</p><markup class="m"> this is a markup example</markup>

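I want to do a case-insensitive search for the word example in this string, but:

I want my regex to ignore occurrences of example inside tag attributes (this I was able to achieve)
I want the search to completely ignore anything inside <markup ..> any content </markup>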

What I have done so far is:

/(example)(?![^<]*>)/i

This works fine and ignores the example in the href of the p tag. I have now modified it for <markup>:

/(example)(?!([^<]*>)|(\<markup[^>]*>[^<]*<\/markup\>))/i

But this doesn't work. You can see my attempt here - https://regex101.com/r/e2XujN/1

What I want to achieve with this

I will replace each matched example word as follows:

If I find eXamPle, it will be replaced with <markup>eXamPle</markup>
Example will be replaced with <markup>Example</markup>

and so on.

Note: The case of the matched text must stay the same in the replacement string.
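
For reference, here is a minimal sketch of the replacement described above, using the first pattern with the negative lookahead (?![^<]*>). The variable names ($subject, $pattern, $result, $count) are chosen for illustration only. It shows the remaining problem: the occurrence inside <markup> is still wrapped.

```php
<?php
$subject = 'This is an outside Example <p href="https://example.com"> This is a para Example</p>'
         . '<markup class="m"> this is a markup example</markup>';

// (?![^<]*>) rejects a match when the next angle bracket after it is ">",
// which means the match sits inside a tag (for example in an attribute value).
$pattern = '/(example)(?![^<]*>)/i';

// $1 re-inserts the text exactly as it was matched, so the case is preserved.
$result = preg_replace($pattern, '<markup>$1</markup>', $subject, -1, $count);

echo $count, PHP_EOL;   // number of replacements made
echo $result, PHP_EOL;
```

The count is 3 here: the href occurrence is skipped, but the example inside <markup class="m"> is wrapped again, which is the case the question wants to exclude.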

Answer 1

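You can use the backtracking control verbs (*SKIP)(*F) in PCRE to match and then skip substrings delimited by a given pattern or string (here, markup), like this:

(markup).*\1(*SKIP)(*F)|(example)(?![^<]*>)

Explanation:

Excluded substring: the first alternative
(markup) is the first capturing group; it matches the characters markup literally (case insensitive)
.* matches any character (except line terminators)
\1 matches the same text as the first capturing group
(*SKIP) marks this position; if the pattern fails afterwards, the next match attempt starts here instead of one character further on
(*F) is shorthand for (*FAIL); it forces the match to fail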
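
As a rough illustration, the pattern above can be dropped into preg_replace as sketched below. The ~ delimiter, the i flag, and the variable names are assumptions made for this example. Because the wanted word is now in the second capturing group, the replacement refers to $2.

```php
<?php
$subject = 'This is an outside Example <p href="https://example.com"> This is a para Example</p>'
         . '<markup class="m"> this is a markup example</markup>';

// First alternative: consume everything from "markup" to the last "markup"
// on the line, then discard it with (*SKIP)(*F).
// Second alternative: the word itself, unless it sits inside a tag.
$pattern = '~(markup).*\1(*SKIP)(*F)|(example)(?![^<]*>)~i';

// Group 2 holds the matched word with its original case.
$result = preg_replace($pattern, '<markup>$2</markup>', $subject, -1, $count);

echo $count, PHP_EOL;
echo $result, PHP_EOL;
```

Note that .* is greedy, so if one line holds several <markup> elements, everything between the first and the last of them is skipped as well.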

Answer 2

You can solve it the same way you solved the first problem: check that the string is not directly followed by the closing tag.

Regex:

(example)(?![^<]*>)(?![^<]*<\/markup\>)
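
A small sketch of how this pattern behaves, assuming a hypothetical helper named wrapExample and the ~ delimiter (so the slash in </markup> needs no escaping). The second call illustrates the "directly followed" condition: once another tag sits between the word and </markup>, the lookahead no longer excludes it.

```php
<?php
// Hypothetical helper for illustration: wraps every accepted match in <markup>.
function wrapExample(string $html): string
{
    // 1st lookahead: not inside a tag.
    // 2nd lookahead: not followed by </markup> with only text in between.
    $pattern = '~(example)(?![^<]*>)(?![^<]*</markup>)~i';
    return preg_replace($pattern, '<markup>$1</markup>', $html);
}

$subject = 'This is an outside Example <p href="https://example.com"> This is a para Example</p>'
         . '<markup class="m"> this is a markup example</markup>';

// The occurrence inside <markup class="m"> is left untouched.
echo wrapExample($subject), PHP_EOL;

// A nested tag before </markup> defeats the second lookahead,
// so this occurrence is wrapped even though it is inside <markup>.
echo wrapExample('<markup>an example <b>here</b></markup>'), PHP_EOL;
```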

Answer 3

The answer is to use the DOM, but working with text nodes and inserting HTML content into them is a bit tricky.

<?php
$content = <<< 'HTML'
This is an outside Example <p href="https://example.com"> This is a para Example</p>
test <markup class="m"> this is a markup example</markup> another example <p>example</p>
HTML;

// Initialize a DOM object
$dom = new DOMDocument();
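// Wrap the content in a <div> so it can be located after parsing
// (wrapper technique credited to @hakre).
// The @ suppresses parser warnings, e.g. for the unknown <markup> tag.
@$dom->loadHTML("<div>$content</div>");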

$wrapper = $dom->getElementsByTagName('div')->item(0);
// Remove wrapper
$wrapper = $wrapper->parentNode->removeChild($wrapper);
// Remove all nodes of $dom object
while ($dom->firstChild) {
    $dom->removeChild($dom->firstChild);
}
// Append all $wrapper object nodes to $dom
while ($wrapper->firstChild) {
    $dom->appendChild($wrapper->firstChild);
}

$dox = new DOMXPath($dom);
// Select the top-level elements and text nodes
// (text nested inside elements is not visited)
$query = $dox->query('/* | /text()');

// Iterate through all nodes
foreach ($query as $node) {
    // If it's not an HTML element
    if ($node->nodeType != XML_ELEMENT_NODE) {
        // Escape the text for XML, then wrap each match;
        // $1 re-inserts the matched word with its original case
        $newContent = preg_replace('~(example)~i',
            '<markup>$1</markup>',
            htmlspecialchars($node->wholeText, ENT_NOQUOTES));
        // We can't insert HTML directly into our node
        // so we need to create a document fragment
        $newNode = $dom->createDocumentFragment();
        $newNode->appendXML($newContent);
        // Replace the old text node with the new content
        $node->parentNode->replaceChild($newNode, $node);
    }
}

// Save modifications
echo $dom->saveHTML($dom);
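
The query in the code above only visits top-level nodes, so text nested inside other elements such as <p> is not touched. One possible variant, sketched here under the assumption that every text node outside <markup> should be processed, filters text nodes with an XPath ancestor test. The variable names and the use of libxml_use_internal_errors instead of @ are choices made for this sketch, not part of the answer above.

```php
<?php
$content = 'This is an outside Example <p href="https://example.com"> This is a para Example</p>'
         . '<markup class="m"> this is a markup example</markup>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);            // collect warnings about the unknown <markup> tag
$dom->loadHTML("<div>$content</div>");
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Every text node, at any depth, that has no <markup> ancestor.
foreach ($xpath->query('//text()[not(ancestor::markup)]') as $text) {
    // Escape for XML first, because appendXML() parses its argument.
    $escaped  = htmlspecialchars($text->nodeValue, ENT_NOQUOTES);
    $replaced = preg_replace('~(example)~i', '<markup>$1</markup>', $escaped, -1, $count);
    if ($count === 0) {
        continue;                             // nothing to wrap in this text node
    }
    $fragment = $dom->createDocumentFragment();
    $fragment->appendXML($replaced);
    $text->parentNode->replaceChild($fragment, $text);
}

// Serialize only the children of the wrapper <div>.
$wrapper = $dom->getElementsByTagName('div')->item(0);
$result = '';
foreach ($wrapper->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result, PHP_EOL;
```

Attribute values are never text nodes, so the href is left alone without any lookahead.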

php

regex

regex-negation

regex-group

regex-lookarounds