preg_replace all "_" by spaces only in url

Question

I have an HTML file containing some data, including some URLs.

Only in these URLs, I want to replace the _ character with a space (via a PHP file).

So a URL like this:

</p><p><a rel="nofollow" class="external text" href="http://10.20.0.30:1234/index.php/this_is_an_example.html">How_to_sample.</a>

would become

</p><p><a rel="nofollow" class="external text" href="http://10.20.0.30:1234/index.php/this is an example.html">How_to_sample.</a>

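This must not affect any _ that is not inside a URL.

I think preg_replace can do this, but I don't know how to go about it.

The following code is not correct, because it replaces every _ and not only the ones in the URL: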

$content2 = preg_replace('/[_]/', ' ', $content);

Thanks.

Edit:

Thanks for the preg_replace_callback suggestion, that is exactly what I was looking for.

    // search pattern: dots are escaped so they only match literal dots
    $pattern = '/href="http:\/\/10\.20\.0\.30:1234\/index\.php\/(.*?)\.html">/s';

    // the function call
    $content2 = preg_replace_callback($pattern, 'callback', $content);

    // the callback function: $m[1] holds the path between index.php/ and .html
    function callback($m) {
        $url = str_replace("_", " ", $m[1]);
        return 'href="http://10.20.0.30:1234/index.php/'.$url.'.html">';
    }
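
The same idea can be sketched with a closure and preg_quote(), so the base URL does not have to be escaped by hand. This is only a minimal sketch: the function name underscoresToSpacesInHref and its $base parameter are made up for illustration, and the pattern assumes double-quoted href attributes as in the sample above.

```php
<?php
// Hypothetical helper; $base is the URL prefix from the question.
function underscoresToSpacesInHref(string $content, string $base): string
{
    // preg_quote() escapes regex metacharacters in the base URL,
    // including the "~" delimiter passed as second argument.
    $pattern = '~href="' . preg_quote($base, '~') . '([^"]*)"~';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($base): string {
            // $m[1] is the part of the href after the base URL;
            // only that part has its underscores replaced.
            return 'href="' . $base . str_replace('_', ' ', $m[1]) . '"';
        },
        $content
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $content;
}

$content = '<a href="http://10.20.0.30:1234/index.php/this_is_an_example.html">How_to_sample.</a>';
echo underscoresToSpacesInHref($content, 'http://10.20.0.30:1234/index.php/');
```

The link text How_to_sample. is left alone because it is outside the href value that the pattern captures.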

Answer 1

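Older and wiser: don't use regex. It is not necessary, and it can be unstable because regex is not DOM-aware. Use an HTML parser to isolate the <a> tags, then isolate the href attribute, then make a simple str_replace() call.

Code: (Demo)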

$html = <<<HTML
<p><a rel="nofollow" class="external text" href="http://10.20.0.30:1234/index.php/this_is_an_example.html">How_to_sample.</a></p>
HTML;

$dom = new DOMDocument; 
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach($dom->getElementsByTagName('a') as $a) {
    $a->setAttribute('href', str_replace('_', '%20', $a->getAttribute('href')));
}
echo $dom->saveHTML();

Output:

<p><a rel="nofollow" class="external text" href="http://10.20.0.30:1234/index.php/this%20is%20an%20example.html">How_to_sample.</a></p>

A URL should not contain any spaces; spaces should be encoded as %20. See: Is a URL allowed to contain a space?
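
The loop above rewrites the href of every <a> tag in the document. If only links under the question's base URL should change, a prefix check can be added. The following is a rough sketch under that assumption; the function name encodeUnderscoresInLinks and the $base parameter are hypothetical.

```php
<?php
// Hypothetical helper; $base is the URL prefix taken from the question.
function encodeUnderscoresInLinks(string $html, string $base): string
{
    $dom = new DOMDocument();
    // These flags keep libxml from adding <html>/<body> wrappers and a doctype.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Skip links that do not start with the base URL.
        if (strncmp($href, $base, strlen($base)) !== 0) {
            continue;
        }
        // Replace underscores only in the part after the base URL.
        $path = substr($href, strlen($base));
        $a->setAttribute('href', $base . str_replace('_', '%20', $path));
    }

    return (string) $dom->saveHTML();
}

$html = '<p><a href="http://10.20.0.30:1234/index.php/this_is_an_example.html">How_to_sample.</a> '
      . '<a href="http://example.com/keep_me.html">other_link</a></p>';
echo encodeUnderscoresInLinks($html, 'http://10.20.0.30:1234/index.php/');
```

Only the first link is modified; the second href and both link texts keep their underscores.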

Original answer:

If you are willing to accept some regex trickery, you can do the job with preg_replace() alone.

Code: (Demo)

$input = '</p><p><a rel="nofollow" class="external text" href="http://10.20.0.30:1234/index.php/this_is_an_example.html">How_to_sample.</a>';

$pattern = '~(?:\G|\Qhttp://10.20.0.30:1234/index.php\E[^_]+)\K_([^_.]*)~';

// $1 restores the text captured after each underscore
echo preg_replace($pattern, ' $1', $input);

Output:

</p><p><a rel="nofollow" class="external text" href="http://10.20.0.30:1234/index.php/this is an example.html">How_to_sample.</a>

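\G is the "continue" metacharacter. It lets you make multiple consecutive matches after the expected part of the URL.

\Q..\E means "treat all characters between the two points literally", so no escaping is needed.

\K means "restart the fullstring match from this point".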

Pattern Demo

Since you are building a URL, I think you should replace with %20.

I suppose my pattern should reject the start of the string after \G for best practice...

$pattern = '~(?:\G(?!^)|\Qhttp://10.20.0.30:1234/index.php\E[^_]+)\K_([^_.]*)~';
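
Putting the last two remarks together, a possible sketch uses the refined pattern with a %20 replacement. One change here is my own assumption and not part of the answer: both character classes also exclude ", so that a matching URL without underscores cannot let the match run on into the text after the href. The function name encodeUnderscores is hypothetical.

```php
<?php
// Variation on the pattern above: [^_"] and [^_."] stop at the closing quote
// of the href value (an added assumption, not from the original answer).
$pattern = '~(?:\G(?!^)|\Qhttp://10.20.0.30:1234/index.php\E[^_"]+)\K_([^_."]*)~';

// Hypothetical wrapper so the replacement can be reused.
function encodeUnderscores(string $input, string $pattern): string
{
    // \K drops everything matched before the underscore from the result,
    // and $1 puts back the text captured after it.
    return preg_replace($pattern, '%20$1', $input) ?? $input;
}

$input = '<a href="http://10.20.0.30:1234/index.php/this_is_an_example.html">How_to_sample.</a>';
echo encodeUnderscores($input, $pattern);
// <a href="http://10.20.0.30:1234/index.php/this%20is%20an%20example.html">How_to_sample.</a>
```

As in the original pattern, matching stops at the first dot in the path, so underscores after a dot would not be replaced.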
