Regex with DOMDocument in PHP

Consider the following PHP snippet:

<?php

$html = <<<DATA
<p>Lorem Ipsum is simply dummy text</p> <p>Lorem Ipsum is <a href="http://www.google.com">simply</a> dummy text</p><a href="http://www.youtube.com/watch?v=DUQi_R4SgWo" target="_blank" rel="noopener">Check out the video here!</a>. <p>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p> <a href="http://www.youtube.com/watch?v=A_6gNZCkajU" target="_blank" rel="noopener">Video here</a> <p>It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
DATA;

# set up the DOM
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

# set up the xpath
$xpath = new DOMXPath($dom);

# set up the regex
$regex = '~\?v=([^&]+)~';

foreach ($xpath->query("a[contains(@href, 'youtube')]/@href") as $link) {
    preg_match($regex, $link->nodeValue, $matches);
    if ($matches) {
        $id = $matches[1];
        echo "$id\n";
    }
}
?>

This loads the HTML string into a DOM and then extracts the YouTube video IDs using an XPath query and a regular expression.
The snippet outputs:

DUQi_R4SgWo
A_6gNZCkajU
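To see what the pattern captures on its own, here is a small sketch that applies the same regex to a few href strings. The second and third URLs are hypothetical examples, not from the question: one shows that `[^&]+` stops at the next query parameter, the other that `\?v=` only matches when `v` is the first parameter.

```php
<?php
// Same pattern as in the question: capture everything after "?v=" up to the next "&"
$regex = '~\?v=([^&]+)~';

$hrefs = [
    'http://www.youtube.com/watch?v=DUQi_R4SgWo',
    'http://www.youtube.com/watch?v=A_6gNZCkajU&t=30',          // hypothetical: extra parameter
    'http://www.youtube.com/watch?feature=share&v=A_6gNZCkajU', // hypothetical: v is not first
];

$ids = [];
foreach ($hrefs as $href) {
    // preg_match() returns 1 on a match and fills $matches through a by-reference argument
    $ids[] = preg_match($regex, $href, $matches) === 1 ? $matches[1] : null;
}

var_dump($ids);
```

The third entry comes back as `null`, because `?feature=` does not start with `?v=`.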


Now I would like to replace the foreach loop with:

$regex = '~\?v=([^&]+)~';

$xpath->registerPHPFunctions();
$xpath->registerNamespace("php", "http://php.net/xpath");
$links = $xpath->query("a[php:functionString('preg_match', '$regex', @href, '$matches')]/@href");

This finds the same links, but nothing is stored in $matches. Why?

A quick scan of the underlying engine code shows that it does not support passing arguments by reference.

To work around this, use your own wrapper:

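$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('match_video_id');
$links = $xpath->query("a[php:functionString('match_video_id', @href)]/@href");

// "match" is a reserved keyword since PHP 8.0, so the wrapper needs another name
function match_video_id($href) {
    $regex = '~\?v=([^&]+)~';
    $rc = preg_match($regex, $href, $matches);
    if ($rc === 1) {
        var_dump($matches[1]); // store this somewhere
    }
    // return a real boolean: an int reaches XPath as a string, and "0" is truthy there
    return $rc === 1;
}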

See it live on 3v4l.org.
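A self-contained sketch of the same wrapper idea, collecting the IDs into an array instead of dumping them. The function name `collect_video_id`, the global `$videoIds` and the shortened HTML are illustrative assumptions. The wrapper returns a real boolean: as I read the ext/dom source, a non-boolean return value such as the int from `preg_match()` is handed to XPath as a string, and the non-empty string "0" counts as true in a predicate, so returning the int directly would not filter anything.

```php
<?php
$html = '<p>Lorem <a href="http://www.google.com">simply</a></p>'
    . '<a href="http://www.youtube.com/watch?v=DUQi_R4SgWo">Check out the video here!</a>'
    . '<a href="http://www.youtube.com/watch?v=A_6gNZCkajU">Video here</a>';

$videoIds = []; // filled by the callback below (illustrative global)

// Hypothetical wrapper: records the ID and returns a boolean so the XPath predicate can filter
function collect_video_id($href)
{
    global $videoIds;
    if (preg_match('~\?v=([^&]+)~', $href, $m) === 1) {
        $videoIds[] = $m[1];
        return true;
    }
    return false;
}

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('collect_video_id');

// The predicate calls the wrapper for each <a>; only links whose href has "?v=" are kept
$links = $xpath->query("//a[php:functionString('collect_video_id', @href)]/@href");

echo $links->length, "\n";
print_r($videoIds);
```

Using `//a` instead of `a` makes the query independent of how the fragment ends up nested, at the cost of also visiting links inside paragraphs (the Google link here, which the wrapper rejects).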

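Alternatively, collect the matches in a global variable from inside the callback:

$matches = array();
function mymatch($string) {
    global $matches;
    $regex = '~\?v=([^&]+)~';
    // append each ID; preg_match_all() into $matches would overwrite it on every call
    if (preg_match($regex, $string, $m) === 1) {
        $matches[] = $m[1];
        return true;
    }
    return false;
}
$xpath->registerPHPFunctions('mymatch');
$xpath->registerNamespace("php", "http://php.net/xpath");
$links = $xpath->query("a[php:functionString('mymatch',@href)]/@href");
print_r($matches);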
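Since the callback's arguments arrive by value, its return value is the only thing that flows back into the XPath expression. A further variation, sketched here and not taken from either answer, has a wrapper (hypothetically named `video_id`) return the captured ID, and reads it with `DOMXPath::evaluate()`, which, unlike `query()`, returns a scalar when the expression evaluates to a string. This avoids global state, but it yields one ID per evaluation.

```php
<?php
// Hypothetical wrapper: returns the captured ID instead of writing it to a by-reference argument
function video_id($href)
{
    return preg_match('~\?v=([^&]+)~', $href, $m) === 1 ? $m[1] : '';
}

$dom = new DOMDocument();
$dom->loadHTML('<p><a href="http://www.youtube.com/watch?v=DUQi_R4SgWo">Video</a></p>');

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('video_id');

// functionString() passes the string value of the href; evaluate() returns the PHP string result
$id = $xpath->evaluate("php:functionString('video_id', //a/@href)");
echo $id, "\n";
```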