PHP - DOMDocument scrape divs: don't remove images

This is my current PHP code:

$dom = new DOMDocument;
@$dom->loadHTML($file);

$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="test"]');
if ($divs->length > 0) {
    foreach ($divs as $key => $div) {
       print_r($div);
    }
}

Each div also contains an image that I want to output, but DOMDocument seems to be removing it.

The image appears in the HTML file like this:

<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" />

I want to output the value of data-src in addition to the text in the div.

Thanks and best regards

For each div, you can use $div->getElementsByTagName("img") to get its images. Then loop over the images, check whether the img's alt attribute is "test", and read its data-src attribute:

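$dom = new DOMDocument;
@$dom->loadHTML($file); // @ suppresses warnings about malformed HTML
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="test"]');
foreach ($divs as $key => $div) {
    // Text content of the div
    echo $div->textContent . "<br>";
    // data-src of every image inside the div whose alt is "test"
    foreach ($div->getElementsByTagName("img") as $img) {
        if ($img->getAttribute('alt') === 'test') {
            echo $img->getAttribute('data-src') . "<br>";
        }
    }
}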
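
The same loop can also be written as a self-contained sketch that returns the data instead of echoing it. It assumes the image is most likely still present in the parsed tree: print_r() on a node does not print its child markup, whereas saveHTML() with a node argument serializes it. The sample markup, the helper name collectDivs, and the use of libxml_use_internal_errors() in place of @ are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical sample markup standing in for $file; only the <img> tag mirrors the question.
$html = <<<HTML
<div class="test">First item
  <img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" />
</div>
<div class="other">ignored</div>
HTML;

/**
 * Returns one entry per div with class "test": its trimmed text,
 * the data-src of each img inside it, and the div's serialized markup.
 * (collectDivs is an illustrative helper name, not a DOMDocument method.)
 */
function collectDivs(string $html): array
{
    $dom = new DOMDocument;
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query('//div[@class="test"]') as $div) {
        $sources = [];
        foreach ($div->getElementsByTagName('img') as $img) {
            if ($img->hasAttribute('data-src')) {
                $sources[] = $img->getAttribute('data-src');
            }
        }
        $result[] = [
            'text'   => trim($div->textContent),
            'images' => $sources,
            // Serializing the node shows that the <img> is still inside the div.
            'html'   => $dom->saveHTML($div),
        ];
    }
    return $result;
}

print_r(collectDivs($html));
```

Returning an array keeps the text and the image URLs of each div together, so the caller can decide how to print or store them.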


I want to output the value of data-src additionally to the text in the div.

I don't see any "text in the div" in your question.
To get the value of the img attribute data-src, you can use getAttribute('data-src'), i.e.:

$html = <<< L
<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" />
<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313233.jpg" alt="test" />
<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313256.jpg" alt="test" />
L;

$dom = new DOMDocument;
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$imgs = $xpath->query('.//img[contains(@alt, "test")]'); # or query('.//img[contains(@src, "loading.gif")]');
foreach ($imgs as $img) {
    print($img->getAttribute('data-src')."\n");
}

Output:

https://test.com/images/images/120/1313131313232.jpg
https://test.com/images/images/120/1313131313233.jpg
...
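
As a variation, XPath can select the data-src attribute nodes directly and restrict the match to images inside div elements with class "test", which avoids the inner getAttribute() call. This is a minimal sketch: the wrapping divs, the skip.jpg URL, and the helper name dataSrcValues are made up for illustration.

```php
<?php
// Hypothetical sample markup; the class name "test" follows the question.
$html = <<<HTML
<div class="test">A <img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" /></div>
<div class="test">B <img src="loading.gif" data-src="https://test.com/images/images/120/1313131313233.jpg" alt="test" /></div>
<div class="other"><img src="loading.gif" data-src="https://test.com/skip.jpg" alt="test" /></div>
HTML;

/**
 * Returns the data-src values of all images nested in div elements with class "test".
 * (dataSrcValues is an illustrative helper name.)
 */
function dataSrcValues(string $html): array
{
    $dom = new DOMDocument;
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $values = [];
    // Each match is a DOMAttr node; its value property holds the attribute string.
    foreach ($xpath->query('//div[@class="test"]//img/@data-src') as $attr) {
        $values[] = $attr->value;
    }
    return $values;
}

foreach (dataSrcValues($html) as $src) {
    echo $src, "\n";
}
```

Because the image in the div with class "other" does not match the expression, only the two URLs from the "test" divs are printed.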
