PHP - DOMDocument: scraping divs without removing images
This is my current PHP code:
$dom = new DOMDocument;
@$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="test"]');
if ($divs->length > 0) {
    foreach ($divs as $key => $div) {
        print_r($div);
    }
}
Each div also contains an image that I want to output, but DOMDocument is removing it.
The image appears in the HTML file like this:
<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" />
I want to output the value of data-src in addition to the text in the div.
Thanks and best regards
For each div, you can use $div->getElementsByTagName("img") to get the images. Then loop over the images, check whether the img's alt attribute is test, and read the data-src attribute:
$dom = new DOMDocument;
@$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="test"]');
foreach ($divs as $key => $div) {
    // Text content of the div (markup is not included)
    echo $div->textContent . "<br>";
    foreach ($div->getElementsByTagName("img") as $img) {
        if ($img->getAttribute('alt') === 'test') {
            echo $img->getAttribute('data-src') . "<br>";
        }
    }
}
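To make the approach above easier to try, here is a minimal self-contained sketch. The sample markup, the helper name collectDivData, and the use of libxml_use_internal_errors() in place of the @ operator are assumptions for illustration, since the question does not show the real contents of $file.

```php
<?php
// Hypothetical sample markup; the real $file is not shown in the question.
$file = <<<HTML
<div class="test">First item <img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" /></div>
<div class="other">Ignored</div>
<div class="test">Second item <img src="loading.gif" data-src="https://test.com/images/images/120/1313131313233.jpg" alt="test" /></div>
HTML;

// Hypothetical helper: one entry per div.test, holding its text and data-src values.
function collectDivData(string $html): array
{
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query('//div[@class="test"]') as $div) {
        $sources = [];
        foreach ($div->getElementsByTagName('img') as $img) {
            if ($img->hasAttribute('data-src')) {
                $sources[] = $img->getAttribute('data-src');
            }
        }
        $result[] = ['text' => trim($div->textContent), 'images' => $sources];
    }
    return $result;
}

$rows = collectDivData($file);
foreach ($rows as $row) {
    echo $row['text'], ' ', implode(' ', $row['images']), "\n";
}
```

The img elements are still present in the parsed document; print_r($div) simply does not print child markup, which is probably why the images looked as if they had been removed.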
"I want to output the value of data-src additionally to the text in the div."
I don't see any "text in the div" in your question.
To get the value of the img attribute data-src, you can use getAttribute('data-src'), i.e.:
$html = <<<HTML
<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" />
<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313233.jpg" alt="test" />
<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313256.jpg" alt="test" />
HTML;
$dom = new DOMDocument;
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select every img whose alt attribute contains "test"
$images = $xpath->query('.//img[contains(@alt, "test")]'); # or query('.//img[contains(@src, "loading.gif")]');
foreach ($images as $img) {
    print($img->getAttribute('data-src')."\n");
}
Output:
https://test.com/images/images/120/1313131313232.jpg
https://test.com/images/images/120/1313131313233.jpg
...
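If only the images inside the div elements with class test are wanted, the XPath expression can also select the data-src attribute nodes directly. This is a sketch under assumptions: the wrapping div markup is hypothetical (the question shows only the img tag), and libxml_use_internal_errors() stands in for the @ operator.

```php
<?php
// Hypothetical markup: two images inside div.test and one outside any div.
$html = <<<HTML
<div class="test"><img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" /></div>
<div class="test"><img src="loading.gif" data-src="https://test.com/images/images/120/1313131313233.jpg" alt="test" /></div>
<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313256.jpg" alt="test" />
HTML;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$urls = [];
// The query returns DOMAttr nodes; ->value holds the attribute text.
foreach ($xpath->query('//div[@class="test"]//img/@data-src') as $attr) {
    $urls[] = $attr->value;
}
echo implode("\n", $urls), "\n";
```

With this markup only the two images nested in div.test are collected; the third image sits outside any matching div and is skipped.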