How can I get the a href, image src, and title from given HTML using DOMDocument?

Given HTML:

  <div id="testid">
  <h1>Test Title</h1>
      <ul class="clearfix">
        <li class="anker" id="artists-A"></li>
        <li class="first">
            <a href="www.test1.html" title="Test1">
            <span>
            <img src="https://www.test1.de/img/test1.jpg" alt="Test1" />
            <span>Test1</span>
            </span>
            </a>
        </li>
        <li>
            <a href="www.test2.html" title="Test2">
            <span>
            <img src="https://www.test2.de/img/test2.jpg" alt="Test2" />
            <span>Test2</span>
            </span>
            </a>
        </li>
        <li class="first">
            <a href="www.test3.html" title="Test3">
            <span>
            <img src="https://www.test1.de/img/test3.jpg" alt="Test3" />
            <span>Test3</span>
            </span>
            </a>
        </li>
      </ul> 
</div>

I need to get the href value, the img src, and the inner span text (i.e. the title). I am parsing this with DOMDocument, but I am not getting the correct results.

Code:

$doc = new DomDocument; 
$doc->validateOnParse = true; 
$doc->loadHtml(file_get_contents($url)); 
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//*[@id="testid"]/ul/li');

Here we are using DOMDocument. For now I am collecting the a href and the img src; you can add any other tags you need.

Try this code snippet:

$domDocument = new DOMDocument();
$domDocument->loadHTML($string);

$domXPath = new DOMXPath($domDocument);
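$results = $domXPath->query("//div[@id='testid']");//querying div with id="testid"
// the leading "." makes the query relative to the div; a bare "//a" would search the whole document
$results = $domXPath->query(".//a|.//img",$results->item(0));//querying resultant div for a and img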
$data=array();
foreach($results as $result){
    if($result->tagName=="a")//checking for anchor tags
    {
        $data["a"][]=array(
            "href"=>$result->getAttribute("href"),
            "title"=>$result->getAttribute("title")
        );
    }
    elseif($result->tagName=="img")//checking for image tags
    {
        $data["img"][]=$result->getAttribute("src");
    }
}
print_r($data);
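If you want the href, title, image src, and span text grouped per link rather than in separate lists, the same DOMXPath approach can be applied with relative queries inside a loop. This is only a minimal sketch: the `$html` variable and the `$items` array are names made up for the example, and the markup is the question's sample shortened to two entries.

```php
<?php
// Sample markup copied from the question (shortened to two items).
$html = <<<'HTML'
<div id="testid">
  <h1>Test Title</h1>
  <ul class="clearfix">
    <li class="anker" id="artists-A"></li>
    <li class="first">
      <a href="www.test1.html" title="Test1">
        <span>
          <img src="https://www.test1.de/img/test1.jpg" alt="Test1" />
          <span>Test1</span>
        </span>
      </a>
    </li>
    <li>
      <a href="www.test2.html" title="Test2">
        <span>
          <img src="https://www.test2.de/img/test2.jpg" alt="Test2" />
          <span>Test2</span>
        </span>
      </a>
    </li>
  </ul>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$items = []; // hypothetical result array, one entry per link
// One iteration per <a> that sits directly inside an <li> of the target list.
foreach ($xpath->query('//div[@id="testid"]/ul/li/a') as $link) {
    $items[] = [
        'href'  => $link->getAttribute('href'),
        'title' => $link->getAttribute('title'),
        // The leading "." keeps these lookups relative to the current <a>.
        'src'   => $xpath->evaluate('string(.//img/@src)', $link),
        'text'  => trim($xpath->evaluate('string(.//span/span)', $link)),
    ];
}

print_r($items);
```

The empty `<li class="anker">` contains no `<a>`, so it is skipped without any extra check.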

I suggest you use the SimpleHtmlDom library.

<?php 

require_once "SimpleHtmlDom.php";

// Put your HTML into the file contentToParse.html
$htmlToParse = file_get_contents("contentToParse.html");

$htmlObject = str_get_html($htmlToParse);

// href of every link inside the list
foreach ($htmlObject->find("#testid ul li a") as $singleLink) {
    var_dump($singleLink->href);
}

// src of every image inside the list
foreach ($htmlObject->find("#testid ul li img") as $singleImage) {
    var_dump($singleImage->src);
}

// text of every inner span (the title)
foreach ($htmlObject->find("#testid ul li span span") as $singleSpan) {
    var_dump($singleSpan->innertext);
}
?>