How can I use PHP to retrieve a specific element and attribute from a remote HTML page?

For example, suppose the element and attribute to retrieve look like this:

<a href="/dir/someid/" class="ccc">

Any help would be appreciated.

The approach I plan to use:


<?php
   $file = fopen ("http://www.example.com/", "r");
   if (!$file) {
       echo "<p>Unable to open remote file.</p>\n";
       exit;
   }
   while (!feof ($file)) {
       $line = fgets ($file, 1024);
       /* This only works if the title and its tags are on one line */
       if (preg_match ("@\<title\>(.*)\</title\>@i", $line, $out)) {
           $title = $out[1];
           break;
       }
   }
   fclose($file);
   ?>

Solution:

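        // Fetch the remote page; file_get_contents() returns false on failure.
        $homepage = file_get_contents("https://www.somedomain.com");
        if ($homepage === false) {
            exit("Unable to fetch remote page.\n");
        }

        $doc = new DOMDocument;
        $doc->preserveWhiteSpace = false;
        // Real-world HTML is rarely valid; @ suppresses the parser warnings.
        @$doc->loadHTML($homepage);
        $xpath = new DOMXPath($doc);
        // Select every <div class="some-class">; adjust the query to the target markup.
        $results = $xpath->query("//div[@class='some-class']");

        foreach ($results as $contextNode) {
            // Text and href of the first <a> directly inside each matched div.
            $text = $xpath->evaluate("string(./a[1])", $contextNode);
            $href = $xpath->evaluate("string(./a[1]/@href)", $contextNode);
        }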
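
The snippet above queries a div, while the markup in the question is an anchor with class "ccc". A minimal sketch of the same DOMDocument/DOMXPath technique aimed at that anchor might look like the following. The helper name extractHrefsByClass is hypothetical (it is not part of PHP or of the original answer), the inline $html string stands in for the remote page, and the class name is assumed to be a simple token with no quote characters, since it is placed directly into the XPath expression.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href of
// every <a> element whose class attribute contains the given class name.
// Assumes $class is a plain token without quote characters.
function extractHrefsByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token so "ccc" does not match "cccx".
    $query = sprintf(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]/@href",
        $class
    );

    $hrefs = [];
    foreach ($xpath->query($query) as $attr) {
        $hrefs[] = $attr->nodeValue;
    }
    return $hrefs;
}

// Inline sample markup stands in for the remote page (an assumption for
// illustration; in practice this would come from file_get_contents()).
$html = '<html><body><a href="/dir/someid/" class="ccc">link</a>'
      . '<a href="/other/" class="zzz">other</a></body></html>';
$hrefs = extractHrefsByClass($html, 'ccc');
print_r($hrefs);
```

To run it against a live page, the string returned by file_get_contents() can be passed as $html once it has been checked against false. Using libxml_use_internal_errors() rather than the @ operator is a design choice here: it silences only the parser warnings and leaves other errors visible.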