PHP DOM parsing

Question

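I am trying to get the values from the table below. I have tried both curl/regex (which I know is not recommended) and DOM, but I could not get the values correctly.

The page contains multiple rows, so I need to use foreach. I need an exact match for the structure below.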

<tr>
    <td width="75" style="NS">
        <img src="NS" width="64" alt="INEEDTHISVALUE">
    </td>
    <td style="NS">
        <a href="NS">NS</a>
    </td>
    <td style="NS">INEEDTHISVALUETOO</td>
</tr>

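NS = non-static values. They change for every td and a, because it is a colored table (inline CSS). They may contain special characters such as ; or /, as well as numbers and alphabetical characters.

I am using the simple_html_dom class, which can be found here: http://htmlparsing.com/php.html

I am using the code below to get all the td elements, but I need more specific output (I included the table row above).

What I have tried so far: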

$html = file_get_html("URL");
foreach($html->find('td') as $td) {
    echo $td."<br>";
}

Regex and cURL

$site = "URL";
$ch = curl_init();
$hc = "YahooSeeker-Testing/v3.9 (compatible; Mozilla 4.0; MSIE 5.5; Yahoo! Search - Web Search)";
curl_setopt($ch, CURLOPT_REFERER, 'http://www.google.com');
curl_setopt($ch, CURLOPT_URL, $site);
curl_setopt($ch, CURLOPT_USERAGENT, $hc);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$site = curl_exec($ch);
curl_close($ch);
preg_match_all('@<tr><td width="75" style="(.*?)"><img src="/folder/link/(.*?)" width="64" alt="(.*?)"></td><td style="(.*?)"><a href="/folder2/link2/(.*?)">(.*?)</a></td><td style="(.*?)">(.*?)</td></tr>@', $site, $arr);
var_dump($arr); // returns empty array, WHY?

Answer 1

You can do this without a third-party library, using PHP's built-in DOMDocument and DOMXPath:

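// $site is the HTML string returned by curl_exec() in the question.
$results = array();
$doc = new DOMDocument();
$doc->loadHTML($site);
$xpath = new DOMXPath($doc);

// The second argument to query() makes each expression relative to the current row.
foreach ($xpath->query('//tr') as $tr) {
    $results[] = array(
        // alt attribute of the image in the first cell
        'img_alt' => $xpath->query('td[1]/img', $tr)->item(0)->getAttribute('alt'),
        // text content of the last cell
        'td_text' => $xpath->query('td[last()]', $tr)->item(0)->nodeValue
    );
}

print_r($results);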

It will give you:

Array
(
    [0] => Array
        (
            [img_alt] => INEEDTHISVALUE 1
            [td_text] => INEEDTHISVALUETOO 1
        )

    [1] => Array
        (
            [img_alt] => INEEDTHISVALUE 2
            [td_text] => INEEDTHISVALUETOO 2
        )

)
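
If the real page also contains rows that do not follow this structure (a header row, for example), `item(0)` returns null for them and the loop above stops with a fatal error. The following is a minimal sketch of a stricter variant, not a definitive implementation. It assumes a matching row has exactly three td cells, with an img carrying an alt attribute in the first and a link in the second. The sample markup, its style values and its URLs are placeholders made up for illustration.

```php
<?php
// Placeholder markup modeled on the row in the question; all values are invented.
$site = <<<HTML
<table>
<tr><th>Icon</th><th>Link</th><th>Value</th></tr>
<tr>
    <td width="75" style="color:red"><img src="/a.png" width="64" alt="INEEDTHISVALUE 1"></td>
    <td style="color:red"><a href="/x/1">one</a></td>
    <td style="color:red">INEEDTHISVALUETOO 1</td>
</tr>
<tr>
    <td width="75" style="color:blue"><img src="/b.png" width="64" alt="INEEDTHISVALUE 2"></td>
    <td style="color:blue"><a href="/x/2">two</a></td>
    <td style="color:blue"> INEEDTHISVALUETOO 2 </td>
</tr>
</table>
HTML;

// Collect libxml parse warnings internally instead of printing them,
// since real-world HTML is rarely well-formed.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($site);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
$results = array();

// Assumed row shape: exactly three td cells, img[@alt] in the first, a[@href] in the second.
$rows = $xpath->query('//tr[count(td) = 3][td[1]/img[@alt]][td[2]/a[@href]]');

foreach ($rows as $tr) {
    $img = $xpath->query('td[1]/img[@alt]', $tr)->item(0);
    $results[] = array(
        'img_alt' => $img->getAttribute('alt'),
        // trim() drops indentation and line breaks around the cell text
        'td_text' => trim($xpath->query('td[3]', $tr)->item(0)->nodeValue),
    );
}

print_r($results);
```

Because the XPath predicate filters the rows up front, the header row is skipped rather than dereferenced. This approach also sidesteps a likely cause of the empty regex result: the pattern in the question expects the tags back to back, while the sample row has line breaks and indentation between them.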
