How to find specific data using Simple HTML DOM in PHP

When I scrape the table, the tr and td values change from page to page. Here are the original tables.

<table class="scoretable">
<tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Location</td><td class="fullhead">Madrid</td></tr>
<tr><td class="jdhead">Country</td><td class="fullhead">Spain</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody>
</table>

<table class="scoretable">
<tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody>
</table>

The two tables above come from different pages. I need to scrape the Name, Phone, and Role values.

$url = "http://name.com/listings";
$html = file_get_html( $url );

$posts1 = $html->find('td[class=fullhead]',1);

foreach ($posts1 as $post1) {
    $poster1 = $post1->outertext;
    echo $poster1;
}

I would try using preg_match to pull the required values out of the HTML, like this:

<?php
$url = 'http://name.com/listings';
$html = file_get_contents($url);

if (preg_match('~<tr><td class="jdhead">Name</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your name
}

if (preg_match('~<tr><td class="jdhead">Phone</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your phone
}

if (preg_match('~<tr><td class="jdhead">Role</td><td class="fullhead">([^<]*)</td></tr>~', $html, $matches)) {
    echo $matches[1]; // here is your role
}
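
The patterns above only match when the markup is byte-for-byte identical, including letter case and spacing. As a rough sketch of a more tolerant variant, the lookup can be wrapped in one function that ignores case, allows extra attributes, and allows whitespace between the cells. The function name findField and the inline sample HTML are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of the
// cell that follows the label cell, or null when the label is not present.
function findField(string $html, string $label): ?string
{
    // i = case-insensitive tags, s = let the value span several lines.
    $pattern = '~<td[^>]*>\s*' . preg_quote($label, '~')
             . '\s*</td>\s*<td[^>]*>(.*?)</td>~is';

    if (preg_match($pattern, $html, $matches)) {
        // Drop any nested markup and surrounding whitespace from the value.
        return trim(strip_tags($matches[1]));
    }

    return null;
}

// Sample rows standing in for the downloaded page (assumed data).
$html = '<table class="scoretable"><tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<TR VALIGN="top"> <TD CLASS="jdHead">Phone </TD> <TD CLASS="fullhead">91234988788</TD> </TR>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody></table>';

$result = [];
foreach (['Name', 'Phone', 'Location', 'Role'] as $label) {
    $result[$label] = findField($html, $label);
}
```

A label that is missing from the page, such as Location in this sample, simply yields null, so the same loop can be run against pages whose tables have different rows.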

Update (see the comments below):

<?php
$url = 'http://jobsearch.naukri.com/job-listings-010915006292';
$html = file_get_contents($url);

if (preg_match('~<TR VALIGN="top"> <TD CLASS="jdHead">Job Posted </TD> <TD VALIGN="top" CLASS="detailJob">([^<]*)</TD> </TR>~', $html, $matches)) {
    echo 'Job Posted: ' . $matches[1] . '<br><br>';
}


if (preg_match('~<TR VALIGN="top"> <TD CLASS="jdHead">Job Description</TD> <TD VALIGN="top" CLASS="detailJob">(.*?)</TD> </TR>~', $html, $matches)) {
    echo 'Job Description: ' . $matches[1] . '<br><br>';
}

Here is an example solution that should work for you:

<?php
// load
$doc = new DOMDocument();
$doc->loadHTMLFile("tabledata.html");

// required nodes
$required_data = ['Name', 'Phone', 'Role'];

$tbody_elements = $doc->getElementsByTagName('tbody');

// xpath object
$xpath = new DOMXPath($doc);

// array for final data
$finaldata = [];
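// each tbody holds one user's data
foreach($tbody_elements as $key => $tbody)
{
    // iterate through the required data
    foreach($required_data as $data)
    {
        // match the row that contains a cell whose text equals the label,
        // evaluated relative to the current tbody
        $return = $xpath->query("tr[td[text()='$data']]", $tbody);

        foreach($return as $node)
        {
            // textContent of the whole row joins label and value, e.g. "NameJohn"
            $finaldata[$key][$data] = $node->textContent;
        }
    }
}

var_dump($finaldata);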

Output:

array(2) {
  [0]=>
  array(3) {
    ["Name"]=>
    string(8) "NameJohn"
    ["Phone"]=>
    string(16) "Phone91234988788"
    ["Role"]=>
    string(11) "RoleManager"
  }
  [1]=>
  array(3) {
    ["Name"]=>
    string(8) "NameJohn"
    ["Phone"]=>
    string(16) "Phone91234988788"
    ["Role"]=>
    string(11) "RoleManager"
  }
}
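
In the output above each value still carries its label ("NameJohn"), because the query selects the whole row. A minimal sketch of one way to get only the value, assuming the value is always the cell immediately after the label cell, is to select that cell with the following-sibling axis. The inline $source string stands in for tabledata.html and is an assumption made so the example runs on its own.

```php
<?php
// Inline sample standing in for tabledata.html (assumed data for this sketch).
$source = '<table class="scoretable"><tbody>
<tr><td class="jdhead">Name</td><td class="fullhead">John</td></tr>
<tr><td class="jdhead">Age</td><td class="fullhead">30</td></tr>
<tr><td class="jdhead">Phone</td><td class="fullhead">91234988788</td></tr>
<tr><td class="jdhead">Role</td><td class="fullhead">Manager</td></tr>
</tbody></table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($source);
$xpath = new DOMXPath($doc);

$required_data = ['Name', 'Phone', 'Role'];
$finaldata = [];

foreach ($doc->getElementsByTagName('tbody') as $key => $tbody) {
    foreach ($required_data as $data) {
        // Select the first cell after the label cell, relative to this tbody.
        $cells = $xpath->query(
            "tr/td[normalize-space()='$data']/following-sibling::td[1]",
            $tbody
        );

        if ($cells->length > 0) {
            $finaldata[$key][$data] = trim($cells->item(0)->textContent);
        }
    }
}
```

For this sample, $finaldata[0] ends up holding John, 91234988788 and Manager under the keys Name, Phone and Role. Using normalize-space() instead of text() also tolerates stray whitespace around the label.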