How to get data from HTML using regex
I have the following HTML:
<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>
I want to get the value from <td class="stat stat-last"> => <div class="statnum">, i.e. 22.
I tried the following regex, but it finds no matches:
/<div\sclass="statnum">^(.)\?<\/div>/ig
I think you would be better off using an XML parser for this instead of a regex. SimpleXML can do the job for you: http://php.net/manual/en/book.simplexml.php
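The following is a minimal SimpleXML sketch. It assumes the markup is well-formed XML. The snippet in the question happens to be well-formed, but most real pages are not, and then simplexml_load_string() returns false. The variable names and the class-token XPath expression are illustrative choices and do not come from the answer above.

```php
<?php
// Assumption: the markup is well-formed XML, as the question's snippet is.
$html = '<table class="profile-stats">
<tr>
<td class="stat"><div class="statnum">8</div><div class="statlabel"> Tweets </div></td>
<td class="stat"><a href="/THEDJMHA/following"><div class="statnum">13</div><div class="statlabel"> Following </div></a></td>
<td class="stat stat-last"><a href="/THEDJMHA/followers"><div class="statnum">22</div><div class="statlabel"> Followers </div></a></td>
</tr>
</table>';

$xml = simplexml_load_string($html);
if ($xml === false) {
    throw new RuntimeException('Input is not well-formed XML');
}

// SimpleXMLElement::xpath() returns an array of matching elements.
// The concat()/normalize-space() idiom matches "stat-last" as a whole class token.
$nodes = $xml->xpath(
    "//td[contains(concat(' ', normalize-space(@class), ' '), ' stat-last ')]//div[@class='statnum']"
);
$followers = isset($nodes[0]) ? trim((string) $nodes[0]) : null;
echo $followers, PHP_EOL; // 22
```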
You can edit your pattern like this:
/<div\sclass="statnum">(.*?)<\/div>/ig
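The following is a small PHP check that compares the question's pattern with the edited one. In the original pattern, ^ asserts the start of the subject, (.) captures exactly one character, and \? is a literal question mark, so the pattern can never match. PHP's preg functions have no g modifier and reject it as unknown, so the sketch drops it and calls preg_match_all() to collect every match. The edited pattern returns every statnum value, not only the one in the stat-last cell. The $html string is a trimmed copy of the question's markup.

```php
<?php
$html = '<td class="stat"><div class="statnum">8</div></td>
<td class="stat"><a href="/THEDJMHA/following"><div class="statnum">13</div></a></td>
<td class="stat stat-last"><a href="/THEDJMHA/followers"><div class="statnum">22</div></a></td>';

// The pattern from the question, minus "g" (PHP rejects it as an unknown modifier).
// "^" asserts the start of the subject, so it can never match right after a ">".
$original = preg_match_all('/<div\sclass="statnum">^(.)\?<\/div>/i', $html);

// The edited pattern: (.*?) lazily captures everything up to the next </div>.
// preg_match_all() plays the role of JavaScript's "g" flag.
$count = preg_match_all('/<div\sclass="statnum">(.*?)<\/div>/i', $html, $matches);

echo $original, ' ', $count, ' ', implode(',', $matches[1]), PHP_EOL; // 0 3 8,13,22
```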
/<td class="stat stat-last">.*?<div class="statnum">(\d+)/si
Your match is in the first capture group. Note the s flag at the end: it makes . match newlines as well.
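The following sketch shows what the s flag changes. It uses a trimmed copy of the question's markup in which the <td> and the <div> sit on separate lines.

```php
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';

$pattern = '/<td class="stat stat-last">.*?<div class="statnum">(\d+)/';

// Without "s", "." does not match "\n", so .*? cannot cross the line break after the <td>.
$withoutS = preg_match($pattern . 'i', $html, $m1);

// With "s" (dot-all), "." also matches "\n" and the pattern reaches the <div>.
$withS = preg_match($pattern . 'si', $html, $m2);

echo $withoutS, ' ', $withS, ' ', $m2[1], PHP_EOL; // 0 1 22
```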
Here is how to do it with a parser.
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';
$doc = new DOMDocument(); // make a DOM object
$doc->loadHTML($html);
$tds = $doc->getElementsByTagName('td');
foreach ($tds as $cell) { // loop through all cells
    // strpos() returns 0 for a match at the very start, so compare with false explicitly
    if (strpos($cell->getAttribute('class'), 'stat-last') !== false) {
        $divs = $cell->getElementsByTagName('div');
        foreach ($divs as $div) { // loop through all divs of the cell
            if ($div->getAttribute('class') == 'statnum') {
                echo $div->nodeValue;
            }
        }
    }
}
Output:
22
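Checking the class with strpos() is a substring test, so it also matches a class such as stat-lastish. A bare if (strpos(...)) also misses a match at position 0, because strpos() returns 0 there. The following is a minimal sketch of a whole-token check. hasClass() is a hypothetical helper and is not part of PHP's DOM extension.

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): true when $class is one of the
// whitespace-separated tokens in the element's class attribute.
function hasClass(DOMElement $el, string $class): bool
{
    $tokens = preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($class, $tokens, true);
}

$doc = new DOMDocument();
$checks = [];
foreach (['stat stat-last', 'stat-last', 'stat stat-lastish', 'stat'] as $classAttr) {
    $td = $doc->createElement('td');
    $td->setAttribute('class', $classAttr);
    $checks[$classAttr] = hasClass($td, 'stat-last');
}
var_dump($checks); // only the first two entries are true

// The pitfall hasClass() avoids: strpos() returns 0 (falsy) for a match at position 0.
$strposResult = strpos('stat-last', 'stat-last');
var_dump($strposResult); // int(0)
```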
...or using XPath...
$doc = new DOMDocument(); // make a DOM object
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$statnums = $xpath->query("//td[@class='stat stat-last']/a/div[@class='statnum']");
foreach ($statnums as $statnum) {
    echo $statnum->nodeValue;
}
Output:
22
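The XPath query above compares @class with the exact string 'stat stat-last'. It therefore stops matching if the classes are reordered or another class is added. The following sketch uses the usual XPath 1.0 token idiom instead. It runs against a trimmed copy of the markup in which the class order is deliberately swapped to stat-last stat, an assumption made purely for illustration. The call to libxml_use_internal_errors(true) is an addition that collects parser warnings about sloppy HTML instead of printing them.

```php
<?php
// Collect libxml's warnings about real-world HTML instead of printing them.
libxml_use_internal_errors(true);

// Trimmed copy of the question's markup; the last cell's class order is swapped on purpose.
$html = '<table class="profile-stats"><tr>
<td class="stat"><div class="statnum">8</div></td>
<td class="stat"><a href="/THEDJMHA/following"><div class="statnum">13</div></a></td>
<td class="stat-last stat"><a href="/THEDJMHA/followers"><div class="statnum">22</div></a></td>
</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Exact attribute comparison: no longer matches once the class order changes.
$exact = $xpath->query("//td[@class='stat stat-last']")->length;

// Token comparison: pad the normalized class list with spaces and look for " name ".
$query = "//td[contains(concat(' ', normalize-space(@class), ' '), ' stat-last ')]"
       . "//div[contains(concat(' ', normalize-space(@class), ' '), ' statnum ')]";
$values = [];
foreach ($xpath->query($query) as $node) {
    $values[] = trim($node->nodeValue);
}
echo $exact, ' ', implode(',', $values), PHP_EOL; // 0 22
```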
Or, if you really want to do it with a regex...
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';
// s lets .*? run across line breaks; check the return value before reading $num[1]
if (preg_match('~td class=".*?stat-last">.*?<div class="statnum">(.*?)<~s', $html, $num)) {
    echo $num[1];
}
Output:
22
Regex demo: https://regex101.com/r/kM6kI2/1
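If you need all three numbers rather than only the followers count, a single preg_match_all() call can pair each statnum with the statlabel that follows it. The following sketch assumes the two divs always appear in that order, separated only by whitespace, as in the question's markup.

```php
<?php
$html = '<table class="profile-stats">
<tr>
<td class="stat">
<div class="statnum">8</div>
<div class="statlabel"> Tweets </div>
</td>
<td class="stat">
<a href="/THEDJMHA/following">
<div class="statnum">13</div>
<div class="statlabel"> Following </div>
</a>
</td>
<td class="stat stat-last">
<a href="/THEDJMHA/followers">
<div class="statnum">22</div>
<div class="statlabel"> Followers </div>
</a>
</td>
</tr>
</table>';

// Group 1: the number; group 2: the label text, with surrounding spaces kept outside the group.
$pattern = '~<div class="statnum">(\d+)</div>\s*<div class="statlabel">\s*(.*?)\s*</div>~s';
preg_match_all($pattern, $html, $m);

// $m[1] and $m[2] line up match by match, so they can be zipped into label => number.
$stats = array_combine($m[2], $m[1]);
echo $stats['Followers'], PHP_EOL; // 22
```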