Scrape data with simplehtmldom without div's after a <br> tag

Question

How can I get to the third line and extract only the time?

<BR>
<BR>UTC=2016-10-12  15:03:58.042 Wed
<BR> LT=2016-10-12  17:03:58.042 Wed (Summer)
<BR>Country/Timezone=+1d (Berlin,Brussels,Paris) ,UTC=60 min.
<BR>Summertime from 25 Mar 01:00, Wintertime from 25 Oct 01:00 (UTC)

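So the expected output is: 17:03:58.042

I am trying to use Simple HTML DOM.

The code below prints all of the text. I tried to find the right selector, but the data I want is not wrapped in a div; it only follows a <BR> tag.
Does anyone know how to select the right line?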

<?php
// example of how to use basic selector to retrieve HTML contents
include('simple_html_dom.php');

// get DOM from URL or file
$html = file_get_html('http://10.20.83.1/status.htm');

// extract text from HTML
echo $html->plaintext;
?>

Answer 1

1. Extracting the text.

Perhaps use:

// Find all text blocks
$es = $html->find('text');

From http://simplehtmldom.sourceforge.net/manual.htm#section_quickstart

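Note: the second argument of find() is a zero-based index, so if the text block you want is always at the same position, you can request it directly. For example, index 2 returns the third text block:

// Find the text block at index 2 (the third one)
$es = $html->find('text', 2);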
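If the position of the block is not guaranteed, matching on the "LT=" prefix may be more robust than a fixed index. The following is only a sketch: findTextByPrefix is a hypothetical helper (not part of Simple HTML DOM), it assumes PHP 7.1 or later, and it assumes each <BR>-separated line ends up as its own text block.

```php
<?php
/**
 * Hypothetical helper: returns the first string that starts with $prefix
 * (after trimming surrounding whitespace), or null if none does.
 */
function findTextByPrefix(array $texts, string $prefix): ?string
{
    foreach ($texts as $text) {
        $text = trim($text);
        if (strncmp($text, $prefix, strlen($prefix)) === 0) {
            return $text;
        }
    }
    return null;
}

// With Simple HTML DOM, the input could be built from the text nodes, e.g.:
//   $texts = array_map(function ($node) { return $node->plaintext; }, $html->find('text'));
// Here the sample lines from the question are used instead.
$texts = [
    'UTC=2016-10-12  15:03:58.042 Wed',
    ' LT=2016-10-12  17:03:58.042 Wed (Summer)',
    'Country/Timezone=+1d (Berlin,Brussels,Paris) ,UTC=60 min.',
];

echo findTextByPrefix($texts, 'LT='), "\n";
```

The helper works on plain strings so that it can be tried without fetching the status page.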

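2. Validating or parsing the date by its format.

I once wrote a small PHP function that guesses date/time values based on their format. See: http://pastebin.com/DrYwdU2D

If you prefer, you can do the same thing with a regular expression; see "PHP Regex to check date is in YYYY-MM-DD format".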
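For the page in the question, a regular expression anchored on "LT=" could pull out the time directly from the full text. This is a minimal sketch: extractLocalTime is a hypothetical name, and the pattern assumes the line always has the form LT=YYYY-MM-DD followed by whitespace and HH:MM:SS.mmm, as in the sample.

```php
<?php
/**
 * Hypothetical helper: returns the time portion of the "LT=" line,
 * or null if the text contains no such line.
 */
function extractLocalTime(string $text): ?string
{
    // Skip the date after "LT=", then capture HH:MM:SS.mmm.
    $pattern = '/LT=\d{4}-\d{2}-\d{2}\s+(\d{2}:\d{2}:\d{2}\.\d{3})/';
    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Sample text from the question; in practice this could be $html->plaintext.
$sample = "UTC=2016-10-12  15:03:58.042 Wed\n"
        . " LT=2016-10-12  17:03:58.042 Wed (Summer)\n"
        . "Country/Timezone=+1d (Berlin,Brussels,Paris) ,UTC=60 min.\n";

echo extractLocalTime($sample), "\n"; // 17:03:58.042
```

Anchoring on "LT=" keeps the UTC time on the previous line from being matched first.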

Hope this helps.
