Scraping based on date

Question

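I am trying to scrape data from a website whose tags don't seem to carry much classification (few useful classes or ids). I would still like to know whether it is possible to scrape only today's headlines using XPath.

In other words, so that it only retrieves the headlines for 09/4 - 2015?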

Answer 1

Since the date 10/4 - 2015 is unique on the page, you can use XPath's contains() function to locate the b tag node (see html here):

//b[contains(., '10/4 - 2015')]
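
As a rough illustration of the expression above, here is a minimal sketch using lxml. The HTML is a made-up stand-in for the real page (the actual markup is not shown in this thread), assuming each date sits in a b tag inside its own div:

```python
from lxml import html

# Hypothetical markup: the real page's structure is an assumption here.
SAMPLE = """
<html><body>
  <div><b>09/4 - 2015</b></div>
  <div class="newsItem">Old headline</div>
  <div><b>10/4 - 2015</b></div>
  <div class="newsItem">Headline A</div>
  <div class="newsItem">Headline B</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# contains(., '...') tests the string value of each <b> node.
date_nodes = tree.xpath("//b[contains(., '10/4 - 2015')]")

for node in date_nodes:
    print(node.text_content().strip())
```

If the date label appears more than once on the real page, the list will hold more than one node, so it is worth checking its length before going further.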

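Then, starting from this node, you go to its parent and from there to the parent's siblings, something like this (untested; `siblings::` is only shorthand here, the actual XPath axes are `preceding-sibling::` and `following-sibling::`):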

//b[contains(., '10/4 - 2015')]/parent::div/siblings::div

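Update

Since the items for the current date are at the bottom of the page, according to the html all following sibling nodes belong to that date (google xpath sibling after):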

//b[contains(., '10/4 - 2015')]/parent::div/following-sibling::div[@class='newsItem']

See the test here. If you want to fetch the divs in between two dates, then explore this
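
The following-sibling expression from the update could be tried out along these lines. This is only a sketch: the sample HTML, the helper names `date_label` and `headlines_for`, and the assumed label format (zero-padded day, unpadded month, as in "09/4 - 2015") are all illustrative guesses rather than anything taken from the real site.

```python
from datetime import date
from lxml import html

# Hypothetical markup: the real page's structure is an assumption here.
SAMPLE = """
<html><body>
  <div><b>09/4 - 2015</b></div>
  <div class="newsItem">Old headline</div>
  <div><b>10/4 - 2015</b></div>
  <div class="newsItem">Headline A</div>
  <div class="newsItem">Headline B</div>
</body></html>
"""


def date_label(d):
    """Format a date like the page seems to: '09/4 - 2015' (assumed format)."""
    return f"{d.day:02d}/{d.month} - {d.year}"


def headlines_for(tree, label):
    """Return the text of every newsItem div that follows the date's div."""
    items = tree.xpath(
        "//b[contains(., $label)]/parent::div"
        "/following-sibling::div[@class='newsItem']",
        label=label,  # passed as an XPath variable, so no manual quoting
    )
    return [item.text_content().strip() for item in items]


tree = html.fromstring(SAMPLE)
print(headlines_for(tree, date_label(date(2015, 4, 10))))
print(headlines_for(tree, date_label(date(2015, 4, 9))))
```

With this sample, the first call returns only the two headlines under 10/4 - 2015, because that date is the last one on the page. The second call returns all three items, since following-sibling does not stop at the next date header; that is the case where only the divs in between two dates are wanted.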