Xpath

Question

I am trying to loop over an XPath in Scrapy that looks like this:

for entry in response.xpath('normalize-space(//div[@id="Content"]//div[@id="programDetails"]//div[@id="selfReportedProgramDetails"]//div[@id="hoursOfOperation"]//span[@class="hoursItem"]//span[@class="times"]/text())'):
    print(entry.get())
    print(len(response.xpath('normalize-space(//div[@id="Content"]//div[@id="programDetails"]//div[@id="selfReportedProgramDetails"]//div[@id="hoursOfOperation"]//span[@class="hoursItem"]//span[@class="times"]/text())')))

The result looks like this:

9:00 AM to 12:00 PM

1

The strange thing is that the browser's inspector tool shows 7 children, one for each weekday.

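Why do I get only one result? I want to extract all the weekdays. I can't see my mistake, so maybe you can give me a hint towards the right approach.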

Cheers!

// After the hint, I am now using the code below:

for entry in response.xpath('//div[@id="Content"]//div[@id="programDetails"]//div[@id="selfReportedProgramDetails"]//div[@id="hoursOfOperation"]//span[@class="hoursItem"]'):
   print(entry.xpath('normalize-space(//span[@class="times"])').get())

Now I get 7 results, but every one of them is the first entry, 9:00 AM to 12:00 PM.

Answer 1

This XPath:

'normalize-space(//div[@id="Content"]//div[@id="programDetails"]//div[@id="selfReportedProgramDetails"]//div[@id="hoursOfOperation"]//span[@class="hoursItem"]//span[@class="times"]/text())'

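will only ever give one result, because normalize-space() is a string function. When it is handed a node-set, it converts only the first node (in document order) to a string, then trims that string and collapses its runs of whitespace. The other six text nodes are discarded.

So, to get the actual text nodes of all these spans, remove the normalize-space() wrapper from around the XPath.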

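The second XPath starts with a double slash, which means the search starts at the document root and covers every node in the document, no matter which node you call it on. That is why each of the 7 iterations returns the first span. To search relative to the current context node, start the expression with .// instead, for example normalize-space(.//span[@class="times"]).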

For more information on // versus .//, see

Xpath - I only get 1 element back even - the inspector tools shows 7

python

scrapy