Is there any reason this XPath returns no results?

Question

import requests
from lxml import html

page = requests.get(url="http://www.cia.gov/library/publications/the-world-factbook/geos/ch.html")
tree = html.fromstring(page.content)

bordering = tree.xpath('//*[@id="wfb_data"]/table/tr[4]/td/ul[3]/li[4]/div[17]/span[2]/text()')
print(bordering)

I copied the XPath from Chrome's developer tools, but it still leaves me with an empty "bordering" variable. I don't know what could be going wrong.

Answer 1

First, you need to use https instead of http:

https://www.cia.gov/library/publications/the-world-factbook/geos/ch.html

There is also a simpler way to get the border data: find the span whose text starts with "border countries" and take the text of its next sibling span:

bordering = tree.xpath('//*[@id="wfb_data"]//span[starts-with(., "border countries")]/following-sibling::span')[0]
print(bordering.text_content())

Prints:

Afghanistan 91 km, Bhutan 477 km, Burma 2,129 km, India 2,659 km, Kazakhstan 1,765 km, North Korea 1,352 km, Kyrgyzstan 1,063 km, Laos 475 km, Mongolia 4,630 km, Nepal 1,389 km, Pakistan 438 km, Russia (northeast) 4,133 km, Russia (northwest) 46 km, Tajikistan 477 km, Vietnam 1,297 km
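The sibling-based lookup can be tried offline before pointing it at the live page. The following is a minimal sketch: the SAMPLE markup is a made-up stand-in for the real page (which may be laid out differently), and border_text is a hypothetical helper name, not something from lxml.

```python
from lxml import html

# Hypothetical markup that only imitates the label/value span pairs
# the answer relies on; the real page may be structured differently.
SAMPLE = """
<div id="wfb_data">
  <div>
    <span>total: </span>
    <span>22,457 km</span>
  </div>
  <div>
    <span>border countries (14): </span>
    <span>Afghanistan 91 km, Bhutan 477 km</span>
  </div>
</div>
"""


def border_text(tree):
    """Return the text of the span following the "border countries" label, or None."""
    matches = tree.xpath(
        '//*[@id="wfb_data"]//span[starts-with(., "border countries")]'
        '/following-sibling::span'
    )
    # Guard against an empty result instead of indexing [0] blindly.
    return matches[0].text_content().strip() if matches else None


tree = html.fromstring(SAMPLE)
print(border_text(tree))
```

Anchoring on the label text rather than on positions such as tr[4] or div[17] means the expression should keep working if rows are added or reordered, as long as the label itself stays the same.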

Answer 2

Try sending a User-Agent header with the request.

# url is the page address from the question
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'}
page = requests.get(url, headers=headers, timeout=5, verify=False)

Let me know if that works.

Thanks.
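To confirm that the custom User-Agent actually ends up on the outgoing request, the request can be prepared without being sent. This is only a sketch: build_request is a hypothetical helper name, and the URL is the https address from Answer 1.

```python
import requests

URL = "https://www.cia.gov/library/publications/the-world-factbook/geos/ch.html"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) "
                  "Gecko/20100101 Firefox/24.0"
}


def build_request(url, headers):
    """Prepare (but do not send) a GET request so its headers can be inspected."""
    session = requests.Session()
    # prepare_request merges the session defaults with the per-request headers;
    # the per-request User-Agent replaces the default python-requests one.
    return session.prepare_request(requests.Request("GET", url, headers=headers))


prepared = build_request(URL, HEADERS)
print(prepared.method, prepared.url)
print(prepared.headers["User-Agent"])

# To actually fetch the page, the prepared request can then be sent:
# response = requests.Session().send(prepared, timeout=5)
```

Nothing goes over the network until send is called, so this check is safe to run repeatedly while debugging.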

Tags: python, xpath, web-scraping, python-requests, lxml.html