XPath matching the wrong node
The XPath
//*[h1]
gives different results when I try it in Python and in Firebug. My code:
import requests
from lxml import html
url = "http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/"
resp = requests.get(url)
page = html.fromstring(resp.content)
node = page.xpath("//*[h1]")
print(node)
# [<Element center at 0x7fb42143c7e0>]
But Firebug matches the <header> tag, which is the one I want.
Why does this happen? How can I make my Python code match <header> as well?
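You are missing a User-Agent header, so the server responds with 403 Forbidden. The <center> element you matched presumably belongs to that error page rather than to the article. Add the header to the request and it works as expected: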
In [9]: resp = requests.get(url, headers={"User-Agent": "Test Agent"})
In [10]: page = html.fromstring(resp.content)
In [11]: node = page.xpath("//*[h1]")
In [12]: print(node)
[<Element header at 0x104ff15d0>]
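To illustrate why the same XPath can select different elements, here is a minimal offline sketch. It uses the standard library's xml.etree.ElementTree instead of lxml so it runs without a network connection or extra packages. Both markup strings are assumptions: ERROR_PAGE imitates a typical 403 error body (the real server's page was not shown in the question), and ARTICLE_PAGE is a simplified stand-in for the article markup. The helper name parents_of_h1 is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed body of a typical 403 error page (hypothetical; kept well-formed
# so that ElementTree can parse it).
ERROR_PAGE = (
    "<html><head><title>403 Forbidden</title></head>"
    "<body><center><h1>403 Forbidden</h1></center></body></html>"
)

# Simplified, hypothetical stand-in for the real article markup.
ARTICLE_PAGE = (
    "<html><body><header><h1>Title</h1></header>"
    "<p>Text</p></body></html>"
)


def parents_of_h1(markup):
    """Return the tag names of all elements that have an <h1> child."""
    root = ET.fromstring(markup)
    # ".//*[h1]" is the ElementTree spelling of the XPath //*[h1].
    return [element.tag for element in root.findall(".//*[h1]")]


print(parents_of_h1(ERROR_PAGE))    # ['center']
print(parents_of_h1(ARTICLE_PAGE))  # ['header']
```

In the real script, checking resp.status_code or calling resp.raise_for_status() before parsing would surface the 403 immediately, instead of silently parsing the error page.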