Python Xpath: lxml.etree.XPathEvalError: Invalid predicate
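I'm trying to learn how to scrape web pages, but the code from the tutorial I'm following throws this error:
lxml.etree.XPathEvalError: Invalid predicate
The site I'm querying is (don't judge me, it's the one used in the training video :/): https://itunes.apple.com/us/app/candy-crush-saga/id553834731
The XPath string that causes the error is here:
links = tree.xpath('//div[@class="center-stack"//*/a[@class="name"]/@href')
I'm using the lxml and requests libraries.
If you need any other information, I'd be happy to provide it!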
You missed the closing ] after "center-stack", so the predicate is never terminated. The corrected call:
print(tree.xpath('//div[@class="center-stack"]//*/a[@class="name"]/@href'))
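One way to catch this kind of typo before running a query against a real page is to compile the expression first. The snippet below is only a minimal sketch: the helper name compiles is made up for illustration, and it relies on lxml's etree.XPath class raising an exception derived from etree.XPathError when an expression cannot be compiled.

```python
from lxml import etree

# The expression from the question (missing "]") and the corrected one.
broken = '//div[@class="center-stack"//*/a[@class="name"]/@href'
fixed = '//div[@class="center-stack"]//*/a[@class="name"]/@href'


def compiles(expression):
    """Hypothetical helper: return True if lxml can compile the XPath expression."""
    try:
        etree.XPath(expression)
    except etree.XPathError:
        # XPathError is the shared base class of lxml's XPath exceptions.
        return False
    return True


print(compiles(broken))
print(compiles(fixed))
```

The broken expression should be rejected and the fixed one accepted, with no document or network access involved.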
You could also just pull the a[@class="name"] tags from div[@class="content"]:
tree.xpath('//div[@class="content"]//a[@class="name"]/@href')
Both will give you the hrefs you want:
In [19]: import requests
In [20]: from lxml.html import fromstring
In [21]: r = requests.get("https://itunes.apple.com/us/app/candy-crush-saga/id553834731")
In [22]: tree = fromstring(r.content)
In [23]: a = tree.xpath('//div[@class="content"]//a[@class="name"]/@href')
In [24]: b = tree.xpath('//div[@class="center-stack"]//*/a[@class="name"]/@href')
In [25]: print(a == b)
True
In [26]: print(a)
['https://itunes.apple.com/us/app/word-search-puzzles/id609067187?mt=8', 'https://itunes.apple.com/us/app/cookie-jam/id727296976?mt=8', 'https://itunes.apple.com/us/app/jewel-mania/id561326449?mt=8', 'https://itunes.apple.com/us/app/jelly-splash/id645949180?mt=8', 'https://itunes.apple.com/us/app/bubble-island/id531354582?mt=8']
In [27]: print(b)
['https://itunes.apple.com/us/app/word-search-puzzles/id609067187?mt=8', 'https://itunes.apple.com/us/app/cookie-jam/id727296976?mt=8', 'https://itunes.apple.com/us/app/jewel-mania/id561326449?mt=8', 'https://itunes.apple.com/us/app/jelly-splash/id645949180?mt=8', 'https://itunes.apple.com/us/app/bubble-island/id531354582?mt=8']
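The session above depends on the live page, whose markup may have changed since. To try the two expressions offline, here is a small sketch that runs them against an inline HTML sample. The markup and the example.com URLs are made up for illustration and only imitate the nesting the queries rely on (a "center-stack" div containing a "content" div with "name" links); they are not the real iTunes page.

```python
from lxml.html import fromstring

# Made-up markup that only imitates the nesting the two queries rely on.
SAMPLE_HTML = """
<html><body>
  <div class="center-stack">
    <div class="content">
      <a class="name" href="https://example.com/app-one">App One</a>
      <a class="name" href="https://example.com/app-two">App Two</a>
      <a class="other" href="https://example.com/ignored">Ignored</a>
    </div>
  </div>
</body></html>
"""

tree = fromstring(SAMPLE_HTML)

# Shorter query: any "name" link below the "content" div.
a = tree.xpath('//div[@class="content"]//a[@class="name"]/@href')
# Corrected query from the question, with the closing "]" in place.
b = tree.xpath('//div[@class="center-stack"]//*/a[@class="name"]/@href')

print(a == b)
print(a)
```

On this sample both queries select the same two links and skip the one whose class is not "name". On other pages they can differ, since the second expression requires at least one element between the "center-stack" div and the link.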