Why am I not getting any data back from the website?
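I'm brand new to web scraping. I've been working on a project that requires me to get the word of the day from here. I've successfully grabbed the word, and now I just need the definition, but when I try to get it, I get this result: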
Avuncular (Correct word of the day)
Definition:
[]
Here is my code:
from lxml import html
import requests
page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = html.fromstring(page.content)
word = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[1]/div[2]/div[1]/div/h1/text()')
WOTD = str(word)
WOTD = WOTD[2:]
WOTD = WOTD[:-2]
print(WOTD.capitalize())
print("Definition:")
wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[2]/div[1]/div/div[1]/p[1]/text()')
print(wordDef)
The [] should be the first definition, but for some reason it isn't working.
Any help would be greatly appreciated.
Your XPath is slightly off. This is the correct one:
wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[3]/div[1]/div/div[1]/p[1]/text()')
Note that it is div[3] after main/article, not div[2]. Now when you run it, you should get:
Avuncular
Definition:
[' suggestive of an uncle especially in kindliness or geniality']
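The brackets and the leading space in that output are there because xpath() returns a list of text nodes. Below is a minimal sketch of taking the first match and stripping it, which also replaces the str() slicing used for the word. The HTML fragment and the first_text helper are made up for illustration; the real page's markup may differ.

```python
from lxml import html

# Hypothetical fragment standing in for the real page (assumption, not the actual markup).
SAMPLE = """
<html><body><article>
  <div><h1>avuncular</h1></div>
  <div><p><strong>1 :</strong> suggestive of an uncle especially in kindliness or geniality</p></div>
</article></body></html>
"""

def first_text(tree, xpath):
    """Return the first text node matched by xpath, stripped, or None if nothing matched."""
    matches = tree.xpath(xpath)
    return matches[0].strip() if matches else None

tree = html.fromstring(SAMPLE)
word = first_text(tree, "//article/div[1]/h1/text()")
definition = first_text(tree, "//article/div[2]/p[1]/text()")
# An index that does not exist yields an empty list, which is what produced [] in the question.
missing = first_text(tree, "//article/div[5]/p[1]/text()")

print(word.capitalize())
print("Definition:")
print(definition)
```

Checking for an empty result before indexing makes a wrong path show up as None rather than an IndexError.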
If you want to avoid hardcoding indices in the XPath, here is an alternative to your current attempt:
import requests
from lxml.html import fromstring
page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = fromstring(page.text)
word = tree.xpath("//*[@class='word-header']//h1")[0].text
wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p/strong")[0].tail.strip()
print(f'{word}\n{wordDef}')
If wordDef fails to fetch the full portion, try replacing it with the following:
wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p")[0].text_content()
Output:
avuncular
suggestive of an uncle especially in kindliness or geniality
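To see how the two wordDef variants differ without hitting the network, here is a rough sketch that runs the same relative XPath expressions against an inline HTML string. The markup, including the second definition paragraph, is an assumption made up for illustration; the real page may be structured differently and can change over time.

```python
from lxml.html import fromstring

# Hypothetical markup (assumption); only the class name and the "Definition" heading
# mirror what the XPath expressions above rely on.
SAMPLE = """
<html><body>
  <div class="word-header"><h1>avuncular</h1></div>
  <div>
    <h2>Definition</h2>
    <p><strong>1 :</strong> suggestive of an uncle especially in kindliness or geniality</p>
    <p><strong>2 :</strong> of or relating to an uncle</p>
  </div>
</body></html>
"""

tree = fromstring(SAMPLE)
word = tree.xpath("//*[@class='word-header']//h1")[0].text

# Every <p> that follows the "Definition" heading at the same level.
paragraphs = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p")

# .tail is only the text that comes right after the <strong> element.
tail_only = paragraphs[0].xpath("./strong")[0].tail.strip()
# .text_content() is all the text inside the <p>, including the <strong> label.
full_text = paragraphs[0].text_content().strip()
# Collect every definition paragraph instead of just the first.
all_definitions = [p.text_content().strip() for p in paragraphs]

print(word)
print(tail_only)
print(full_text)
```

The .tail variant drops the numbering label, while text_content() keeps everything inside the paragraph, so which one fits depends on how the definition is marked up on the day you scrape it.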