tree.xpath() returns an empty list when web scraping with the lxml library

Question

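When you go to: https://www.youtube.com/feed/trending

three buttons appear: Music, Gaming, and Movies.

I want to select the <a> tag of the Music element so that I can extract its href value. I used the code below, but it keeps giving me an empty list.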


from urllib.request import urlopen
from lxml import etree

url = "https://www.youtube.com/feed/trending"

response = urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
print(tree.xpath('//*[@id="contents"]/ytd-channel-list-sub-menu-avatar-renderer[1]/a'))

Answer 1

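If a plain HTTP request (requests, or urlopen as in the question) does not work, you can use selenium. The menu elements on this page appear to be built by JavaScript after the page loads, so they are most likely missing from the raw HTML that lxml parses, which would explain the empty list. I tried it with selenium on my end and it worked perfectly. Here is the code for reference.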

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


URL = "https://www.youtube.com/feed/trending"

chrome_options = Options()
# Download chromedriver and put its path here (Selenium 4 passes it through Service).
driver = webdriver.Chrome(service=Service("./chromedriver/chromedriver.exe"), options=chrome_options)
driver.maximize_window()

driver.get(URL)

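# Wait up to 200 seconds for an element with id="img" to be present in the rendered page.
wait1 = WebDriverWait(driver, 200)
wait1.until(EC.presence_of_element_located((By.XPATH, '//*[@id="img"]')))
print('-' * 100)
# Locate the <a> tag of the first menu entry (Music) and print its href.
print(driver.find_element(By.XPATH, '//*[@id="contents"]/ytd-channel-list-sub-menu-avatar-renderer[1]/a').get_attribute('href'))
print('-' * 100)
driver.quit()  # close the browser when done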
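
If you would rather keep your lxml XPath, one option is to hand lxml the HTML that the browser has already rendered (for example selenium's driver.page_source) instead of the raw response. Below is a minimal sketch of that idea. The helper name extract_hrefs, the two sample HTML strings, and the href value are made up for illustration and are not YouTube's real markup; the strings only stand in for "HTML before JavaScript runs" and "HTML after JavaScript runs".

```python
from lxml import etree

# Same path as in the question, with /@href appended to return the attribute value.
MUSIC_LINK_XPATH = '//*[@id="contents"]/ytd-channel-list-sub-menu-avatar-renderer[1]/a/@href'


def extract_hrefs(html, xpath=MUSIC_LINK_XPATH):
    """Parse an HTML string with lxml and return the matching href values."""
    tree = etree.fromstring(html, etree.HTMLParser())
    return [str(href) for href in tree.xpath(xpath)]


# Hypothetical stand-in for the raw response: the menu elements are not there yet.
static_html = '<html><body><div id="contents"></div></body></html>'

# Hypothetical stand-in for driver.page_source after the menu has been rendered.
rendered_html = (
    '<html><body><div id="contents">'
    '<ytd-channel-list-sub-menu-avatar-renderer>'
    '<a href="/feed/trending?bp=music">Music</a>'
    '</ytd-channel-list-sub-menu-avatar-renderer>'
    '</div></body></html>'
)

print(extract_hrefs(static_html))    # [] because the element is absent
print(extract_hrefs(rendered_html))  # ['/feed/trending?bp=music']
```

With a live browser session you would call extract_hrefs(driver.page_source) after the wait has finished, so that lxml sees the same DOM the browser does.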

python

youtube

lxml

beautifulsoup

web-scraping