tree.xpath() returns empty list in Webscraping using lxml library
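When you go to https://www.youtube.com/feed/trending, three buttons appear: Music, Gaming, and Movies.
I want to select the <a> tag of the Music element so that I can extract its href value. I used the code below, but it keeps giving me an empty list.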
from urllib.request import urlopen
from lxml import etree
url = "https://www.youtube.com/feed/trending"
response = urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
print(tree.xpath('//*[@id="contents"]/ytd-channel-list-sub-menu-avatar-renderer[1]/a'))
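A quick offline check shows why the call returns an empty list: lxml's XPath can only match elements that are present in the HTML string it is given. The sketch below runs the question's XPath against two hypothetical, heavily simplified documents: one roughly like a plain HTTP download (page data inside a script, no ytd-* elements) and one roughly like the DOM after JavaScript has run. Both HTML strings and the href value are made up for illustration.

```python
from lxml import etree

# XPath taken from the question.
XPATH = '//*[@id="contents"]/ytd-channel-list-sub-menu-avatar-renderer[1]/a'

# Hypothetical, heavily simplified stand-ins for the real YouTube markup.
# SERVER_HTML: roughly what a plain HTTP fetch receives; the page data sits
# inside a <script> and no ytd-* elements exist yet.
SERVER_HTML = """<html><body>
<script>var pageData = {"note": "rendered later by JavaScript"};</script>
</body></html>"""

# RENDERED_HTML: roughly what the DOM looks like after the page's JavaScript ran.
RENDERED_HTML = """<html><body>
<div id="contents">
  <ytd-channel-list-sub-menu-avatar-renderer>
    <a href="/hypothetical-music-link">Music</a>
  </ytd-channel-list-sub-menu-avatar-renderer>
</div>
</body></html>"""


def xpath_hrefs(html: str, xpath: str) -> list:
    """Parse an HTML string with lxml and return the href of every match."""
    root = etree.fromstring(html, etree.HTMLParser())
    return [el.get("href") for el in root.xpath(xpath)]


print(xpath_hrefs(SERVER_HTML, XPATH))    # []
print(xpath_hrefs(RENDERED_HTML, XPATH))  # ['/hypothetical-music-link']
```

The XPath itself is fine; what differs is whether the target elements exist in the parsed document at all.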
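The list is empty because YouTube creates those elements in the browser with JavaScript. The HTML that urlopen (or requests) downloads does not contain any ytd-channel-list-sub-menu-avatar-renderer elements, so the XPath has nothing to match. A browser automation tool such as Selenium runs the page's JavaScript first, so the elements exist by the time you query them. I tried it with Selenium and it worked. Here is the code for reference (Selenium 4):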
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

URL = "https://www.youtube.com/feed/trending"
MUSIC_LINK_XPATH = '//*[@id="contents"]/ytd-channel-list-sub-menu-avatar-renderer[1]/a'

chrome_options = Options()
# Download chromedriver and put its path here.
service = Service("./chromedriver/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=chrome_options)
try:
    driver.maximize_window()
    driver.get(URL)
    # Wait until the JavaScript-rendered link itself is present in the DOM.
    link = WebDriverWait(driver, 200).until(
        EC.presence_of_element_located((By.XPATH, MUSIC_LINK_XPATH))
    )
    print('-' * 100)
    print(link.get_attribute('href'))
    print('-' * 100)
finally:
    # Always close the browser, even if the wait times out.
    driver.quit()
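If you would rather keep the lxml/XPath code from the question, one option is to parse driver.page_source after the wait succeeds, since that is the rendered DOM. Selenium's get_attribute('href') typically returns an absolute URL, whereas lxml returns the attribute exactly as written, which may be relative; the sketch below resolves it with urljoin. The helper name absolute_hrefs and the sample markup are hypothetical.

```python
from urllib.parse import urljoin

from lxml import etree

BASE_URL = "https://www.youtube.com/feed/trending"
XPATH = '//*[@id="contents"]/ytd-channel-list-sub-menu-avatar-renderer[1]/a'


def absolute_hrefs(page_source: str, xpath: str, base_url: str) -> list:
    """Run an XPath over rendered HTML and return absolute hrefs.

    lxml returns the raw attribute value, which may be relative, so each
    value is resolved against base_url with urljoin.
    """
    root = etree.fromstring(page_source, etree.HTMLParser())
    hrefs = (el.get("href") for el in root.xpath(xpath))
    return [urljoin(base_url, h) for h in hrefs if h]


# Inside the Selenium script, after the wait has succeeded, you could call:
#     absolute_hrefs(driver.page_source, XPATH, BASE_URL)

# Offline check with a hypothetical fragment of rendered markup:
sample = (
    '<div id="contents"><ytd-channel-list-sub-menu-avatar-renderer>'
    '<a href="/hypothetical-music-link">Music</a>'
    "</ytd-channel-list-sub-menu-avatar-renderer></div>"
)
print(absolute_hrefs(sample, XPATH, BASE_URL))
# ['https://www.youtube.com/hypothetical-music-link']
```

This keeps Selenium responsible only for rendering the page, while the extraction logic stays in lxml and can be tested without a browser.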