How to find the href attribute of the videos on Twitch through Selenium and Python?

I am trying to find the Twitch video IDs of all videos for a specific user. For example, on this page: https://www.twitch.tv/dyrus/videos/all

All the videos are linked there, but it is not as simple as scraping the HTML and finding the links, because they appear to be generated dynamically.

So I heard about Selenium and did something like this:

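from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Change path here obviously
driver = webdriver.Chrome(service=Service('C:/Users/Jason/Downloads/chromedriver'))
driver.get('https://www.twitch.tv/dyrus/videos/all')
# Selenium 4 form of find_elements_by_xpath: matches every element with an href
link_element = driver.find_elements(By.XPATH, "//*[@href]")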


for link in link_element:
    print(link.get_attribute('href'))

driver.close()

This returns a bunch of links on the page, but not the video links; I think they lie "deeper". Any input?

Thanks in advance.

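With your locator, you are returning every element on the page that contains an href attribute. You can be more specific than that and get exactly what you are looking for. Switch to a CSS selector...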

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Change path here obviously
driver = webdriver.Chrome(service=Service('C:/Users/Jason/Downloads/chromedriver'))
driver.get('https://www.twitch.tv/dyrus/videos/all')
# Wait up to 10 seconds for the dynamically generated video links to become visible
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-a-target='preview-card-image-link']")))

for link in links:
    print(link.get_attribute('href'))

driver.close()

This prints 40 links from the page.
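
To see why the narrower selector helps without launching a browser, here is a minimal sketch that applies the same two filters to a static HTML fragment using only the standard library. The fragment is made up for illustration and is not Twitch's real markup; only the `data-a-target` value is taken from the selector above.

```python
from html.parser import HTMLParser

# Hypothetical fragment standing in for the rendered page; real Twitch markup differs.
SAMPLE_HTML = """
<link href="/static/site.css" rel="stylesheet">
<a href="/directory">Browse</a>
<a data-a-target="preview-card-image-link" href="/videos/295314690">VOD 1</a>
<a data-a-target="preview-card-image-link" href="/videos/294901947">VOD 2</a>
<a href="/dyrus/clips">Clips</a>
"""


class HrefCollector(HTMLParser):
    """Collects hrefs twice: once broadly, once with the narrower filter."""

    def __init__(self):
        super().__init__()
        self.all_hrefs = []    # analogue of the XPath //*[@href]
        self.video_hrefs = []  # analogue of a[data-a-target='preview-card-image-link']

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        self.all_hrefs.append(href)
        if tag == "a" and attributes.get("data-a-target") == "preview-card-image-link":
            self.video_hrefs.append(href)


collector = HrefCollector()
collector.feed(SAMPLE_HTML)
print(len(collector.all_hrefs))  # every element with an href, including the stylesheet
print(collector.video_hrefs)     # only the preview-card links
```

The broad filter picks up stylesheets and navigation links along with the videos, while the attribute filter keeps only the preview-card anchors. On the live page Selenium is still needed, since the links are only present after the page's scripts have run.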

I would still suggest a few changes, as follows:

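  • Always open the web browser in maximized mode so that all or most of the desired elements are within the viewport.
  • If you are on Windows, you need to append the .exe extension to the WebDriver variant name, e.g. chromedriver.exe
  • When identifying an element, always try to include the class attribute in your locator strategy.
  • Always call driver.quit() at the end of your test to close and destroy the WebDriver and web client instances gracefully.
  • Here is your own code block with the above tweaks: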

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    # Selenium 4 style: the driver path goes through Service, options through options=
    driver = webdriver.Chrome(service=Service(r'C:\path\to\chromedriver.exe'), options=options)
    driver.get('https://www.twitch.tv/dyrus/videos/all')
    # The selector includes the class names as well as the data-a-target attribute
    link_elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.tw-interactive.tw-link[data-a-target='preview-card-image-link']")))
    for link in link_elements:
        print(link.get_attribute('href'))
    driver.quit()
    
  • Console output:

    https://www.twitch.tv/videos/295314690
    https://www.twitch.tv/videos/294901947
    https://www.twitch.tv/videos/294472813
    https://www.twitch.tv/videos/294075254
    https://www.twitch.tv/videos/293617036
    https://www.twitch.tv/videos/293236560
    https://www.twitch.tv/videos/292800601
    https://www.twitch.tv/videos/292409437
    https://www.twitch.tv/videos/292328170
    https://www.twitch.tv/videos/292032996
    https://www.twitch.tv/videos/291625563
    https://www.twitch.tv/videos/291192151
    https://www.twitch.tv/videos/290824842
    https://www.twitch.tv/videos/290434348
    https://www.twitch.tv/videos/290021370
    https://www.twitch.tv/videos/289561690
    https://www.twitch.tv/videos/289495488
    https://www.twitch.tv/videos/289138003
    https://www.twitch.tv/videos/289110429
    https://www.twitch.tv/videos/288804893
    https://www.twitch.tv/videos/288784992
    https://www.twitch.tv/videos/288687479
    https://www.twitch.tv/videos/288432438
    https://www.twitch.tv/videos/288117849
    https://www.twitch.tv/videos/288004968
    https://www.twitch.tv/videos/287689102
    https://www.twitch.tv/videos/287451192
    https://www.twitch.tv/videos/287267032
    https://www.twitch.tv/videos/287017431
    https://www.twitch.tv/videos/286819343
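
Since the question asks for the video IDs rather than the full links, the printed hrefs still need one more step. The following is a minimal sketch, assuming every video link has the form `https://www.twitch.tv/videos/<numeric id>` as in the output above. The helper name `extract_video_id` and the non-video URL in the sample list are illustrative, not part of the original answers.

```python
from urllib.parse import urlparse


def extract_video_id(href):
    """Return the numeric ID from a .../videos/<id> URL, or None if it does not match."""
    parts = urlparse(href).path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "videos" and parts[1].isdigit():
        return parts[1]
    return None


# The first two URLs come from the console output; the last one is a made-up
# non-video link showing that other hrefs are skipped.
hrefs = [
    "https://www.twitch.tv/videos/295314690",
    "https://www.twitch.tv/videos/294901947",
    "https://www.twitch.tv/dyrus/clips",
]

video_ids = [vid for vid in map(extract_video_id, hrefs) if vid is not None]
print(video_ids)  # ['295314690', '294901947']
```

In the Selenium scripts above, the `hrefs` list would be built from `link.get_attribute('href')` for each located element instead of being hard-coded.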