How to find the href attribute of the videos on Twitch through Selenium and Python?
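I'm trying to find the Twitch video IDs of all videos for a specific user. For example, on this page:
https://www.twitch.tv/dyrus/videos/all
All the videos are linked there, but it's not as simple as scraping the HTML and finding the links, since they seem to be generated dynamically.
So I heard about selenium and did something like this: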
from selenium import webdriver
# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver')
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_element = driver.find_elements_by_xpath("//*[@href]")
for link in link_element:
    print(link.get_attribute('href'))
driver.close()
This returns a bunch of links on the page, but not the videos. I guess they are lying "deeper". Any input?
Thanks in advance.
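With your locator, you are returning every element on the page that has an href attribute. You can be more specific than that and get what you are looking for. Switch to a CSS selector...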
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Change path here obviously
driver = webdriver.Chrome('C:/Users/Jason/Downloads/chromedriver')
driver.get('https://www.twitch.tv/dyrus/videos/all')
links = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[data-a-target='preview-card-image-link']")))
for link in links:
    print(link.get_attribute('href'))
driver.close()
This prints 40 links from the page.
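To see why the narrower selector helps, here is a minimal sketch that needs no browser. It uses Python's standard html.parser on a small, made-up HTML snippet (SAMPLE_HTML, HrefCollector and collect_hrefs are hypothetical names, and the real Twitch markup is more complex and may change). It only mimics the two locators: "anything with an href" versus "an a element whose data-a-target is preview-card-image-link".

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup; the real Twitch page is more complex.
SAMPLE_HTML = """
<link href="/static/site.css" rel="stylesheet">
<a href="/directory">Browse</a>
<a data-a-target="preview-card-image-link" href="/videos/295314690">VOD 1</a>
<a data-a-target="preview-card-image-link" href="/videos/294901947">VOD 2</a>
"""


class HrefCollector(HTMLParser):
    """Collects href values, optionally only from matching <a> elements."""

    def __init__(self, required_target=None):
        super().__init__()
        self.required_target = required_target
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "href" not in attributes:
            return
        if self.required_target is not None:
            # Mimic a[data-a-target='...']: an <a> with exactly that value.
            if tag != "a" or attributes.get("data-a-target") != self.required_target:
                return
        self.hrefs.append(attributes["href"])


def collect_hrefs(html, required_target=None):
    parser = HrefCollector(required_target)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Broad match, comparable to the XPath //*[@href]
broad = collect_hrefs(SAMPLE_HTML)
# Narrow match, comparable to a[data-a-target='preview-card-image-link']
narrow = collect_hrefs(SAMPLE_HTML, "preview-card-image-link")

print(len(broad), broad)
print(len(narrow), narrow)
```

The broad match also picks up the stylesheet and navigation links, while the narrow one keeps only the video cards. On the live page the elements are rendered by JavaScript, which is why the Selenium code above waits for them instead of parsing static HTML.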
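I would still suggest a few changes:
- Always open the web browser in maximized mode so that all, or at least most, of the desired elements are within the viewport.
- If you are on Windows, append the .exe extension to the WebDriver binary name, e.g. chromedriver.exe
- When you identify an element, always try to include the class attribute in your locator strategy.
- Always call driver.quit() at the end of your @Test to close and destroy the WebDriver and web client instances gracefully.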
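One way to make sure quit() is always called, even when a wait times out or another exception is raised, is to wrap the driver in a small context manager. This is only a sketch under assumptions: managed_driver and FakeDriver are hypothetical names, and FakeDriver is a stand-in so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver via factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions (e.g. a wait timing out).
        driver.quit()


class FakeDriver:
    """Stand-in used only to make this sketch runnable without a browser."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake):
        raise RuntimeError("simulated failure while scraping")
except RuntimeError:
    pass

print(fake.quit_called)  # True
```

With a real browser, the factory would be something that builds the Chrome driver, and the scraping code would go inside the with block.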
Here is your own code block with the tweaks from the list above:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')
driver.get('https://www.twitch.tv/dyrus/videos/all')
link_elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.tw-interactive.tw-link[data-a-target='preview-card-image-link']")))
for link in link_elements:
    print(link.get_attribute('href'))
driver.quit()
Console output:
https://www.twitch.tv/videos/295314690
https://www.twitch.tv/videos/294901947
https://www.twitch.tv/videos/294472813
https://www.twitch.tv/videos/294075254
https://www.twitch.tv/videos/293617036
https://www.twitch.tv/videos/293236560
https://www.twitch.tv/videos/292800601
https://www.twitch.tv/videos/292409437
https://www.twitch.tv/videos/292328170
https://www.twitch.tv/videos/292032996
https://www.twitch.tv/videos/291625563
https://www.twitch.tv/videos/291192151
https://www.twitch.tv/videos/290824842
https://www.twitch.tv/videos/290434348
https://www.twitch.tv/videos/290021370
https://www.twitch.tv/videos/289561690
https://www.twitch.tv/videos/289495488
https://www.twitch.tv/videos/289138003
https://www.twitch.tv/videos/289110429
https://www.twitch.tv/videos/288804893
https://www.twitch.tv/videos/288784992
https://www.twitch.tv/videos/288687479
https://www.twitch.tv/videos/288432438
https://www.twitch.tv/videos/288117849
https://www.twitch.tv/videos/288004968
https://www.twitch.tv/videos/287689102
https://www.twitch.tv/videos/287451192
https://www.twitch.tv/videos/287267032
https://www.twitch.tv/videos/287017431
https://www.twitch.tv/videos/286819343
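Since the question asks for the video IDs rather than the full links, the last path segment of each href can be extracted afterwards. A minimal sketch, assuming the hrefs have the https://www.twitch.tv/videos/<id> shape shown in the output above (extract_video_id is a hypothetical helper, and the third URL in the list is a made-up non-video link added to show the filtering):

```python
from urllib.parse import urlparse


def extract_video_id(href):
    """Return the numeric ID from a .../videos/<id> URL, or None if it does not match."""
    parts = [p for p in urlparse(href).path.split("/") if p]
    if len(parts) == 2 and parts[0] == "videos" and parts[1].isdigit():
        return parts[1]
    return None


hrefs = [
    "https://www.twitch.tv/videos/295314690",
    "https://www.twitch.tv/videos/294901947",
    "https://www.twitch.tv/directory",  # hypothetical non-video link
]

video_ids = [vid for vid in map(extract_video_id, hrefs) if vid is not None]
print(video_ids)  # ['295314690', '294901947']
```

In the Selenium code, the hrefs list would come from link.get_attribute('href') on each located element.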