Python Selenium find_elements on multiple pages always gives the same result
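I have a simple example of the problem. I run driver.find_elements(By.ID, "thumbnail") and it works. I click a random element, then scrape the information again in the loop, but on the second pass I always get exactly the same result: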
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.somepage.com")
time.sleep(7)
items = []
for i in range(3):
    print("LOOP #: " + str(i))
    random_number = random.randint(1, 5)
    items = driver.find_elements(By.ID, "thumbnail")
    url = i.get_attribute("href")
    print(str(url))
    items[random_number].click()
    time.sleep(100)
Output
LOOP #: 0
URL 1
URL 2
URL 3
URL 4
LOOP #: 1
URL 1
URL 2
URL 3
URL 4
LOOP #: 2
URL 1
URL 2
URL 3
URL 4
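The second loop should return different URLs; find_elements(By.ID, "thumbnail") still applies. I don't know what I am doing wrong. I even tried adding items.clear() at the end of the loop, with the same result.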
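The following answer is specific to YouTube, because that was the example given when the question was asked.
When YouTube opens, the page contains elements with the id thumbnail, and there are many of these thumbnails. So the strategy is to iterate over range(3) and, on each iteration, collect all elements with id thumbnail, select a random one, get its href, and then click it. The remaining question is how to repeat the process. There are 2 options: (1) keep clicking, selecting one of the options from the left pane (thumbnails as well, I believe), or (2) click the home page (the YouTube icon) and then continue the iteration.
I chose the second option; here is the code: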
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.youtube.com/')
for i in range(3):
    print("LOOP #: " + str(i))
    time.sleep(10)
    items = driver.find_elements(By.ID, "thumbnail")
    # In the question, instead of selecting from items, you try to fetch the attribute from i,
    # which is the loop counter, not an element at all, so it didn't work for me.
    # Instead, I picked a random element from items, stored it in a variable, printed its href
    # and clicked it, then clicked the YouTube home button and repeated the process.
    rand = random.choice(items)
    print(rand.get_attribute('href'))
    rand.click()
    time.sleep(3)
    driver.find_element(By.XPATH, "(//*[@title='YouTube Home'])[1]").click()
driver.quit()
Output:
LOOP #: 0
https://www.youtube.com/watch?v=YIKz49-aGas
LOOP #: 1
https://www.youtube.com/watch?v=51Qs0Ej2RUc
LOOP #: 2
https://www.youtube.com/watch?v=OeShsZPOP-s
Process finished with exit code 0
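The selection step in the loop above can be pulled out into a small pure-Python helper. This is a hypothetical sketch (pick_random_href is not part of Selenium or of the original answer) that makes two points explicit. First, random.choice picks from however many thumbnails were actually found, whereas items[random.randint(1, 5)] in the question assumes at least six matches and never picks index 0. Second, Selenium's get_attribute("href") returns None when an element has no href, so such entries are skipped.

```python
import random
from typing import Optional, Sequence


def pick_random_href(hrefs: Sequence[Optional[str]],
                     rng: Optional[random.Random] = None) -> str:
    """Return one usable href chosen uniformly at random (hypothetical helper).

    `hrefs` would come from the thumbnails found on the *current* page, e.g.
    [el.get_attribute("href") for el in items]; get_attribute returns None
    when an element has no href, so None and empty strings are dropped.
    """
    candidates = [h for h in hrefs if h]
    if not candidates:
        raise ValueError("no thumbnail with an href was found")
    # random.choice works for any list length, unlike items[random.randint(1, 5)],
    # which needs at least 6 elements and can never return the first one.
    return (rng or random).choice(candidates)


# Only one usable href here, so the result is deterministic.
print(pick_random_href([None, "https://www.youtube.com/watch?v=a", ""]))
# prints https://www.youtube.com/watch?v=a
```

In the Selenium loop it could be called as pick_random_href([el.get_attribute("href") for el in items]), and the chosen URL opened with driver.get(...), assuming navigating by URL is acceptable instead of clicking the element.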
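Note: for more robust code, replace time.sleep with an explicit wait such as WebDriverWait. That said, YouTube is a Google property: its element attributes vary a lot and the page is often flaky. Also, bots will be detected if you send too many requests.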
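The gain over time.sleep is that an explicit wait returns as soon as its condition holds instead of always sleeping for a fixed time. In Selenium this is WebDriverWait, e.g. WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.ID, "thumbnail"))), which returns the matching elements or raises TimeoutException. The stdlib-only sketch below (wait_until is a hypothetical name, not a Selenium API) shows the polling idea behind it. WebDriverWait's default poll interval is also 0.5 s, but it additionally ignores NoSuchElementException while polling, which this sketch does not model.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 20.0,
               poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Stdlib-only sketch of the polling an explicit wait performs; raises the
    built-in TimeoutError if `timeout` seconds pass without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # done as soon as the condition holds, no fixed sleep
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Example: a condition that only succeeds on its third call.
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return "loaded" if calls["n"] >= 3 else None


print(wait_until(ready, timeout=2, poll=0.01))  # prints loaded
```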
Updated answer:
After clicking a thumbnail on the home page, this version clicks a thumbnail in the right-hand pane:
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.youtube.com/')
time.sleep(10)
# pick a random thumbnail on the home page and open it
items = driver.find_elements(By.ID, "thumbnail")
rand = random.choice(items)
print(rand.get_attribute('href'))
rand.click()
time.sleep(3)
for i in range(3):
    print(f"Loop#: {i}")
    # wait until the video player of the newly opened page is visible
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "movie_player")))
    # re-query the pane thumbnails on the current page on every iteration
    yt_left_pane_items = driver.find_elements(By.XPATH, "//*[@id='items']//*[@id='thumbnail']")
    rand_left_pane = random.choice(yt_left_pane_items)
    print(rand_left_pane.get_attribute('href'))
    rand_left_pane.click()
    time.sleep(5)
driver.quit()
Output:
https://www.youtube.com/watch?v=9YSbflKeOZQ
Loop#: 0
https://www.youtube.com/watch?v=ENOEgKeI_D0
Loop#: 1
https://www.youtube.com/watch?v=PeByUAhHXqs
Loop#: 2
https://www.youtube.com/watch?v=GUHfY84weMw
Process finished with exit code 0
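Both answers work because the thumbnail list is collected again on whichever page the browser is currently showing, rather than reusing a list from an earlier page. That control flow can be sketched without a browser. In this sketch random_walk and links_on are hypothetical names: links_on stands in for "navigate to this URL and return the thumbnail hrefs found there" (in Selenium, roughly driver.get plus find_elements and get_attribute("href")).

```python
import random
from typing import Callable, List, Optional


def random_walk(start: str, links_on: Callable[[str], List[str]], steps: int,
                rng: Optional[random.Random] = None) -> List[str]:
    """Follow up to `steps` random links from `start`; return the URLs visited.

    links_on(url) is a hypothetical stand-in for loading `url` and collecting
    its thumbnail hrefs. It is called on the *current* page at every step,
    which is what makes each iteration see different links.
    """
    rng = rng or random.Random()
    visited: List[str] = []
    current = start
    for _ in range(steps):
        candidates = links_on(current)  # re-query the page we are on now
        if not candidates:
            break  # dead end: nothing to click
        current = rng.choice(candidates)
        visited.append(current)
    return visited


# Toy link graph standing in for real pages.
pages = {"home": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["home"]}
print(random_walk("home", pages.__getitem__, 3))
```

Passing a seeded random.Random makes a run reproducible, which can help when debugging the browser-driven version.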