Python Selenium: loop through a list of links
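I have a list of links. I am trying to visit each link, print some content from it, return to the main page, visit the second link, and repeat until I have gone through every link in the list.
Only the first link works; after that it is as if all the links in the list had disappeared. I get this error:
File "e:\work\MY CODE\scraping\learn.py", line 25, in theprint
    link.click()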
from selenium import webdriver
# Keys gives access to keyboard keys such as Enter, Esc, etc.
from selenium.webdriver.common.keys import Keys
import time
# These let us wait for an event to happen before running the next line of code
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Path to the Chrome driver (raw string so the backslashes are kept literally)
PATH = r"E:\work\crom\chromedriver.exe"
# Pass the path to the Selenium webdriver constructor
# (newer Selenium 4 releases take webdriver.Chrome(service=Service(PATH)) instead)
driver = webdriver.Chrome(PATH)
# Open the site we want
driver.get("https://app.dealroom.co/companies.startups/f/client_focus/anyof_business/company_status/not_closed/company_type/not_government%20nonprofit/employees/anyof_2-10_11-50_51-200/has_website_url/anyof_yes/slug_locations/anyof_france?sort=-revenue")
# Wait for the page to load
time.sleep(5)
# Collect the href of every link we want to get info from
links = []
the_links = driver.find_elements(By.CLASS_NAME, "table-list-item")
for link in the_links:
    links.append(link.get_attribute('href'))
# Go to each link, print something, and return to the main page
for link in links:
    driver.get(link)
    website = driver.find_element(By.CLASS_NAME, "item-details-info__url")
    print(website.text)
    driver.back()
    time.sleep(3)
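Your code will throw a stale element reference error, because once you navigate to another page, any variable holding an element from the previous page becomes unusable.
So what you need to do is read the href of every element into a list first, and then iterate over that list like this: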
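links = []
the_links = driver.find_elements(By.CLASS_NAME, "table-list-item")
# Read the hrefs while the elements are still attached to the page
for link in the_links:
    links.append(link.get_attribute('href'))
# Plain URL strings stay valid after navigation
for link in links:
    driver.get(link)
    print("do something on this link")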
Alternatively, you can use a while loop and re-populate the the_links variable after each driver.back().
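The while-loop alternative could look roughly like the sketch below. It assumes each "table-list-item" element can be clicked to open its detail page and that the list comes back in the same order after driver.back(); neither is confirmed by the question. The names visit_by_index, handle_page and pause are hypothetical, and the locator is passed in as a (by, value) tuple so the helper itself does not need to import Selenium.

```python
import time


def visit_by_index(driver, locator, handle_page, pause=3.0):
    """Click each listed element in turn, re-locating the list after every back().

    locator is a (by, value) tuple such as (By.CLASS_NAME, "table-list-item").
    handle_page is a callback that receives the driver while the detail page is open.
    Returns the number of items visited.
    """
    index = 0
    while True:
        # Fresh lookup on every pass: references obtained before a
        # navigation are stale and cannot be reused.
        items = driver.find_elements(*locator)
        if index >= len(items):
            break
        items[index].click()
        handle_page(driver)
        driver.back()
        # Crude fixed pause, as in the question; an explicit wait
        # (WebDriverWait) would be more robust.
        time.sleep(pause)
        index += 1
    return index
```

With the driver from the question, this might be called as visit_by_index(driver, (By.CLASS_NAME, "table-list-item"), lambda d: print(d.title)). Collecting the href values up front is usually simpler; re-locating by index is mainly useful when the items have no href to read.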
Karim, does the element with class name "item-details-info__url" appear on every page? Also, what error does the get() method throw?
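One way to check the first point in this comment is to look the element up with find_elements, which returns an empty list when nothing matches, instead of find_element, which raises NoSuchElementException. The following is a minimal sketch; website_text is a hypothetical helper name, and the string "class name" is used in place of By.CLASS_NAME (which has that value) so the snippet does not need Selenium imported.

```python
# Same value as selenium.webdriver.common.by.By.CLASS_NAME.
CLASS_NAME = "class name"


def website_text(driver, class_name="item-details-info__url"):
    """Return the first matching element's text, or None if the page has no match."""
    # find_elements never raises for "not found"; it just returns [].
    matches = driver.find_elements(CLASS_NAME, class_name)
    return matches[0].text if matches else None
```

Inside the loop from the question, calling website_text(driver) after driver.get(link) and printing the link whenever the result is None would show which pages lack the element.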