使用 selenium python 从网站单击按钮后找到 URL
find the URL after button click from the website using selenium python
网站的每个按钮都可能包含link,下面的网站如何找到URL出现在下一个标签中。
想要在单击按钮后打印并抓取 URL
我正在使用 firefox 网络驱动程序
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
driver.find_element_by_xpath("//span[contains(text(),'Ingredients')]").click()
time.sleep(3)
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel™')]").click()
这个应该很简单,用driver.current_url
就可以了。所以用你的代码你可以试试
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
driver.find_element_by_xpath("//span[contains(text(),'Ingredients')]").click()
time.sleep(3)
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel™')]").click()
time.sleep(5)
driver.switch_to.window(driver.window_handles[1])
print(driver.current_url)
我发现了几个问题:
1 等待。摆脱 time.sleep()。将其替换为 explicit/implicit 等待。我观察到这些元素是最后加载到页面上的:picture[class='loaded']
。所以,我添加了等待他们。
2 要在选项卡之间切换,请使用:driver.switch_to.window(driver.window_handles[1])
、driver.switch_to.window(driver.window_handles[0])
- 切换到初始选项卡。
Chrome
的解决方案
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
# driver.implicitly_wait(10)
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
wait = WebDriverWait(driver, 30)
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "picture[class='loaded']")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".collapsed>a[title='Ingredients']")))
driver.find_element_by_css_selector(".collapsed>a[title='Ingredients']").click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Go to SmartLabel')]")))
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel')]").click()
driver.switch_to.window(driver.window_handles[1])
print(driver.current_url)
driver.close()
driver.switch_to.window(driver.window_handles[0])
print(driver.current_url)
输出:
https://smartlabel.unileverusa.com/011111375512-0001-en-US/index.html
https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html
对于 Firefox 你需要等待第二页上的至少一个元素,否则输出不会给你预期的 link:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.implicitly_wait(10)
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
wait = WebDriverWait(driver, 30)
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "picture[class='loaded']")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".collapsed>a[title='Ingredients']")))
driver.find_element_by_css_selector(".collapsed>a[title='Ingredients']").click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Go to SmartLabel')]")))
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel')]").click()
driver.switch_to.window(driver.window_handles[1])
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".container-fluid.content-section")))
print(driver.current_url)
driver.close()
driver.switch_to.window(driver.window_handles[0])
print(driver.current_url)
P.S。如果您正在寻找一种通过属性名称查找 links 的方法,那是不可能的,因为此按钮没有这样的功能。生成了link。
网站的每个按钮都可能包含link,下面的网站如何找到URL出现在下一个标签中。
想要在单击按钮后打印并抓取 URL 我正在使用 firefox 网络驱动程序
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
driver.find_element_by_xpath("//span[contains(text(),'Ingredients')]").click()
time.sleep(3)
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel™')]").click()
这个应该很简单,用driver.current_url
就可以了。所以用你的代码你可以试试
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
driver.find_element_by_xpath("//span[contains(text(),'Ingredients')]").click()
time.sleep(3)
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel™')]").click()
time.sleep(5)
driver.switch_to.window(driver.window_handles[1])
print(driver.current_url)
我发现了几个问题:
1 等待。摆脱 time.sleep()。将其替换为 explicit/implicit 等待。我观察到这些元素是最后加载到页面上的:picture[class='loaded']
。所以,我添加了等待他们。
2 要在选项卡之间切换,请使用:driver.switch_to.window(driver.window_handles[1])
、driver.switch_to.window(driver.window_handles[0])
- 切换到初始选项卡。
Chrome
的解决方案from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
# driver.implicitly_wait(10)
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
wait = WebDriverWait(driver, 30)
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "picture[class='loaded']")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".collapsed>a[title='Ingredients']")))
driver.find_element_by_css_selector(".collapsed>a[title='Ingredients']").click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Go to SmartLabel')]")))
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel')]").click()
driver.switch_to.window(driver.window_handles[1])
print(driver.current_url)
driver.close()
driver.switch_to.window(driver.window_handles[0])
print(driver.current_url)
输出:
https://smartlabel.unileverusa.com/011111375512-0001-en-US/index.html
https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html
对于 Firefox 你需要等待第二页上的至少一个元素,否则输出不会给你预期的 link:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.implicitly_wait(10)
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
wait = WebDriverWait(driver, 30)
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "picture[class='loaded']")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".collapsed>a[title='Ingredients']")))
driver.find_element_by_css_selector(".collapsed>a[title='Ingredients']").click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Go to SmartLabel')]")))
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel')]").click()
driver.switch_to.window(driver.window_handles[1])
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".container-fluid.content-section")))
print(driver.current_url)
driver.close()
driver.switch_to.window(driver.window_handles[0])
print(driver.current_url)
P.S。如果您正在寻找一种通过属性名称查找 links 的方法,那是不可能的,因为此按钮没有这样的功能。生成了link。