在 Python 中勾选使用 Selenium webdriver 的复选框
Tick a checkbox using Selenium webdriver in Python
同学们,
我正在做一些网络抓取,需要从 www1.hkexnews.hk 网站下载多个 PDF。
但是,我在尝试让我的 Selenium chromedriver 勾选框 时遇到了一个问题,该框会在每次想要在上述网站上下载 PDF 时出现。代码执行,但框仍然显示为未单击。
请参考下面我的源代码 - 非常感谢任何建议!
driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
driver.implicitly_wait(10)
driver.maximize_window()
start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver.get(start_address)
PDF_link = driver.find_element_by_xpath("//a[contains(text(),'Full Version')]")
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()
checkbox = driver.find_element_by_id('warning-statement-accept')
print("Now clicking...", checkbox.text)
checkbox.click
编辑:谢谢你们!现在下载工作正常,只是一个小的后续问题 - 我如何修改下载代码以 根据公司名称保存每个 PDF - 通过 all_names = driver.find_elements_by_xpath("//div[@class='applicant-name']")
可用?
目前,我正在使用如下所示的自动下载选项,我想下载逻辑必须进行调整(我宁愿下载已经具有正确名称的 PDF,而不是采用使用的肮脏解决方法Python 保存后更改他们的名字...)
chrome_options.add_experimental_option('prefs', {
"download.default_directory": "/Users/XXX/Downloads", #Change default directory for downloads
"download.prompt_for_download": False, #To auto download the file
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})
这里有几个问题:
- “复选框”定位器错误。
- 您当前的代码将只下载第一个 PDF 文件。
最好使用预期条件显式等待而不是隐式等待。
这应该会更好:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
wait = WebDriverWait(driver, 20)
driver.maximize_window()
start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver.get(start_address)
PDF_link = wait.until(EC.visibility_of_element_located((By.XPATH, "//a[contains(text(),'Full Version')]")))
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()
checkbox = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[./label[@for='warning-statement-accept']]//input")))
print("Now clicking...", checkbox.text)
checkbox.click
应该这样做:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get(link)
elem = wait.until(EC.presence_of_element_located((By.XPATH,"//tr[@class='record-ap-phip']//a[contains(.,'Full Version')]")))
elem.click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//label[@for='warning-statement-accept']"))).click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//a[contains(@class,'btn-ok')]"))).click()
这里是脚本的修改版本,它将踢出新打开的选项卡。我没有在脚本中包含下载逻辑。我想你可以自己做。
driver.get(link)
current = driver.current_window_handle
for elem in wait.until(EC.presence_of_all_elements_located((By.XPATH,"//tr[@class='record-ap-phip']//a[contains(.,'Full Version')]"))):
elem.click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//label[@for='warning-statement-accept']"))).click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//a[contains(@class,'btn-ok')]"))).click()
wait.until(EC.new_window_is_opened)
driver.switch_to.window([window for window in driver.window_handles if window != current][0])
print(driver.current_url)
driver.close()
driver.switch_to.window(current)
driver.quit()
同学们,
我正在做一些网络抓取,需要从 www1.hkexnews.hk 网站下载多个 PDF。
但是,我在尝试让我的 Selenium chromedriver 勾选框 时遇到了一个问题,该框会在每次想要在上述网站上下载 PDF 时出现。代码执行,但框仍然显示为未单击。
请参考下面我的源代码 - 非常感谢任何建议!
driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
driver.implicitly_wait(10)
driver.maximize_window()
start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver.get(start_address)
PDF_link = driver.find_element_by_xpath("//a[contains(text(),'Full Version')]")
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()
checkbox = driver.find_element_by_id('warning-statement-accept')
print("Now clicking...", checkbox.text)
checkbox.click
编辑:谢谢你们!现在下载工作正常,只是一个小的后续问题 - 我如何修改下载代码以 根据公司名称保存每个 PDF - 通过 all_names = driver.find_elements_by_xpath("//div[@class='applicant-name']")
可用?
目前,我正在使用如下所示的自动下载选项,我想下载逻辑必须进行调整(我宁愿下载已经具有正确名称的 PDF,而不是采用使用的肮脏解决方法Python 保存后更改他们的名字...)
chrome_options.add_experimental_option('prefs', {
"download.default_directory": "/Users/XXX/Downloads", #Change default directory for downloads
"download.prompt_for_download": False, #To auto download the file
"download.directory_upgrade": True,
"plugins.always_open_pdf_externally": True #It will not show PDF directly in chrome
})
这里有几个问题:
- “复选框”定位器错误。
- 您当前的代码将只下载第一个 PDF 文件。
最好使用预期条件显式等待而不是隐式等待。
这应该会更好:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/chromedriver',options=chrome_options)
wait = WebDriverWait(driver, 20)
driver.maximize_window()
start_address = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver.get(start_address)
PDF_link = wait.until(EC.visibility_of_element_located((By.XPATH, "//a[contains(text(),'Full Version')]")))
print("Now clicking...'", PDF_link.text,"'")
PDF_link.click()
checkbox = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[./label[@for='warning-statement-accept']]//input")))
print("Now clicking...", checkbox.text)
checkbox.click
应该这样做:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www1.hkexnews.hk/app/appyearlyindex.html?lang=en&board=mainBoard&year=2021"
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get(link)
elem = wait.until(EC.presence_of_element_located((By.XPATH,"//tr[@class='record-ap-phip']//a[contains(.,'Full Version')]")))
elem.click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//label[@for='warning-statement-accept']"))).click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//a[contains(@class,'btn-ok')]"))).click()
这里是脚本的修改版本,它将踢出新打开的选项卡。我没有在脚本中包含下载逻辑。我想你可以自己做。
driver.get(link)
current = driver.current_window_handle
for elem in wait.until(EC.presence_of_all_elements_located((By.XPATH,"//tr[@class='record-ap-phip']//a[contains(.,'Full Version')]"))):
elem.click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//label[@for='warning-statement-accept']"))).click()
wait.until(EC.presence_of_element_located((By.XPATH,"//*[@id='warning-statement-dialog']//a[contains(@class,'btn-ok')]"))).click()
wait.until(EC.new_window_is_opened)
driver.switch_to.window([window for window in driver.window_handles if window != current][0])
print(driver.current_url)
driver.close()
driver.switch_to.window(current)
driver.quit()