Get links from an SVG interactive map
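I am using webdriver.Chrome (Selenium) to get the links stored in an SVG interactive map at the following URL: https://www.mpcb.gov.in/water-quality
I have tried several of the find_elements_by_ functions, such as xpath and partial_text, but without success. Here is the beginning of my code: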
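from selenium import webdriver
from selenium.webdriver.chrome.options import Options

DRIVER_PATH = 'C:/Users/Asaf/Desktop/Asaf/Python/chromedriver.exe'
options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")
# Pass the options to the driver; otherwise the settings above are never applied
driver = webdriver.Chrome(executable_path=DRIVER_PATH, options=options)
# Site URL
url = 'https://www.mpcb.gov.in/water-quality'
driver.get(url)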
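For context, I want to get these links (the clickable regions of the map) so that I can scrape data from them one at a time.
To get the links from the SVG interactive map and open each one in a new window to scrape data from them one at a time, you can use the following:
Code block: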
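from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get('https://www.mpcb.gov.in/water-quality')
# Collect the href of every clickable <a> region inside the SVG map
hrefs = [my_elem.get_attribute("xlink:href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[property='schema:text'] svg g a")))]
parent_window = driver.current_window_handle
for href in hrefs:
    # Open the link in a new window and wait until that window exists
    driver.execute_script("window.open('" + href + "')")
    WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
    windows_after = driver.window_handles
    new_window = [x for x in windows_after if x != parent_window][0]
    driver.switch_to.window(new_window)
    print(driver.current_url)
    # Close the new window and return to the map before the next link
    driver.close()
    driver.switch_to.window(parent_window)
driver.quit()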
Console output:
https://www.mpcb.gov.in/water-quality/Thane/15
https://www.mpcb.gov.in/water-quality/Nashik/18
https://www.mpcb.gov.in/water-quality/Aurangabad/19
https://www.mpcb.gov.in/water-quality/Amravati/21
https://www.mpcb.gov.in/water-quality/Raigad/14
https://www.mpcb.gov.in/water-quality/Pune/17
https://www.mpcb.gov.in/water-quality/Nagpur/20
https://www.mpcb.gov.in/water-quality/Chandrapur/23
https://www.mpcb.gov.in/water-quality/Kolhapur/22
https://www.mpcb.gov.in/water-quality/Mumbai/12
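Each URL in the output ends with what appears to be a place name followed by a numeric ID. If you want to key the scraped data by those values, a minimal sketch along these lines may help. The helper name `split_region_url` is hypothetical, and the assumption that every link has the shape `/water-quality/<name>/<id>` is taken only from the output above.

```python
from urllib.parse import urlparse


def split_region_url(url):
    """Return (name, id) from a URL shaped like .../water-quality/<name>/<id>.

    Hypothetical helper; the URL shape is assumed from the console output.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "water-quality":
        raise ValueError(f"unexpected URL shape: {url}")
    return parts[1], int(parts[2])


# Two of the URLs printed by the loop above
urls = [
    "https://www.mpcb.gov.in/water-quality/Thane/15",
    "https://www.mpcb.gov.in/water-quality/Mumbai/12",
]

# Map each name to its numeric ID
regions = dict(split_region_url(u) for u in urls)
print(regions)
```

Raising on an unexpected shape, rather than skipping the link silently, makes it easier to notice if the site changes its URL scheme.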
Reference
You can find a relevant detailed discussion in:
- Creating XPATH for svg tag
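As background to why plain XPath lookups tend to miss SVG content: the elements inside an `svg` tag belong to the SVG namespace, so an expression such as `//svg//a` does not match them, and the usual XPath workaround is a form like `//*[name()='svg']//*[name()='a']`. The CSS selector used in the code block above does not need this. The sketch below illustrates the same namespace effect with the standard library's `xml.etree.ElementTree` instead of Selenium; the sample markup is a hypothetical stand-in, not the real page source.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical stand-in for the map markup; the real page may differ.
SAMPLE = """
<div xmlns:xlink="http://www.w3.org/1999/xlink">
  <svg xmlns="http://www.w3.org/2000/svg">
    <g>
      <a xlink:href="/water-quality/Thane/15"><path d="M0 0"/></a>
      <a xlink:href="/water-quality/Pune/17"><path d="M1 1"/></a>
    </g>
  </svg>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# A lookup without a namespace finds nothing, because <a> lives in the SVG namespace
plain = root.findall(".//a")

# Qualifying the tag with the SVG namespace finds both links
namespaced = root.findall(f".//{{{SVG_NS}}}a")

# The href attribute is itself namespaced (xlink)
hrefs = [a.get(f"{{{XLINK_NS}}}href") for a in namespaced]

print(len(plain), len(namespaced), hrefs)
```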