Can't locate elements from a website using Selenium

I'm trying to scrape data from a business directory, but I keep getting no data back.

name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text
# Results in: IndexError: list index out of range

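So I tried using WebDriverWait to make the code wait for the data to load, but it still can't find the element, even though the data has loaded on the page.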

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time


url='https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b'
driver = webdriver.Firefox()
driver.get(url)

wait=WebDriverWait(driver,50)

wait.until(EC.visibility_of_element_located((By.CLASS_NAME,'searched-list ng-scope')))
name = driver.find_elements_by_xpath('/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text

print(name)

The directory is loaded inside this iframe:
<iframe src="https://dmcc.secure.force.com/Business_directory_Page?initialWidth=987&amp;childId=pym-0&amp;parentTitle=List%20of%20Companies%20Registered%20in%20Dubai%2C%20DMCC%20Free%20Zone&amp;parentUrl=https%3A%2F%2Fwww.dmcc.ae%2Fbusiness-search%3Fdirectory%3D1%26submissionGuid%3D2c8df029-a92e-4b5d-a014-7ef9948e664b" width="100%" scrolling="no" marginheight="0" frameborder="0" height="3657px"></iframe>

Switch to the iframe and handle the accept button.

driver.get('https://www.dmcc.ae/business-search?directory=1&submissionGuid=2c8df029-a92e-4b5d-a014-7ef9948e664b')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#hs-eu-confirmation-button"))).click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,'#pym-0 > iframe')))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,'.searched-list.ng-scope')))
name = driver.find_elements_by_xpath('//*[@id="directory_list"]/div/div/div/div[1]/h4')[0]
print(name.text)

Output

1 BOXOFFICE DMCC
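
One detail worth noting in the snippets above: the question passes 'searched-list ng-scope' to By.CLASS_NAME, while the working code uses the CSS selector '.searched-list.ng-scope'. By.CLASS_NAME expects a single class name, so an element carrying several classes is more reliably matched with a CSS selector that chains them. The helper below is a minimal sketch of that conversion; `classes_to_css` is a hypothetical name, not part of Selenium, and it assumes plain class names that need no CSS escaping.

```python
def classes_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into a CSS selector
    that matches elements carrying all of those classes."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches an element that has both class "a" and class "b"
    return "." + ".".join(names)


selector = classes_to_css("searched-list ng-scope")
print(selector)  # .searched-list.ng-scope
```

The resulting string can then be passed with By.CSS_SELECTOR to the wait condition, as the code above does.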
# Switch into the iframe nested under the parent div with id "pym-0"
driver.switch_to.frame(driver.find_element_by_css_selector("#pym-0 iframe"))
wait = WebDriverWait(driver, 10)

# Wait until the results list inside the iframe is present
wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, '.searched-list.ng-scope')))
name = driver.find_elements_by_xpath(
    '/html/body/div[3]/div/div/div[1]/div/div[1]/div/div[1]/h4')[0].text

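The element is inside an iframe. To interact with elements in an iframe, you have to switch to it first. Here the iframe has no unique identifier of its own, so we use the parent div, which has a unique ID, as the reference for finding the child iframe.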

Now, if you want to interact with elements outside the iframe, use:

 driver.switch_to.default_content()
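
If you switch in and out of the iframe often, the two calls can be paired so the switch back is never forgotten. The following is a minimal sketch, not part of Selenium itself: `inside_frame` is a hypothetical helper name, and it relies only on the `driver.switch_to.frame(...)` and `driver.switch_to.default_content()` calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always return to the top-level document."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so later lookups
        # are not silently scoped to the iframe.
        driver.switch_to.default_content()


# Usage sketch (assumes `driver` is an already-created WebDriver):
# frame = driver.find_element_by_css_selector("#pym-0 iframe")
# with inside_frame(driver, frame):
#     ...  # locate elements inside the iframe here
```

After the `with` block exits, the driver is back on the main page, so elements outside the iframe can be located again.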