使用 selenium 抓取网页数据
Web Scraping data with selenium
你好我正在抓取这个页面https://www.betexplorer.com/soccer/china/super-league-2016/beijing-guoan-henan-jianye/S49KzkvO/我必须抓取这些数据
Country = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[1]/li[3]/a").text
leagueseason = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/header/h1/a").text
Home = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[1]/h2/a").text
Away = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[3]/h2/a").text
我尝试使用这些 XPATH,但我会适应更具体的 XPath,因为这可能会发生变化。有什么建议吗?谢谢
要打印元素的 innerText
,您必须归纳 for the visibility_of_element_located()
and you can use either of the following :
使用 css-selectors 和 get_attribute("innerHTML")
:
中国:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-breadcrumb li:nth-child(3) a"))).get_attribute("innerHTML"))
超级联赛 2016:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.wrap-section__header__title>a"))).get_attribute("innerHTML"))
北京国安:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:first-child h2.list-details__item__title>a"))).get_attribute("innerHTML"))
河南建业:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:nth-child(3) h2.list-details__item__title>a"))).get_attribute("innerHTML"))
使用 xpath 和 text 属性:
中国:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-breadcrumb']//following::li[3]//a"))).text)
超级联赛 2016:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='wrap-section__header__title']/a"))).text)
北京国安:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']//following::li[1]//h2/a"))).text)
河南建业:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']//following::li[2]//h2/a"))).text)
注意:您必须添加以下导入:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in
结尾
Link 到有用的文档:
get_attribute()
方法Gets the given attribute or property of the element.
text
属性returnsThe text of the element.
- Difference between text and innerHTML using Selenium
你好我正在抓取这个页面https://www.betexplorer.com/soccer/china/super-league-2016/beijing-guoan-henan-jianye/S49KzkvO/我必须抓取这些数据
Country = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[1]/li[3]/a").text
leagueseason = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/header/h1/a").text
Home = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[1]/h2/a").text
Away = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[3]/h2/a").text
我尝试使用这些 XPATH,但我会适应更具体的 XPath,因为这可能会发生变化。有什么建议吗?谢谢
要打印元素的 innerText
,您必须归纳 visibility_of_element_located()
and you can use either of the following
使用 css-selectors 和
get_attribute("innerHTML")
:中国:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-breadcrumb li:nth-child(3) a"))).get_attribute("innerHTML"))
超级联赛 2016:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.wrap-section__header__title>a"))).get_attribute("innerHTML"))
北京国安:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:first-child h2.list-details__item__title>a"))).get_attribute("innerHTML"))
河南建业:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:nth-child(3) h2.list-details__item__title>a"))).get_attribute("innerHTML"))
使用 xpath 和 text 属性:
中国:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-breadcrumb']//following::li[3]//a"))).text)
超级联赛 2016:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='wrap-section__header__title']/a"))).text)
北京国安:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']//following::li[1]//h2/a"))).text)
河南建业:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']//following::li[2]//h2/a"))).text)
注意:您必须添加以下导入:
from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.common.by import By from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in
结尾
Link 到有用的文档:
get_attribute()
方法Gets the given attribute or property of the element.
text
属性returnsThe text of the element.
- Difference between text and innerHTML using Selenium