使用 selenium 抓取网页数据

Question

你好我正在抓取这个页面https://www.betexplorer.com/soccer/china/super-league-2016/beijing-guoan-henan-jianye/S49KzkvO/我必须抓取这些数据

Country = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[1]/li[3]/a").text
leagueseason = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/header/h1/a").text
Home = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[1]/h2/a").text
Away = driver.find_element_by_xpath("/html/body/div[4]/div[4]/div/div/div[1]/section/ul[2]/li[3]/h2/a").text

我尝试使用这些 XPATH，但我会适应更具体的 XPath，因为这可能会发生变化。有什么建议吗？谢谢

Answer 1

要打印元素的 innerText，您必须归纳 for the visibility_of_element_located() and you can use either of the following :

使用 css-selectors 和 get_attribute("innerHTML"):

中国:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-breadcrumb li:nth-child(3) a"))).get_attribute("innerHTML"))

超级联赛 2016:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h1.wrap-section__header__title>a"))).get_attribute("innerHTML"))

北京国安:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:first-child h2.list-details__item__title>a"))).get_attribute("innerHTML"))

河南建业:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.list-details>li:nth-child(3) h2.list-details__item__title>a"))).get_attribute("innerHTML"))

使用 xpath 和 text 属性：

中国:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-breadcrumb']//following::li[3]//a"))).text)

超级联赛 2016:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='wrap-section__header__title']/a"))).text)

北京国安:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']//following::li[1]//h2/a"))).text)

河南建业:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='list-details']//following::li[2]//h2/a"))).text)

注意：您必须添加以下导入：

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in

结尾

Link 到有用的文档：

get_attribute()方法Gets the given attribute or property of the element.
text属性returnsThe text of the element.
Difference between text and innerHTML using Selenium

使用 selenium 抓取网页数据

Web Scraping data with selenium

python

selenium

xpath

css-selectors

webdriverwait

结尾