How to create a DataFrame with values from dynamic data-index using Selenium and Python
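I'm trying to scrape https://www.livescore.com/en/ but I'm having trouble, mainly because the page structure is different from the ones I have worked with before.
I see there is a dynamic id whose number increases as you scroll down the page, the ids in the markup only relate to the matches currently visible on the page, and the markup for the home team appears to be the same as the markup for the away team.
This is what I have tried: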
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.maximize_window()
wait=WebDriverWait(driver,30)
driver.get('https://www.livescore.com/en/football/live/')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#onetrust-accept-btn-handler"))).click()
games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        'Home': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Away': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Time': game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRowTime_time__2Fkd2 MatchRowTime_isLive__2qWag"]').text
    })
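The idea is to build a DataFrame of live matches containing the home team name, the away team name, and the current match time.
Can someone help me?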
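As far as I know, the clearest and simplest way to locate an element inside another element is to use an XPath that starts with a dot (.). Without the dot, an XPath beginning with // is evaluated against the whole document, even when it is called on a row element.
The Home and Away team names and the match Time field can be located unambiguously with the following locators: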
games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        'Home': game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_home")]').text,
        'Away': game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_away")]').text,
        'Time': game1.find_element(By.XPATH, './/span[contains(@id,"match-row")]').text
    })
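The loop above stops at a list of dictionaries, while the question asks for a DataFrame. The following is a minimal sketch of one way to finish the job, not a definitive implementation. The helper names (first_text, rows_to_records, records_to_frame) and the FIELDS table are hypothetical and introduced only for illustration; the locators are the ones from the answer and may need updating if the site changes its class names. It uses find_elements rather than find_element so that a row with no time element yields None instead of raising, and it passes the plain string "xpath", which is the value of By.XPATH.

```python
XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH

# Relative locators from the answer above. The leading dot scopes each search
# to the row element instead of the whole page.
FIELDS = {
    "Home": './/div[contains(@class,"MatchRow_home")]',
    "Away": './/div[contains(@class,"MatchRow_away")]',
    "Time": './/span[contains(@id,"match-row")]',
}


def first_text(row, locator):
    """Return the stripped text of the first match inside `row`, or None."""
    found = row.find_elements(XPATH, locator)  # empty list when nothing matches
    return found[0].text.strip() if found else None


def rows_to_records(rows):
    """Turn match-row elements into a list of Home/Away/Time dictionaries."""
    return [
        {name: first_text(row, locator) for name, locator in FIELDS.items()}
        for row in rows
    ]


def records_to_frame(records):
    """Build the DataFrame with a fixed column order.

    pandas is imported here (an illustrative choice) so the helpers above
    can be used and tested without pandas installed.
    """
    import pandas as pd
    return pd.DataFrame(records, columns=list(FIELDS))
```

With the games1 list from the answer, usage would look like df = records_to_frame(rows_to_records(games1)).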
To create a DataFrame using Pandas with the Home Team Name and Away Team Name from the website, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:
Using CSS_SELECTOR:
driver.get('https://www.livescore.com/en/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
Home_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='home-team-name']")))]
Away_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='away-team-name']")))]
df = pd.DataFrame(data=list(zip(Home_team_name, Away_team_name)), columns=['Home Team Name', 'Away Team Name'])
print(df)
Note: You have to add the following imports:
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console output:
   Home Team Name       Away Team Name
0   Bayern Munich          FC Salzburg
1       Liverpool                Inter
2        FC Porto                 Lyon
3      Real Betis  Eintracht Frankfurt
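The question notes that the ids grow as the page is scrolled and that only the visible matches are present in the markup, so a single read may miss rows further down. Below is a rough sketch of one way to handle that, under several assumptions: that the page scrolls the window itself rather than an inner container, that a (Home, Away) pair identifies a match, and that a fixed pause is enough for new rows to render. The function name collect_while_scrolling and the read_visible callback are hypothetical; read_visible is any function that takes the driver and returns a list of dictionaries with 'Home' and 'Away' keys for the rows currently on the page.

```python
import time


def collect_while_scrolling(driver, read_visible, step=800, max_scrolls=50, pause=0.5):
    """Scroll down in steps and merge the matches visible after each step.

    Records are keyed by (Home, Away); the first record seen for a pair wins.
    """
    seen = {}
    for _ in range(max_scrolls):
        # Read whatever is rendered at the current scroll position.
        for record in read_visible(driver):
            seen.setdefault((record["Home"], record["Away"]), record)

        before = driver.execute_script("return window.pageYOffset;")
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)  # crude wait for newly rendered rows (assumption)
        after = driver.execute_script("return window.pageYOffset;")

        if after == before:  # the position did not change: bottom reached
            break
    return list(seen.values())
```

The returned list of dictionaries can be passed straight to pd.DataFrame. The max_scrolls cap is there so the loop ends even if the page keeps loading content indefinitely.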