How to create a DataFrame with values from dynamic data-index using Selenium and Python

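I am trying to scrape https://www.livescore.com/en/ but I am having trouble, mainly because its structure is different from the other sites I have worked with.

I see that there is a dynamic ID whose number increases as you scroll down the page, and the IDs in the code relate only to the matches currently visible on the page. Also, within the code, the markup for the home team seems to be the same as the markup for the away team.

This is what I have tried: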

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=r"C:\Users\Lorenzo\Downloads\chromedriver.exe")
driver.maximize_window()
wait=WebDriverWait(driver,30)
driver.get('https://www.livescore.com/en/football/live/')
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button#onetrust-accept-btn-handler"))).click()


# One wrapper element per match row currently rendered on the page
games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        # Both lookups use the same selector, so 'Away' returns the home team as well
        'Home':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Away':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRow_teamName__2cw5n"]').text,
        'Time':game1.find_element(By.CSS_SELECTOR, 'div[class = "MatchRowTime_time__2Fkd2 MatchRowTime_isLive__2qWag"]').text
    })

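The idea is to build a DataFrame of the live matches containing the home team name, the away team name, and the current match time.

Can anyone help me?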

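As far as I know, the clearest and simplest way to locate an element inside another element is to use an XPath that starts with a dot (.).
The Home and Away team names, as well as the match Time field, can be located clearly with the following locators: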

games1 = driver.find_elements(By.CSS_SELECTOR, 'div[class = "MatchRow_matchRowWrapper__1BtJ3"]')
data1 = []
for game1 in games1:
    data1.append({
        # The leading dot scopes each XPath to the current match row
        'Home':game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_home")]').text,
        'Away':game1.find_element(By.XPATH, './/div[contains(@class,"MatchRow_away")]').text,
        'Time':game1.find_element(By.XPATH, './/span[contains(@id,"match-row")]').text
    })
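To see why the leading dot matters without opening a browser, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of Selenium. The markup and the class names (row, home, away) are made up for illustration and are not the real livescore.com structure. ElementTree only supports a subset of XPath (no contains()), so the sketch matches the class attribute exactly. In Selenium, an XPath that starts with // searches the whole document even when it is called on an element, which is why the dot is needed there.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for two match rows.
MARKUP = """
<div>
  <div class="row">
    <div class="home">Liverpool</div>
    <div class="away">Inter</div>
  </div>
  <div class="row">
    <div class="home">FC Porto</div>
    <div class="away">Lyon</div>
  </div>
</div>
"""

root = ET.fromstring(MARKUP)

# Collect every match row, then search *inside* each row with a
# path that starts with a dot, so results never leak across rows.
rows = root.findall(".//div[@class='row']")

data = []
for row in rows:
    data.append({
        "Home": row.find(".//div[@class='home']").text,
        "Away": row.find(".//div[@class='away']").text,
    })

print(data)
```

Each dictionary in data corresponds to one row, which is the same shape the Selenium loop above builds in data1.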

To create a DataFrame using Pandas with the Home Team Name and Away Team Name from the website, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:

  • Using CSS_SELECTOR:

    driver.get('https://www.livescore.com/en/')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
    Home_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='home-team-name']")))]
    Away_team_name = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[id$='away-team-name']")))]
    df = pd.DataFrame(data=list(zip(Home_team_name, Away_team_name)), columns=['Home Team Name', 'Away Team Name'])
    print(df)
    
  • Note: You have to add the following imports:

    import pandas as pd
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console Output:

      Home Team Name       Away Team Name
    0  Bayern Munich          FC Salzburg
    1      Liverpool                Inter
    2       FC Porto                 Lyon
    3     Real Betis  Eintracht Frankfurt
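One thing to watch with this approach: the home and away names are collected in two separate lists, and zip() silently stops at the shorter one, so rows could be dropped or misaligned if the page changes between the two waits. The sketch below is one possible guard, using only the standard library. The helper name pair_teams is hypothetical and not part of Selenium or Pandas, and the team names are taken from the sample output above.

```python
def pair_teams(home_names, away_names):
    """Pair home and away names row by row, refusing mismatched lengths."""
    if len(home_names) != len(away_names):
        raise ValueError(
            f"got {len(home_names)} home names but {len(away_names)} away names"
        )
    return [
        {"Home Team Name": home, "Away Team Name": away}
        for home, away in zip(home_names, away_names)
    ]


# Sample values standing in for the lists scraped with Selenium.
records = pair_teams(
    ["Bayern Munich", "Liverpool"],
    ["FC Salzburg", "Inter"],
)
print(records)
```

The resulting list of dictionaries can be passed straight to pd.DataFrame(records), which uses the dictionary keys as column names.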