How to get hover-information with selenium / beautifulsoup?

I want to get data from the hover information on this website: https://www.pferdewetten.de/race/17350803

When I hover over, for example, the first starter "Jumby Bay", I get this hover information:

When I inspect the page source, I can't find this information anywhere. Is there any way to get it with selenium / beautiful soup?

You just need to use ActionChains to hover over the name; after that you can extract the text from the tooltip.

Code:

# Assumes `driver` is an existing WebDriver instance.
driver.maximize_window()
wait = WebDriverWait(driver, 30)

driver.get("https://www.pferdewetten.de/race/17350803")

# Hover over the first horse name so that the tooltip shows up.
name = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[starts-with(@class,'ParticipantInfoItem_info_horseName')]")))
ActionChains(driver).move_to_element(name).perform()

# Wait for the tooltip element and print its text.
tooltip = wait.until(EC.presence_of_element_located((By.XPATH, "//div[starts-with(@class,'ParticipantInfoItem_infoContainer--')]//div[starts-with(@class,'Tooltip')]")))
print(tooltip.get_attribute('innerText'))

Imports:

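import time  # used by the loop version further down

from selenium import webdriver  # needed to create the `driver` instance
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains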

Output:

Jumby Bay (2019)
Herkunft:
Frankreich
Vater:
Uriel Speed
Mutter:
Norvege
Besitzer:
Ecurie A.B Racing

Process finished with exit code 0
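
If you want the tooltip as structured data rather than a block of text, the output above can be split into a title and label/value pairs. This is only a minimal sketch: `parse_tooltip` is a hypothetical helper, and it assumes the `innerText` always has the layout shown above (first line is the title, each label ends with a colon and is followed by its value).

```python
def parse_tooltip(text):
    """Split tooltip text into (title, {label: value}).

    Assumes the first non-empty line is the title and that every
    label line ends with ':' and is followed by its value line(s).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "", {}
    title, rest = lines[0], lines[1:]
    fields = {}
    label = None
    for ln in rest:
        if ln.endswith(":"):
            # Start a new field, e.g. "Vater:" -> key "Vater".
            label = ln[:-1]
            fields[label] = ""
        elif label is not None:
            # Values spanning several lines are joined with a space.
            fields[label] = (fields[label] + " " + ln).strip()
    return title, fields


# Sample text copied from the output above.
sample = (
    "Jumby Bay (2019)\nHerkunft:\nFrankreich\nVater:\nUriel Speed\n"
    "Mutter:\nNorvege\nBesitzer:\nEcurie A.B Racing"
)
title, fields = parse_tooltip(sample)
print(title)
print(fields)
```

In the Selenium code you would pass the value returned by `get_attribute('innerText')` to the helper instead of the hard-coded sample.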

PS: To hover over every name, first collect them in a list, then hover over each one in a loop and extract the text in the same way.

Update:

driver.maximize_window()
wait = WebDriverWait(driver, 30)

driver.get("https://www.pferdewetten.de/race/17350803")

# One list item per participant; only used to know how many names to hover over.
names = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//li[starts-with(@class,'ListDivider_item--')]")))
print(len(names))
for i in range(1, len(names) + 1):
    try:
        time.sleep(2)  # short pause between hovers
        # XPath positions are 1-based, so [i] selects the i-th horse name.
        horse = wait.until(EC.visibility_of_element_located((By.XPATH, f"(//span[starts-with(@class,'ParticipantInfoItem_info_horseName')])[{i}]")))
        ActionChains(driver).move_to_element(horse).pause(2).perform()
        tooltip = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[starts-with(@class,'ParticipantInfoItem_infoContainer--')]//div[starts-with(@class,'Tooltip')]")))
        print(tooltip.get_attribute('innerText'))
    except Exception:
        # Skip entries whose name or tooltip does not show up in time.
        continue

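As @Sayse said, the data you need is also loaded dynamically: the page fetches it with a GET request to an API that returns JSON. That means you can get the data easily with just the requests module. You have to send the authentication key as a header; copy it from the request headers of that API call.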

import requests
headers={"Authorization2": "Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJodHRwczpcL1wvd3d3LnBmZXJkZXdldHRlbi5kZVwvIiwiYXVkIjoiaHR0cHM6XC9cL3d3dy5wZmVyZGV3ZXR0ZW4uZGVcLyIsImlhdCI6MTY1MDEyODM5OCwiZXhwIjoxNjUwMTMwMTk4LCJpcCI6IjM3LjExMS4yMDUuMTQ0IiwiY28iOiJQRlciLCJjdHkiOiJCRCIsImxuZyI6Imdlcm1hbiIsImNhblJlZ2lzdGVyIjp0cnVlfQ.NQr1B6rVx5T39Zm_78959KX0bNufzWGNDQ7_Bq_dMFI"}
api_url = "https://www.pferdewetten.de/data/racecard/get/17350803"
json_data = requests.get(api_url, headers=headers).json()

# Each entry in 'participants' describes one horse; print its name.
for horse in json_data['data']['participants']:
    print(horse['horse_name'])

Output:

Jumby Bay
Juliana Filo
Jade De Bertrange
Jaya Du Bessy
Jenny Gold
Jeny de Gouye
Jabelone
Jenesys Vallee
Jarny De Bertrange
Jordana Du Fer
Jazzy Du Liamone
Jismie Griff
Jelfa
Just Beautiful
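
One thing to watch with the requests approach: the bearer token is a JWT, and the payload of the token shown above decodes to `iat` 1650128398 and `exp` 1650130198, i.e. it was valid for only 30 minutes. So you will need a fresh token from your own browser session. The sketch below reads the expiry claim without verifying the signature; `jwt_claims`, `is_expired` and `make_demo_token` are hypothetical helper names, and the demo token is built locally for illustration rather than taken from the site.

```python
import base64
import json
import time


def jwt_claims(token):
    """Decode the payload segment of a JWT (signature is NOT verified)."""
    token = token.split()[-1]  # drop a leading "Bearer " if present
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(token, now=None):
    """Return True if the token's 'exp' claim is not after `now`."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] <= now


def make_demo_token(claims):
    """Build a fake, unsigned token just for this demonstration."""
    raw = json.dumps(claims).encode()
    body = base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return "header." + body + ".signature"


# Same iat/exp values as in the token above.
demo = "Bearer " + make_demo_token({"iat": 1650128398, "exp": 1650130198})
print(is_expired(demo, now=1650129000))  # inside the 30-minute window
print(is_expired(demo, now=1650131000))  # after the expiry time
```

In practice you could call `is_expired(headers["Authorization2"])` before sending the request and fetch a new token when it returns True.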