Counting the number of instances of a certain class and acquiring values via Selenium

Question

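I am trying to scrape data from this website.

More specifically, I want my script to count the number of rows in the table and extract the attendance figure from each row (see the image below).

Inspecting the site, I see the following for the date in the first row (Dec. 1):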

<td ng-repeat="(k,h) in sec.headers track by $index" class="date ng-scope" data-high="false" data-hidden="false" 
ng-style="{'text-align':h.properties.align}" ng-bind-html="vals | getColData:[k]:language:seasonId" 
compile-table-col="" style="text-align: left;"><span>Dec. 1</span></td>

Then I see this block of code for the crowd count in the first row (872):

<td ng-repeat="(k,h) in sec.headers track by $index" class="attendance ng-scope" data-high="false" 
data-hidden="false" ng-style="{'text-align':h.properties.align}" ng-bind-html="vals | getColData:[k]:language:seasonId" 
compile-table-col="" style="text-align: right;"><span>872</span></td>

I have tried multiple versions of driver.find_elements_by_class_name, such as

elements = driver.find_elements_by_class_name("date ng-scope")

and

driver.find_elements_by_xpath("//td[@class='date ng-scope']")

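Unfortunately, none of them worked.

Can someone point me in the right direction? Ideally, I would like advice on how to count the rows correctly by counting the instances of 'date ng-scope', and how to extract the corresponding crowd counts.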

Answer 1

Because it is a table, this is easy to do: all you need to do is keep incrementing the table row index by 1. Here is how I did it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep, strftime

url = "https://www.ushl.com/view#/schedule/24/67/12/home?league=1&gametype=-1"

webdriver = webdriver.Chrome()
webdriver.get(url)

x = 0  # loop flag: 0 means keep going
i = 2  # index of the first data row

while x == 0:
    try:
        date = webdriver.find_element(By.XPATH, f"/html/body/div[5]/div[1]/div[4]/div[2]/div[1]/div/div[3]/div/div/div/div/div/div/ng-view/div[2]/div[3]/div[1]/table/tbody/tr[{i}]/td[1]/span").text
        attendance = webdriver.find_elements(By.XPATH, f"/html/body/div[5]/div[1]/div[4]/div[2]/div[1]/div/div[3]/div/div/div/div/div/div/ng-view/div[2]/div[3]/div[1]/table/tbody/tr[{i}]/td[8]/span")[0].text

        print(f"Attendance Of {attendance} On Date {date}")
        i += 1
    except Exception:  # raised once row i no longer exists
        x = 1
        break

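Let me explain:

Lines 1-3 import the necessary modules, such as selenium.

Line 4 sets url to a string.

Line 5 defines webdriver as Chrome.

Line 6 opens the url we defined earlier in Chrome.

Line 7 defines x as 0. We need x to be 0 for the while loop later.

Line 8 defines i as 2; we will need it later for the table.

Line 9 starts a while loop that runs as long as x is 0 (the value we set earlier).

Line 10 starts a try block. You will see later why we need it.

Line 11 sets date to the text of the element at the given XPath. I have used HTML before, so I roughly know how the table system works. tr stands for table row. The first date, Dec. 1, is in table row 2. We set i to 2 earlier, so tr[{i}] refers to row 2.

Line 12 does exactly the same thing for the attendance, still using i because it is the same table. I added [0] at the end because the plural find_elements call on that line always returns a list, even when only one element matches, so [0] picks out that single element. There is no second or third element, so [1] or [2] would not work.

Line 13 prints the information for the user. Line 14 increments i by 1, because on the next pass we need to access table row 3, so i += 1 sets i to 3.

We keep going until there are no more table rows. When that happens, the lookup raises an exception, and the except clause on line 15 breaks out of the while loop.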
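
The same walk over the rows can also be written without the x flag or the try block. The following is a minimal sketch, not the code from this answer: it relies on find_elements returning an empty list when nothing matches, which is also why line 12 needs [0]. The function name iter_rows and its tbody_xpath parameter are assumptions made for illustration; tbody_xpath stands for an XPath that selects the table's tbody element, such as the long prefix used above.

```python
from itertools import count

XPATH = "xpath"  # the same string as selenium's By.XPATH constant


def iter_rows(driver, tbody_xpath, first_row=2, date_col=1, attendance_col=8):
    """Yield (date, attendance) text pairs until a row index matches nothing.

    iter_rows and tbody_xpath are illustrative names: tbody_xpath is assumed
    to be an XPath that selects the table's <tbody> element.
    """
    for i in count(first_row):
        # find_elements (plural) returns a list, which is empty on no match.
        date = driver.find_elements(
            XPATH, f"{tbody_xpath}/tr[{i}]/td[{date_col}]/span")
        crowd = driver.find_elements(
            XPATH, f"{tbody_xpath}/tr[{i}]/td[{attendance_col}]/span")
        if not date or not crowd:
            return  # ran past the last table row
        yield date[0].text, crowd[0].text
```

With a live driver, this could be consumed as for date, attendance in iter_rows(webdriver, tbody_xpath), printing each pair as line 13 does.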

Answer 2

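Tables are fun. I find it best to drill in from the outside rather than going straight for the element you want. Take the row count, for example.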

driver.find_elements(By.XPATH, "//div[contains(@class,'table-container')]//tr")

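This returns a list of elements, and the size of that list gives the number of rows (this includes the header row, so subtract 1 if you want the actual number of games). Translated, the XPath expression reads: "find any div element whose class name contains the string 'table-container' and, anywhere beneath that element, any tr element".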
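
In Python, the count described above might look like the sketch below. The helper name count_games is an assumption for illustration, and the code assumes the page has finished rendering before it is called.

```python
XPATH = "xpath"  # the same string as selenium's By.XPATH constant
ROWS_XPATH = "//div[contains(@class,'table-container')]//tr"


def count_games(driver):
    """Return the number of table rows minus the header row."""
    rows = driver.find_elements(XPATH, ROWS_XPATH)
    # The first <tr> is the header, so it is not counted as a game.
    return max(len(rows) - 1, 0)
```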

The attendance field can be found with this XPath:

//div[contains(@class,'table-container')]//tr[2]/td[contains(@class,'attendance')]/span

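Here tr[2] means the second row. To do this programmatically, make the "[2]" a variable, substitute the loop index for the 2, and iterate up to the row count.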
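
Making the row index a variable could be sketched as follows. The names attendance_xpath and attendance_by_row are hypothetical, and row_count is assumed to be the total number of tr elements, header row included.

```python
XPATH = "xpath"  # the same string as selenium's By.XPATH constant


def attendance_xpath(row):
    """Build the attendance XPath for a 1-based table row index."""
    return ("//div[contains(@class,'table-container')]"
            f"//tr[{row}]/td[contains(@class,'attendance')]/span")


def attendance_by_row(driver, row_count):
    """Return the attendance text for rows 2..row_count (None if absent)."""
    values = []
    # XPath positions start at 1, and row 1 is the header, so start at 2.
    for row in range(2, row_count + 1):
        spans = driver.find_elements(XPATH, attendance_xpath(row))
        values.append(spans[0].text if spans else None)
    return values
```

Returning None for a row without an attendance span, instead of stopping, is a design choice of this sketch so that one empty cell does not cut the results short.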

Answer 3

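Use WebDriverWait() and wait for visibility_of_all_elements_located(), using the CSS selector below to determine the number of rows; then iterate over the rows and find the respective columns.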

driver.get("https://www.ushl.com/view#/schedule/24/67/12/home?league=1&gametype=-1")
# Wait up to 10 seconds until every table row is visible.
totalnoofrows = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".resp >.ht-table >tbody >tr")))
# Skip the header row, then read the date (column 1) and the attendance (column 8).
for row in totalnoofrows[1:]:
    print("Date :" + row.find_element(By.XPATH, "./td[1]").text)
    print("Crowd :" + row.find_element(By.XPATH, "./td[8]").text)
    print("==============================================")

You need to import the following libraries.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Console output:

Date :Dec. 1
Crowd :872
==============================================
Date :Dec. 14
Crowd :816
==============================================
Date :Dec. 15
Crowd :1065
==============================================
Date :Dec. 16
Crowd :497
==============================================
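
On the original question of counting 'date ng-scope' cells directly: that class attribute holds two class names, and a class-name lookup accepts only a single name, which is one likely reason the first attempt in the question failed. A CSS selector can require both names at once. The sketch below is an illustration under stated assumptions rather than part of the answer above: class_css and count_and_collect are hypothetical helper names, it assumes every row has exactly one date cell and one attendance cell so that the two lists line up, and it should be called only after a wait like the one above has succeeded.

```python
CSS = "css selector"  # the same string as selenium's By.CSS_SELECTOR constant


def class_css(tag, class_attr):
    """Turn a class attribute such as 'date ng-scope' into 'td.date.ng-scope'."""
    return tag + "".join(f".{name}" for name in class_attr.split())


def count_and_collect(driver):
    """Return (row count, [(date, attendance), ...]) from the class-based cells."""
    dates = driver.find_elements(CSS, class_css("td", "date ng-scope"))
    crowds = driver.find_elements(CSS, class_css("td", "attendance ng-scope"))
    # Assumption: the two lists are in row order and have equal length.
    pairs = [(d.text, c.text) for d, c in zip(dates, crowds)]
    return len(dates), pairs
```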

Tags: python, selenium, xpath, web-scraping, webdriverwait