Python script to extract specific data with XPath
I would like to extract all the data in the column named "Nb B" on this page: https://www.coteur.com/cotes-foot.php
Here is my Python script:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.headless = True
driver = webdriver.Chrome(options=options)
driver.get('https://www.coteur.com/cotes-foot.php')

# Store the URLs associated with the soccer games
url_links = []
for i in driver.find_elements_by_xpath('//a[contains(@href, "match/cotes-")]'):
    url_links.append(i.get_attribute('href'))
print(len(url_links), '\n')

# Collect the text of every odds cell in the table
nb_bookies = []
for i in driver.find_elements_by_xpath('//td[contains(@class, " odds")][contains(@style, "")]'):
    nb_bookies.append(i.text)
print(nb_bookies)
Here is the output:
25
['1.80', '3.55', '4.70', '95%', '', '1.40', '4.60', '8.00', '94.33%', '', '2.35', '3.42', '2.63', '90.18%', '', '3.20', '3.60', '2.05', '92.19%', '', '7.00', '4.80', '1.35', '90.81%', '', '5.30', '4.30', '1.70', '99.05%', '', '2.15', '3.55', '3.65', '97.92%', '', '2.90', '3.20', '2.20', '88.81%', '', '3.95', '3.40', '2.10', '97.65%', '', '2.00', '3.80', '3.90', '98.04%', '', '2.40', '3.05', '3.50', '96.98%', '', '3.70', '3.20', '2.00', '91.72%', '', '2.75', '2.52', '3.05', '91.17%', '', '4.20', '3.05', '1.69', '84.23%', '', '1.22', '5.10', '10.00', '88.42%', '', '1.54', '4.60', '5.10', '93.72%', '', '3.00', '3.10', '2.45', '93.59%', '', '2.40', '3.50', '2.55', '90.55%', '', '1.76', '3.50', '4.20', '90.8%', '', '11.50', '5.30', '1.36', '98.91%', '', '3.00', '3.50', '2.20', '92.64%', '', '1.72', '3.42', '5.00', '92.62%', '', '1.08', '9.25', '19.00', '91.33%', '', '9.75', '5.75', '1.36', '98.82%', '', '5.70', '4.50', '1.63', '98.88%', '']
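All the data in the table is extracted, and the last cell of each row comes back as an empty string '', whereas that last column is the only one I want.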
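Your code is perfectly fine. The problem is the window size with which the automated browser is spawned in headless mode. The default window size and display size in headless mode is 800x600 on all platforms.
The developers of the site have set the header to appear only if the window is wider than 1030px; only then is display: none; removed from the DOM. You can test this yourself by shrinking and expanding the window.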
You need to understand that if an element's style attribute contains display: none;, the element is hidden and Selenium won't be able to interact with it. In other words, if a user can't see it, the same behavior applies to Selenium. This is why the text of those cells came back as ''.
Just add this line to enlarge your window in headless mode and it will solve your problem:
options.add_argument("window-size=1400,800")
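For reference, here is a minimal sketch of how that flag could be wired into driver creation. The helper names chrome_arguments and make_driver are hypothetical, not part of Selenium. The flags are written with leading dashes, which is how Chrome itself spells its command-line switches, and the selenium import is deferred so the argument-building part can be run without a browser installed.

```python
def chrome_arguments(width=1400, height=800):
    """Return Chrome command-line flags for a headless window of the given size."""
    return ["--headless", f"--window-size={width},{height}"]


def make_driver(width=1400, height=800):
    """Start headless Chrome with an enlarged window (needs selenium and Chrome)."""
    # Imported here so chrome_arguments() stays usable without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in chrome_arguments(width, height):
        options.add_argument(flag)
    return webdriver.Chrome(options=options)


# Only the flags are printed here; call make_driver() to actually launch the browser.
print(chrome_arguments())
```

Any width above the 1030px threshold mentioned earlier should do; 1400x800 simply leaves some margin.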
To get data from the last column only, fix your XPath accordingly:
nb_bookies = []
for i in driver.find_elements_by_xpath('//tr[@id and @role="row" ]/td[last()]'):
    nb_bookies.append(i.text)
Output:
['12', '12', '1', '9', '11', '12', '12', '12', '12', '12', '11', '2', '11', '11', '9', '12', '11', '12', '12', '12', '12', '12', '10', '5', '12']
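To see what td[last()] selects without launching a browser, the same idea can be tried on a small, made-up table with Python's standard library. This is only a sketch: the SAMPLE markup and the last_cells helper are hypothetical stand-ins, and the real page's markup may differ. ElementTree's limited XPath has no "and" operator, so the two attribute tests are written as chained predicates instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the odds table on the page.
SAMPLE = """
<table>
  <tr role="row"><th>1</th><th>N</th><th>2</th><th>Payout</th><th>Nb B</th></tr>
  <tr id="m1" role="row"><td>1.80</td><td>3.55</td><td>4.70</td><td>95%</td><td>12</td></tr>
  <tr id="m2" role="row"><td>1.40</td><td>4.60</td><td>8.00</td><td>94.33%</td><td>12</td></tr>
  <tr id="m3" role="row"><td>2.35</td><td>3.42</td><td>2.63</td><td>90.18%</td><td>1</td></tr>
</table>
"""


def last_cells(markup):
    """Return the text of the last <td> of every row that has an id and role="row"."""
    root = ET.fromstring(markup)
    # [@id][@role='row'] plays the role of [@id and @role="row"] in full XPath.
    cells = root.findall(".//tr[@id][@role='row']/td[last()]")
    return [cell.text for cell in cells]


print(last_cells(SAMPLE))  # ['12', '12', '1']
```

The header row is skipped because it has no id attribute, so only the data rows contribute a value.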