Selenium not finding elements in Python
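I wrote some code with Selenium to extract the number of rounds in a football league. As far as I can see, the elements are the same on every page, but for some reason the code works for some links and not for others.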
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from time import sleep

def pack_links(l):
    options = Options()
    options.headless = True
    driver = webdriver.Chrome()
    driver.get(l)
    rnds = driver.find_element_by_id('showRound')
    a_ = rnds.find_elements_by_xpath(".//td[@class='lsm2']")
    #a_ = driver.find_elements_by_class_name('lsm2')
    knt = 0
    for _ in a_:
        knt = knt+1
    print(knt)
    sleep(2)
    driver.close()
    return None

link = 'http://info.nowgoal.com/en/League/34.html'
pack_links(link)
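This link works: Nowgoal Serie B. It returns the number of td tags with class lsm2, which is 20. (A screenshot of the page source was attached.)
This one returns 0: Nowgoal Serie A. For some reason it does not find the td tags with class lsm2. (A screenshot of the relevant part of the source was attached.)
Even when I try to find them directly with the commented-out line a_ = driver.find_elements_by_class_name('lsm2'), it still returns 0. I would appreciate any help.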
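As far as I understand, the inner HTML of the td with id "showRound" is dynamic: it is filled in by the JavaScript function showRound(), which is in turn called by a script inside the page's head tag when the page loads. So in your case it looks like the content simply has not had enough time to load. I tried to solve this in two ways: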
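A quick patch: call driver.implicitly_wait(number_of_seconds_to_wait) before the element lookups. I would also suggest using it instead of sleep() in the future. However, this solution is rather clumsy: it sets one timeout for every element lookup made through the driver, rather than waiting for the specific result you need.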
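A better option: wait for the first element with class "lsm2" to load; if it has not appeared after some reasonable timeout, stop waiting and raise an exception (thanks to Zeinab Abbasimazar's answer). This can be done with expected_conditions and WebDriverWait: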
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

def pack_links(l):
    options = webdriver.ChromeOptions()  # I would also suggest using this instead of Options()
    options.add_argument("--headless")
    options.add_argument("--enable-javascript")  # To be on the safe side, although it seems to be enabled by default
    # Assumes chromedriver can be found on PATH; how to pass an explicit
    # driver path differs between Selenium versions.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(l)
        # Up to here your code is almost unchanged. Now wait for the first
        # element with class lsm2 to load, with a maximum timeout of 5 seconds:
        try:
            WebDriverWait(driver, 5).until(
                EC.presence_of_element_located((By.CLASS_NAME, "lsm2")))
            print("All necessary tables have been loaded successfully")
        except TimeoutException:
            print("Timed out waiting for elements with class lsm2")
            raise  # re-raise; raise("Timeout error") fails because a str is not an exception
        # Then we proceed in case of success:
        rnds = driver.find_element(By.ID, 'showRound')
        a_ = rnds.find_elements(By.XPATH, ".//td[@class='lsm2']")
        knt = 0
        for _ in a_:
            knt = knt + 1
        print(knt)
    finally:
        # quit() closes every window and ends the driver process, so you do not
        # have to kill numerous RAM-greedy Chrome processes by hand
        driver.quit()
    return None
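You can experiment and adjust the timeout to whatever gives you the necessary result. I would also suggest using len(a_) instead of iterating with a for loop, but that is up to you.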
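To illustrate why waiting on a condition differs from a fixed sleep(), here is a minimal standard-library sketch of the polling idea behind an explicit wait. It is not Selenium's implementation: wait_until, WaitTimeout, and find_cells are hypothetical names made up for this example, and the "page" is simulated with a timer so the snippet runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still falsy after the timeout (hypothetical name)."""


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, or raises WaitTimeout
    once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a page whose cells are filled in by JavaScript after about 0.3 s.
page_ready_at = time.monotonic() + 0.3


def find_cells():
    # Returns an empty list until the simulated script has run.
    return ["td"] * 20 if time.monotonic() >= page_ready_at else []


cells = wait_until(find_cells, timeout=2.0, poll_interval=0.05)
print(len(cells))  # len() gives the count directly, no counting loop needed
```

The call returns as soon as the cells exist instead of always sleeping for the full timeout, which is why a generous timeout costs little when the page loads quickly.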