Selenium WebDriver does not return Wikipedia table

Question

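I am trying to scrape the table that contains the results of every presidential election held in the United States, and I want to use Selenium for this. I believe the table I am trying to scrape is rendered by a client-side script (JavaScript), so I try to wait for a specific tag to be present before scraping the page. [Note: I tried scraping the page directly with Beautiful Soup, but I kept getting a "None" response.]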

Here is my code.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas
# using Selenium and chromedriver to load the JavaScript-rendered wiki page

scrape_options=Options()
scrape_options.add_argument('--headless')
driver=webdriver.Chrome(executable_path='web scraping master/chromedriver', options=scrape_options)
page_info=driver.get('https://en.wikipedia.org/wiki/United_States_presidential_election')


#waiting for the javascript to load


try:WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CLASS_NAME,"wikitable sortable jquery-tablesorter")))
finally:page=driver.page_source
soup=BeautifulSoup(page,'html.parser')
table=soup.find('table',{'class':'wikitable sortable jquery-tablesorter'})

This code does not return the desired result. Instead, it only raises a TimeoutException, no matter how much time I give it.

Also note: when I replace the line:

try:WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CLASS_NAME,"wikitable sortable jquery-tablesorter")))

with:

try:WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CLASS_NAME,"wikitable")))

it returns the table I need, but only half of the data from the original table is present.

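I think something is wrong with my code, but I cannot work out what the problem is. Can anyone help me? I have been stuck on this for too long.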

Answer 1

Finding an element by class name accepts only a single class name. It does not support multiple class names; use a CSS selector instead.

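from io import StringIO
from selenium.common.exceptions import TimeoutException

try:
    # Wait until the element carrying all three classes is visible
    WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".wikitable.sortable.jquery-tablesorter")))
except TimeoutException:
    print("No element found")
else:
    # Parse the page only if the wait succeeded, so page is always defined here
    page=driver.page_source
    soup=BeautifulSoup(page,'html.parser')
    table=soup.select_one('.wikitable.sortable.jquery-tablesorter')  #css selector for beautiful soup
    # Wrap the HTML in StringIO; recent pandas versions deprecate passing a raw HTML string
    df=pd.read_html(StringIO(str(table)))[0]
    print(df)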

To load the data into a DataFrame, you need to import the following library:

import pandas as pd

If it is not installed on your system, install it with:

pip install pandas
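
As a small illustration of the point about class names, here is a minimal sketch that turns a space-separated class attribute into a compound CSS selector. The helper name css_from_classes is hypothetical and is not part of Selenium or Beautiful Soup. It only builds the selector string, and it assumes the class names contain no characters that would need CSS escaping.

```python
def css_from_classes(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration; not a Selenium or Beautiful Soup API.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # Each class gets a leading dot and the parts are joined without spaces,
    # so the selector matches one element that carries all of the classes.
    return "".join("." + name for name in names)


selector = css_from_classes("wikitable sortable jquery-tablesorter")
print(selector)  # .wikitable.sortable.jquery-tablesorter
```

The resulting string can be passed as the value for By.CSS_SELECTOR in the wait condition, or to soup.select_one when parsing the page source.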
