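BeautifulSoup doesn't find tables on webpage
I'm trying to get the data from the first table on a website. I've looked through similar questions here and tried some of the suggested solutions, but I can't locate the table and therefore can't get at the data in it.
I tried: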
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome(r'C:\folder\chromedriver.exe')  # raw string so '\f' is not read as a form feed
url = 'https://docs.microsoft.com/en-us/windows/release-information/'
driver.get(url)
tbla = driver.find_element_by_name('table') #attempt using by element name
tblb = driver.find_element_by_class_name('cells-centered') #attempt using by class name
tblc = driver.find_element_by_xpath('//*[@id="winrelinfo_container"]/table[1]') #attempt by using xpath
And I also tried using BeautifulSoup:
html = driver.page_source
soup = BeautifulSoup(html,'html.parser')
table = soup.find("table", {"class": "cells-centered"})
print(len(table))
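Any help is much appreciated.
The table is inside an iframe, so you need to switch to the iframe first to access the table.
Use WebDriverWait() and wait for frame_to_be_available_and_switch_to_it() with the iframe locator below.
Then use WebDriverWait() and wait for visibility_of_element_located() with the table locator below.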
driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"table.cells-centered")))
You need the following imports.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Alternatively, you can use the code below with an XPath locator.
driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]')))
You can also load the table data into a pandas DataFrame and then export it to a CSV file. You need to import pandas for this.
driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]'))).get_attribute('outerHTML')
df=pd.read_html(str(table))[0]
print(df)
df.to_csv("path/to/csv")
To install pandas: pip install pandas
Then add the import below.
import pandas as pd
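The table is inside an <iframe>, so BeautifulSoup doesn't see it in the original page. Request the iframe's src URL and parse that document instead: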
import requests
from bs4 import BeautifulSoup
url = 'https://docs.microsoft.com/en-us/windows/release-information/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
soup = BeautifulSoup(requests.get(soup.select_one('iframe')['src']).content, 'html.parser')
for row in soup.select('table tr'):
    print(row.get_text(strip=True, separator='\t'))
Prints:
Version Servicing option Availability date OS build Latest revision date End of service: Home, Pro, Pro Education, Pro for Workstations and IoT Core End of service: Enterprise, Education and IoT Enterprise
2004 Semi-Annual Channel 2020-05-27 19041.546 2020-10-01 2021-12-14 2021-12-14 Microsoft recommends
1909 Semi-Annual Channel 2019-11-12 18363.1110 2020-09-16 2021-05-11 2022-05-10
1903 Semi-Annual Channel 2019-05-21 18362.1110 2020-09-16 2020-12-08 2020-12-08
1809 Semi-Annual Channel 2019-03-28 17763.1490 2020-09-16 2020-11-10 2021-05-11
1809 Semi-Annual Channel (Targeted) 2018-11-13 17763.1490 2020-09-16 2020-11-10 2021-05-11
1803 Semi-Annual Channel 2018-07-10 17134.1726 2020-09-08 End of service 2021-05-11
...and so on.
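To see why the original soup.find("table", ...) came back empty, here is a minimal offline sketch. It assumes a made-up outer page (OUTER_HTML), a made-up page URL on example.com, and a hypothetical helper find_iframe_url; none of these come from the real Microsoft page. It also resolves the iframe's src with urljoin, which may be needed if the src is a relative URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical outer page: it contains no <table>, only an <iframe>
# whose src points at the document that holds the table.
OUTER_HTML = """
<html><body>
  <div id="content">
    <iframe id="winrelinfo_iframe" src="/inner/table.html"></iframe>
  </div>
</body></html>
"""


def find_iframe_url(page_url, html):
    """Return the absolute URL of the first iframe with a src, or None."""
    soup = BeautifulSoup(html, "html.parser")
    iframe = soup.select_one("iframe[src]")
    if iframe is None:
        return None
    # urljoin turns a relative src into an absolute URL based on the page URL.
    return urljoin(page_url, iframe["src"])


soup = BeautifulSoup(OUTER_HTML, "html.parser")

# The iframe's content is a separate document, so no table is found here.
table_in_outer = soup.find("table")
print(table_in_outer)

# This is the URL that would have to be fetched and parsed to reach the table.
iframe_url = find_iframe_url("https://example.com/docs/page", OUTER_HTML)
print(iframe_url)
```

Because find() returns None when nothing matches, the print(len(table)) call in the question raises a TypeError rather than printing 0. Checking for None before using the result makes that failure easier to diagnose.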