Unable to retrieve the table body contents using Selenium
I am trying to get the contents of the body of the table with id = myTable by entering a value into the registration-number field, but I can't retrieve it.
I also tried user-agent headers and the form data from the browser's network tab with BeautifulSoup, but that didn't work either.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
url="https://rof.mahaonline.gov.in/Search/Search"
driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)
driver.find_element_by_xpath("""//*[@id="registrationnumber"]""").send_keys("MU000000001")
driver.find_element_by_xpath("""//*[@id="btnSearch"]""").click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find('table',{'id':'myTable'})
body = table.find('tbody')
print(body)
driver.close()
Please help me solve this; a solution using BeautifulSoup with the form data would be even better. Thanks in advance.
To retrieve the table body contents you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

Using CSS_SELECTOR:
driver.get("https://rof.mahaonline.gov.in/Search/Search")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#registrationnumber"))).send_keys("MU000000001")
driver.find_element_by_css_selector("button#btnSearch").click()
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#myTable tbody>tr td")))
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#myTable tbody>tr"))).get_attribute("outerHTML"))
Using XPATH:
driver.get("https://rof.mahaonline.gov.in/Search/Search")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='registrationnumber']"))).send_keys("MU000000001")
driver.find_element_by_xpath("//button[@id='btnSearch']").click()
WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='myTable']//tbody/tr//td")))
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='myTable']//tbody/tr"))).get_attribute("outerHTML"))
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console output:
<tr role="row" class="odd"><td>1</td><td>MU000000001</td><td>13 June 2012</td><td>CLASSIC DEVELOPERS.</td><td>PURCHASE AND SALE OF LANDS, PLOTS, BUILDINGS AND ALL TYPE OF CIVIL WORK WITH OR WITHOUT MATERIAL, CONTRACTORS, BUILDERS, DEVELOPERS AND REDEVELOPERS OF RESIDENTIAL PREMIES, COMMERCIAL PREMISES, SHOPPING MALLS, INDUSTRIAL SHEDS ETC.</td><td>House/Building No.BELIRAM INDUSTRIAL ESTATE.,House/Building Name:25,,<br>StreetName:S.V.ROAD,,<br>Village/Town/City:DAHISAR (EAST),<br>Taluka:Mumbai(Suburban)<br>District:Mumbai Suburban,State:Maharashtra<br>Pincode:400068<br></td></tr>
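The row HTML printed above can also be parsed offline with BeautifulSoup to pull out the individual cell texts. A minimal sketch, using a shortened copy of the row markup from the console output:

```python
from bs4 import BeautifulSoup

# Row markup as returned by get_attribute("outerHTML") above (shortened here)
row_html = (
    '<tr role="row" class="odd">'
    '<td>1</td><td>MU000000001</td><td>13 June 2012</td>'
    '<td>CLASSIC DEVELOPERS.</td>'
    '</tr>'
)

soup = BeautifulSoup(row_html, 'html.parser')
# One entry per <td>, with surrounding whitespace stripped
cells = [td.get_text(strip=True) for td in soup.find_all('td')]
print(cells)  # ['1', 'MU000000001', '13 June 2012', 'CLASSIC DEVELOPERS.']
```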
Give the page some sleep time to load, then grab page_source.
import time
from bs4 import BeautifulSoup
from selenium import webdriver
url="https://rof.mahaonline.gov.in/Search/Search"
driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)
driver.find_element_by_xpath("""//*[@id="registrationnumber"]""").send_keys("MU000000001")
driver.find_element_by_xpath("""//*[@id="btnSearch"]""").click()
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find('table',{'id':'myTable'})
body = table.find('tbody')
for row in body.find_all('tr'):
    tds = [td.text for td in row.find_all('td')]
    print(tds)
Output:
['1', 'MU000000001', '13 June 2012', 'CLASSIC DEVELOPERS.', 'PURCHASE AND SALE OF LANDS, PLOTS, BUILDINGS AND ALL TYPE OF CIVIL WORK WITH OR WITHOUT MATERIAL, CONTRACTORS, BUILDERS, DEVELOPERS AND REDEVELOPERS OF RESIDENTIAL PREMIES, COMMERCIAL PREMISES, SHOPPING MALLS, INDUSTRIAL SHEDS ETC.', 'House/Building No.BELIRAM INDUSTRIAL ESTATE.,House/Building Name:25,,StreetName:S.V.ROAD,,Village/Town/City:DAHISAR (EAST),Taluka:Mumbai(Suburban)District:Mumbai Suburban,State:MaharashtraPincode:400068']
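Note that in the output above the address cell comes out fused (e.g. State:MaharashtraPincode:400068) because td.text drops the <br> separators. get_text() with a separator keeps the parts apart; a small sketch on an isolated cell:

```python
from bs4 import BeautifulSoup

# A cell whose parts are separated only by <br> tags, as in the address column
cell_html = '<td>State:Maharashtra<br>Pincode:400068<br></td>'
td = BeautifulSoup(cell_html, 'html.parser').td

fused = td.text                        # text nodes concatenated, <br> lost
joined = td.get_text(', ', strip=True)  # text nodes joined with ', '
print(fused)   # State:MaharashtraPincode:400068
print(joined)  # State:Maharashtra, Pincode:400068
```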
Or you can use the pandas library's read_html() to get the data from the table:
import time
from selenium import webdriver
import pandas as pd
url="https://rof.mahaonline.gov.in/Search/Search"
driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)
driver.find_element_by_xpath("""//*[@id="registrationnumber"]""").send_keys("MU000000001")
driver.find_element_by_xpath("""//*[@id="btnSearch"]""").click()
time.sleep(3)
table=pd.read_html(driver.page_source)
print(table[0])
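pandas.read_html() works on any HTML string that contains a &lt;table&gt;, so the call can be tried offline without a browser. A minimal sketch with a made-up two-column table (newer pandas versions want literal HTML wrapped in StringIO; the column names here are illustrative, not the real page's headers):

```python
from io import StringIO
import pandas as pd

html = """
<table id="myTable">
  <thead><tr><th>Sr</th><th>Registration No</th></tr></thead>
  <tbody><tr><td>1</td><td>MU000000001</td></tr></tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> found
tables = pd.read_html(StringIO(html))
print(tables[0])
```

With driver.page_source the list may contain more than one DataFrame if the page has several tables, which is why the answer indexes table[0].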