Unable to retrieve the table body contents using Selenium

I'm trying to fetch the contents of the body of the table with id = myTable by entering a value into the registration number field, but I can't retrieve it.

I also tried options like a user-agent in the headers with BeautifulSoup, and the form data from the browser's Network tab, but couldn't get the data.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains

url="https://rof.mahaonline.gov.in/Search/Search"

driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)

driver.find_element_by_xpath("""//*[@id="registrationnumber"]""").send_keys("MU000000001")
driver.find_element_by_xpath("""//*[@id="btnSearch"]""").click()

soup = BeautifulSoup(driver.page_source, 'html.parser')

table = soup.find('table',{'id':'myTable'})
body = table.find('tbody')
print(body)

driver.close()

Please help me solve this problem; a solution using BeautifulSoup with form data would be even better. Thanks in advance.

To retrieve the table body contents you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://rof.mahaonline.gov.in/Search/Search")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#registrationnumber"))).send_keys("MU000000001")
    driver.find_element_by_css_selector("button#btnSearch").click()
    WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#myTable tbody>tr td")))
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#myTable tbody>tr"))).get_attribute("outerHTML"))
    
  • Using XPATH:

    driver.get("https://rof.mahaonline.gov.in/Search/Search")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='registrationnumber']"))).send_keys("MU000000001")
    driver.find_element_by_xpath("//button[@id='btnSearch']").click()
    WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='myTable']//tbody/tr//td")))
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='myTable']//tbody/tr"))).get_attribute("outerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console output:

    <tr role="row" class="odd"><td>1</td><td>MU000000001</td><td>13 June      2012</td><td>CLASSIC DEVELOPERS.</td><td>PURCHASE  AND SALE OF LANDS, PLOTS, BUILDINGS AND ALL TYPE OF CIVIL WORK WITH OR WITHOUT MATERIAL, CONTRACTORS, BUILDERS, DEVELOPERS AND REDEVELOPERS OF RESIDENTIAL PREMIES, COMMERCIAL PREMISES, SHOPPING MALLS, INDUSTRIAL SHEDS ETC.</td><td>House/Building No.BELIRAM INDUSTRIAL ESTATE.,House/Building Name:25,,<br>StreetName:S.V.ROAD,,<br>Village/Town/City:DAHISAR (EAST),<br>Taluka:Mumbai(Suburban)<br>District:Mumbai Suburban,State:Maharashtra<br>Pincode:400068<br></td></tr>
    
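As a side note, once you have a row's outerHTML as a string (like the console output above), the cell texts can be pulled out even without BeautifulSoup, using only the standard library's html.parser. This is a minimal sketch operating on a shortened version of the sample row:

```python
from html.parser import HTMLParser

class TdTextParser(HTMLParser):
    """Collects the text content of each <td> in a table row."""
    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # accumulate text inside the current cell

# Shortened sample of the row printed above
row_html = "<tr role='row' class='odd'><td>1</td><td>MU000000001</td><td>13 June 2012</td></tr>"
parser = TdTextParser()
parser.feed(row_html)
print(parser.cells)  # → ['1', 'MU000000001', '13 June 2012']
```

This is only a fallback for when you already hold an HTML fragment as a string; for live pages the WebDriverWait approach above is the reliable route.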

Provide some sleep time for the page to load, and then get the page_source.

import time
from bs4 import BeautifulSoup
from selenium import webdriver

url="https://rof.mahaonline.gov.in/Search/Search"

driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)

driver.find_element_by_xpath("""//*[@id="registrationnumber"]""").send_keys("MU000000001")
driver.find_element_by_xpath("""//*[@id="btnSearch"]""").click()
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'html.parser')

table = soup.find('table',{'id':'myTable'})
body = table.find('tbody')
for row in body.find_all('tr'):
  tds=[td.text for td in row.find_all('td')]
  print(tds)

Output:

['1', 'MU000000001', '13 June      2012', 'CLASSIC DEVELOPERS.', 'PURCHASE  AND SALE OF LANDS, PLOTS, BUILDINGS AND ALL TYPE OF CIVIL WORK WITH OR WITHOUT MATERIAL, CONTRACTORS, BUILDERS, DEVELOPERS AND REDEVELOPERS OF RESIDENTIAL PREMIES, COMMERCIAL PREMISES, SHOPPING MALLS, INDUSTRIAL SHEDS ETC.', 'House/Building No.BELIRAM INDUSTRIAL ESTATE.,House/Building Name:25,,StreetName:S.V.ROAD,,Village/Town/City:DAHISAR (EAST),Taluka:Mumbai(Suburban)District:Mumbai Suburban,State:MaharashtraPincode:400068']
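If you need each row as labeled data rather than a positional list, you can zip the cell texts with the table's column headers. The header names below are assumptions inferred from the sample output; in practice, read them from the real table's thead instead:

```python
# Hypothetical header names inferred from the sample row; read them
# from the actual table's <thead> in your own code.
headers = ["Sr. No.", "Registration Number", "Registration Date",
           "Firm Name", "Business Nature", "Address"]

# One row of td texts, as produced by the loop above (truncated here)
tds = ['1', 'MU000000001', '13 June 2012', 'CLASSIC DEVELOPERS.',
       'PURCHASE AND SALE OF LANDS ...', 'DAHISAR (EAST), Mumbai ...']

record = dict(zip(headers, tds))
print(record["Registration Number"])  # → MU000000001
```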

Alternatively, you can use the pandas library's read_html() to get the data from the table.

import time
from selenium import webdriver
import pandas as pd

url="https://rof.mahaonline.gov.in/Search/Search"
driver = webdriver.Chrome(r'C:\chromedriver.exe')
driver.get(url)

driver.find_element_by_xpath("""//*[@id="registrationnumber"]""").send_keys("MU000000001")
driver.find_element_by_xpath("""//*[@id="btnSearch"]""").click()
time.sleep(3)
table=pd.read_html(driver.page_source)
print(table[0])
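Since read_html() returns a DataFrame for every table it finds on the page, indexing with table[0] can break if the page layout changes. Its attrs parameter lets you match the target table by its id instead. A small sketch against an inline HTML sample (column names here are illustrative):

```python
import io
import pandas as pd

# Inline sample standing in for driver.page_source; column names are illustrative.
html = """
<table id="myTable">
  <thead><tr><th>Sr</th><th>Registration</th></tr></thead>
  <tbody><tr><td>1</td><td>MU000000001</td></tr></tbody>
</table>
"""

# attrs narrows the match to the table whose id attribute is "myTable".
tables = pd.read_html(io.StringIO(html), attrs={"id": "myTable"})
df = tables[0]
print(df)
```

Note that read_html() needs an HTML parser backend (lxml, or bs4 with html5lib) installed alongside pandas.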