Trouble targeting elements on website (selenium webdriver)
I'm trying to target the properties listed on a real estate website. Ideally, I'd like to pull each listing's marketing URL, title, location and email. The properties are all listed like this:
<div class="propertyList">
<div id="propertyList74495-sale" class="deal_on_market propertyListItem" data-property-id="74495-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=74495-sale" data-listing-id="148815"></div>
<table>
<tbody>
<tr>
<td class="thumbnail">
<a target="_top" href="http://svncommercialadvisors.com/properties/?propertyId=74495-sale"></a>
</td>
<td class="addressInfo">
<a target="_top" href="http://svncommercialadvisors.com/properties/?propertyId=74495-sale">
Engelberg Antik's
</a>
<p class="propertiesListCityStateZip">
<img src="/images/map-marker-tiny.png?1427481879" alt="Map-marker-tiny"></img>
Salem, OR
</p>
<p class="description">
Outstanding downtown Salem opportunity, right next…
</p>
<div class="smallAttributes">
<div></div>
<div></div>
<div></div>
</div>
</td>
<td class="propertyInfo">
<div>
9,900
</div>
<div>
13,612 SF
</div>
<div>
Street Retail
</div>
</td>
</tr>
</tbody>
</table>
<div class="contactAdvisor">
::before
<a href="mailto:brokeremail@svn.com"></a>
or call
503.588.0400
for more information
</div>
<div class="links"></div>
<div id="propertyList61436-sale" class="deal_under_contract propertyListItem" data-property-id="61436-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=61436-sale" data-listing-id="124490"></div>
<div id="propertyList89374-sale" class="deal_on_market propertyListItem" data-property-id="89374-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=89374-sale" data-listing-id="173124"></div>
<div id="propertyList84437-sale" class="deal_on_market propertyListItem" data-property-id="84437-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=84437-sale" data-listing-id="164488"></div>
<div id="propertyList84478-sale" class="deal_on_market propertyListItem" data-property-id="84478-sale" data-listing-url="http://svncommercialadvisors.com/properties/?propertyId=84478-sale" data-listing-id="164538"></div>
...
This is my first attempt:
from selenium import webdriver
import sys
import smtplib
import pymongo
newProperties = []
driver = webdriver.Firefox()
driver.get('http://svncommercialadvisors.com/properties/')
for property in driver.find_elements_by_class_name('propertyList'):
    # get title, location
    info = property.find_elements_by_class_name('addressInfo')
    email = property.find_elements_by_partial_link_text('.com')
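When I run the above, it doesn't raise any error about the driver being unable to locate elements. However, when I print out the elements, nothing shows up. How can I better target the elements? I'd like something like this, appended to a list: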
-title: Engelberg Antik's
-location: Salem, OR
-url: http://svncommercialadvisors.com/properties/?propertyId=74495-sale
-email: brokeremail@svn.com
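The key problem here is that the search results are loaded inside an iframe. You need to switch to the iframe before you can search for the properties.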
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get('http://svncommercialadvisors.com/properties/')
# wait for frame to appear and switch
frame = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#buildout iframe")))
driver.switch_to.frame(frame)
for property in driver.find_elements_by_class_name('propertyList'):
    info = property.find_element_by_class_name('addressInfo')
    email = property.find_element_by_partial_link_text('Email')
    print(info.text)
    print(email.get_attribute('href'))
I've also applied two fixes:
- replaced find_elements_by_class_namme with find_elements_by_class_name
- replaced property.find_elements_by_partial_link_text('.com') with property.find_element_by_partial_link_text('Email')
It prints:
Engelberg Antik's
Salem, OR
Outstanding downtown Salem opportunity, right next door to the newly renovated Roth and McGilchri...
mailto:jennifer.martin@svn.com
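To get from that printed output to the list of records the question asks for, something along these lines might work. This is only a sketch: build_record and new_properties are hypothetical names, and it assumes the first two non-empty lines of info.text are always the title and the location, as they are in the output above.

```python
def build_record(info_text, listing_url, mailto_href):
    """Turn the scraped strings into a dict with title, location, url and email."""
    # info.text arrives as one multi-line string. Assumption: the first
    # non-empty line is the title and the second is the location.
    lines = [line.strip() for line in info_text.splitlines() if line.strip()]
    title = lines[0] if lines else None
    location = lines[1] if len(lines) > 1 else None

    # Drop the "mailto:" scheme and any "?subject=..." suffix from the href.
    prefix = "mailto:"
    email = mailto_href[len(prefix):] if mailto_href.startswith(prefix) else mailto_href
    email = email.split("?", 1)[0]

    return {"title": title, "location": location, "url": listing_url, "email": email}


# Sample values copied from the question and from the output printed above.
new_properties = []
sample_text = (
    "Engelberg Antik's\n"
    "Salem, OR\n"
    "Outstanding downtown Salem opportunity, right next door to the newly renovated Roth and McGilchri..."
)
new_properties.append(build_record(
    sample_text,
    "http://svncommercialadvisors.com/properties/?propertyId=74495-sale",
    "mailto:jennifer.martin@svn.com",
))
print(new_properties[0])
```

Inside the loop you would pass info.text and email.get_attribute('href') instead of the sample strings. The URL could probably be read from the link inside the addressInfo cell, for example info.find_element_by_tag_name('a').get_attribute('href'), though I haven't checked that against the live page.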