BeautifulSoup not finding elements
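I am trying to write a program that extracts prices from the website below. I use selenium to download the page and then try to parse it with either beautifulsoup or selenium itself.
I have established that the information I want is always in an element with class="totalPrice", and I would like to extract all of them, preferably as a list.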
<td class="totalPrice" colspan="3">
Total: £560
<span class="sr_room_reinforcement"></span>
</td>
For some reason the following query never finds any total prices. Any suggestions on what I am doing wrong would be greatly appreciated.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as bs
url='http://www.booking.com/searchresults.en-gb.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaFCIAQGYAS64AQTIAQTYAQHoAQH4AQs;sid=1a43e0952558ac0ad0061d5b6523a7bc;dcid=1;checkin_monthday=4;checkin_year_month=2016-2;checkout_monthday=11;checkout_year_month=2016-2;city=-2601889;class_interval=1;csflt=%7B%7D;group_adults=7;group_children=0;highlighted_hotels=1192837;hp_sbox=1;label_click=undef;no_rooms=1;review_score_group=empty;room1=A%2CA%2CA%2CA%2CA%2CA%2CA;sb_price_type=total;score_min=0;si=ai%2Cco%2Cci%2Cre%2Cdi;ss=London;ssafas=1;ssb=empty;ssne=London;ssne_untouched=London&;order=price_for_two'
driver = webdriver.PhantomJS(r"C:\Program Files (x86)\phantomjs-2.0.0-windows\bin\phantomjs.exe")
#driver = webdriver.Firefox()
driver.get(url)
# for elm in driver.find_element_by_class_name("totalPrice"):
#     print(elm.text)
content = driver.page_source
soup = bs(content, 'lxml')
for e in soup.find_all('totalPrice'):
    print(e.name)
driver.close()
First of all, you need to wait for the total prices to load. Use the WebDriverWait class with the presence_of_element_located expected condition.
I also found that you need to override the browser's User-Agent through Desired Capabilities so that it does not identify itself as PhantomJS.
Complete working code:
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
url = 'http://www.booking.com/searchresults.en-gb.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaFCIAQGYAS64AQTIAQTYAQHoAQH4AQs;sid=1a43e0952558ac0ad0061d5b6523a7bc;dcid=1;checkin_monthday=4;checkin_year_month=2016-2;checkout_monthday=11;checkout_year_month=2016-2;city=-2601889;class_interval=1;csflt=%7B%7D;group_adults=7;group_children=0;highlighted_hotels=1192837;hp_sbox=1;label_click=undef;no_rooms=1;review_score_group=empty;room1=A%2CA%2CA%2CA%2CA%2CA%2CA;sb_price_type=total;score_min=0;si=ai%2Cco%2Cci%2Cre%2Cdi;ss=London;ssafas=1;ssb=empty;ssne=London;ssne_untouched=London&;order=price_for_two'
# setting a custom User-Agent
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) " +
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
)
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.get(url)
# wait for the total prices to become present
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".totalPrice")))
content = driver.page_source
driver.close()
soup = bs(content, 'lxml')
for e in soup.select('.totalPrice'):
    print(e.text.strip())
It prints:
Total: US1
Total: US4
Total: US1
Total: US4
Total: US5
Total: US4
Total: US5
Total: US7
Total: US,031
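One more detail in the question's query is worth spelling out: `find_all('totalPrice')` treats its first argument as a tag name, so it looks for `<totalPrice>` elements rather than elements with that class. The sketch below is only an illustration run against the HTML fragment from the question (wrapped in a `<table>` and parsed with the built-in `html.parser`, both of which are assumptions made here to keep it self-contained); it compares the tag-name lookup with the class-based lookups and collects the results into a list.

```python
from bs4 import BeautifulSoup

# HTML fragment from the question; the surrounding <table><tr> is added
# here only so the fragment is a well-formed table.
html = """
<table><tr>
<td class="totalPrice" colspan="3">
Total: £560
<span class="sr_room_reinforcement"></span>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# A bare string is matched against tag names, so this finds nothing.
by_tag = soup.find_all("totalPrice")

# Either of these matches on the class attribute instead.
by_class = soup.find_all(class_="totalPrice")
by_css = soup.select(".totalPrice")

# Collect the stripped text of every match into a list.
prices = [td.get_text(strip=True) for td in by_class]
print(len(by_tag), prices)  # prints: 0 ['Total: £560']
```

On the live page the wait and the User-Agent override are still needed; this only shows why the lookup itself returned nothing.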
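As a side note, you don't really need BeautifulSoup here: you can locate elements with selenium, which is quite powerful. You can find the total prices this way (before calling driver.close()):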
for price in driver.find_elements_by_css_selector(".totalPrice"):
    print(price.text.strip())