Unable to scrape the "View Details" button links as a list from the page "https://www.bmstores.co.uk/stores?location=KA8+9BF"
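I am unable to scrape the "View Details" button links as a list from the page "https://www.bmstores.co.uk/stores?location=KA8+9BF". I tried several approaches with both BeautifulSoup and Selenium. In Selenium I used the find-element methods with XPath, CSS selector, and name, but nothing worked. While using Selenium I also ran into a pop-up on the website, but that was solved with a pop-up blocker.
I searched various sites but only found the same BeautifulSoup Python code, which could not do the job. When I run my code I get two recurring errors:
1. ElementNotInteractableException: element not interactable
2. NoSuchElementException: Message: no such element: Unable to locate element
My code is here: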
from bs4 import BeautifulSoup
import requests
import pandas as pd
from selenium import webdriver as wd
import time
from selenium.common.exceptions import WebDriverException
local_path_of_chrome_driver = "E:\chromedriver.exe"
driver = wd.Chrome(executable_path=local_path_of_chrome_driver)
driver.maximize_window()
data_links=[]
xpaths = [
    "/html/body/div[9]/div/div/div/div/ul/li[1]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[2]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[4]/div/div[2]/a[1]",
    "/html/body/div[9]/div/div/div/div/ul/li[5]/div/div[2]/a[1]",
]
for j in xpaths:
    try:
        driver.find_element_by_xpath(j).click()
        time.sleep(3)
        driver.switch_to_window(driver.window_handles[-1])
        data_links.append(driver.current_url)
        time.sleep(3)
        driver.back()
    except:
        pass
driver.close()
Can anyone help me?
To scrape the "View Details" button links as a list from the page https://www.bmstores.co.uk/stores?location=KA8+9BF, you have to induce WebDriverWait so that the links are visible before you read them. You can use the following:
Code block:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
view_details = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.LINK_TEXT, "View Details")))  # wait up to 20 s
for i in view_details:
    print(i.get_attribute("href"))
Console output:
https://www.bmstores.co.uk/stores/ayr-heathfield-retail-park-90
https://www.bmstores.co.uk/stores/prestwick-113
https://www.bmstores.co.uk/stores/irvine-307
https://www.bmstores.co.uk/stores/kilmarnock-310
https://www.bmstores.co.uk/stores/stevenston-319
https://www.bmstores.co.uk/stores/darnley-414
https://www.bmstores.co.uk/stores/east-kilbride-304
https://www.bmstores.co.uk/stores/paisley-linwood-423
https://www.bmstores.co.uk/stores/linwood-hart-street-33
https://www.bmstores.co.uk/stores/paisley-renfrew-road-428
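The snippet above assumes a `driver` that has already loaded the page. As a minimal sketch of how the pieces might fit together, the following reads the `href` attributes directly instead of clicking each button and navigating back. The function names `collect_hrefs` and `scrape_view_details` are hypothetical, and `webdriver.Chrome()` without arguments assumes a recent Selenium that can locate a Chrome driver on its own; older versions need the driver path configured as in the question.

```python
def collect_hrefs(elements):
    """Return the unique, non-empty href values of the elements, in page order."""
    seen = set()
    links = []
    for element in elements:
        href = element.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            links.append(href)
    return links


def scrape_view_details(url, timeout=20):
    """Open url in Chrome and return the "View Details" links as a list."""
    # Selenium is imported here so collect_hrefs stays usable without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: a Chrome driver can be found automatically
    try:
        driver.get(url)  # load the page before looking for any element
        elements = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located((By.LINK_TEXT, "View Details"))
        )
        return collect_hrefs(elements)
    finally:
        driver.quit()  # always release the browser, even if the wait times out
```

Calling `scrape_view_details("https://www.bmstores.co.uk/stores?location=KA8+9BF")` would be expected to return a list like the console output above, provided the page still renders its links with the text "View Details". Reading `href` avoids the clicks, so it should sidestep `ElementNotInteractableException`, and locating by link text is less brittle than absolute XPaths.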
You can use the requests module to get all the store names and their associated View Details button links. There are 24 stores in total.
import requests
from urllib.parse import urljoin
base = 'https://www.bmstores.co.uk'
link = 'https://mv7e2a3yql-dsn.algolia.net/1/indexes/*/queries'
params = {
'x-algolia-agent': 'Algolia for JavaScript (3.35.0); Browser; instantsearch.js (3.6.0); JS Helper (2.28.0)',
'x-algolia-application-id': 'MV7E2A3YQL',
'x-algolia-api-key': 'Mzg2ZjM2ZmVmNzhiMmVhZjhhNjQ5ZDAzNGQ5NjE2MTQ1MDQ2ZDAwODBlMjY2YjFkNWFkOTUyOTZkNTRhY2M4MmZpbHRlcnM9JTI4c3RhdHVzJTNBYXBwcm92ZWQlMjkrQU5EK3B1Ymxpc2hkYXRlKyUzQysxNjM1NTAzMzI5K0FORCslMjhleHBpcnlkYXRlKyUzRSsxNjM1NTAzMzI5K09SK2V4cGlyeWRhdGUrJTNEKy0xJTI5',
}
with requests.Session() as s:
    # Send browser-like headers so the request resembles the one made by the page
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    s.headers['Referer'] = 'https://www.bmstores.co.uk/stores?location=KA8+9BF'
    page = 0
    # Pages 0 to 3 are requested; at 10 hits per page this covers the 24 stores
    while page<=3:
        payload = {"requests":[{"indexName":"prod_bmstores_stores","params":f"query=&hitsPerPage=10&page={page}&attributesToRetrieve=*&highlightPreTag=__ais-highlight__&highlightPostTag=__%2Fais-highlight__&getRankingInfo=true&aroundLatLng=55.47888%2C-4.59464&aroundRadius=50000&clickAnalytics=false&facets=%5B%22ranges%22%5D&tagFilters="}]}
        res = s.post(link,params=params,json=payload)
        for item in res.json()['results']:
            for container in item['hits']:
                store_name = container['storename']
                # The API returns a relative URL, so join it with the site root
                detail_link = urljoin(base,container['url'])
                print(store_name,detail_link)
        page+=1
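The loop above hard-codes the page range. As a hedged alternative, the sketch below stops based on the `nbPages` field that Algolia search responses normally include, so the page count does not have to be known in advance. The names `iter_stores` and `fetch_page` are hypothetical: `fetch_page` stands for any callable that takes a page number and returns the parsed JSON of one response, for example a small wrapper around the `s.post(...)` call above.

```python
from urllib.parse import urljoin

BASE = "https://www.bmstores.co.uk"


def iter_stores(fetch_page, base=BASE):
    """Yield (store_name, detail_link) pairs from every result page.

    fetch_page(page) is a hypothetical callable returning the parsed JSON
    of one multi-query response, i.e. a dict with a "results" list.
    """
    page = 0
    while True:
        result = fetch_page(page)["results"][0]  # one query was sent, so one result
        for hit in result["hits"]:
            yield hit["storename"], urljoin(base, hit["url"])
        page += 1
        # nbPages is the total number of pages reported by the API;
        # if it is missing, stop after the first page rather than loop forever.
        if page >= result.get("nbPages", 0):
            break
```

Inside the `with requests.Session() as s:` block, `fetch_page` could build the payload for the given page and return `s.post(link, params=params, json=payload).json()`; `list(iter_stores(fetch_page))` would then hold all name and link pairs.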