Unable to scrape the "View Details" button links as a list for the page "https://www.bmstores.co.uk/stores?location=KA8+9BF"

I am unable to scrape the "View Details" button links as a list for the page "https://www.bmstores.co.uk/stores?location=KA8+9BF". I have tried several ways with BeautifulSoup and Selenium. In Selenium I used the find-element methods with XPath and CSS selectors, but nothing worked. While using Selenium I also got a pop-up on the site, but that was solved with a pop-up blocker.

I searched various sites but found the same BeautifulSoup Python code and still could not complete the task. When I run my code I get two recurring errors:

1. ElementNotInteractableException: element not interactable
2. NoSuchElementException: Message: no such element: Unable to locate element

My code is here:

from bs4 import BeautifulSoup
import requests
import pandas as pd
from selenium import webdriver as wd
import time
from selenium.common.exceptions import WebDriverException

local_path_of_chrome_driver = r"E:\chromedriver.exe"
driver = wd.Chrome(executable_path=local_path_of_chrome_driver)
driver.maximize_window()

data_links=[]

xpaths = ["/html/body/div[9]/div/div/div/div/ul/li[1]/div/div[2]/a[1]","/html/body/div[9]/div/div/div/div/ul/li[2]/div/div[2]/a[1]","/html/body/div[9]/div/div/div/div/ul/li[4]/div/div[2]/a[1]","/html/body/div[9]/div/div/div/div/ul/li[5]/div/div[2]/a[1]"]
for j in xpaths:
        try:
            
            driver.find_element_by_xpath(j).click()
            
            time.sleep(3)
        
            driver.switch_to_window(driver.window_handles[-1])
            data_links.append(driver.current_url)
            
            time.sleep(3)
            
            driver.back()
        except:
            pass
            
driver.close()

Can someone please help me?

To scrape the View Details button links as a list from the page https://www.bmstores.co.uk/stores?location=KA8+9BF you have to induce WebDriverWait, and you can use the following:

  • Code block:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

    view_details = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.LINK_TEXT, "View Details")))
    for i in view_details:
        print(i.get_attribute("href"))
    
  • Console output:

    https://www.bmstores.co.uk/stores/ayr-heathfield-retail-park-90
    https://www.bmstores.co.uk/stores/prestwick-113
    https://www.bmstores.co.uk/stores/irvine-307
    https://www.bmstores.co.uk/stores/kilmarnock-310
    https://www.bmstores.co.uk/stores/stevenston-319
    https://www.bmstores.co.uk/stores/darnley-414
    https://www.bmstores.co.uk/stores/east-kilbride-304
    https://www.bmstores.co.uk/stores/paisley-linwood-423
    https://www.bmstores.co.uk/stores/linwood-hart-street-33
    https://www.bmstores.co.uk/stores/paisley-renfrew-road-428
    
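Once the JavaScript-rendered HTML is available (for example via `driver.page_source`), the same href extraction can also be done with BeautifulSoup, which the question already imports. A minimal sketch, matching anchors by their visible text just like `By.LINK_TEXT` does; the HTML fragment below is a hypothetical stand-in for the rendered page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for driver.page_source after the
# store list has been rendered by JavaScript.
html = """
<ul>
  <li><a href="/stores/ayr-heathfield-retail-park-90">View Details</a></li>
  <li><a href="/stores/prestwick-113">View Details</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Select anchors whose visible text is exactly "View Details",
# mirroring the By.LINK_TEXT locator used with Selenium above.
links = [a["href"] for a in soup.find_all("a", string="View Details")]
print(links)
```

Note that plain `requests` would not see these anchors, because they are injected by JavaScript; BeautifulSoup only helps after a browser (or the Algolia API below) has produced the markup or data.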

You can use the requests module to grab all the store names along with their related View Details button links. There are 24 shops in total.

import requests
from urllib.parse import urljoin

base = 'https://www.bmstores.co.uk'
link = 'https://mv7e2a3yql-dsn.algolia.net/1/indexes/*/queries'

params = {
    'x-algolia-agent': 'Algolia for JavaScript (3.35.0); Browser; instantsearch.js (3.6.0); JS Helper (2.28.0)',
    'x-algolia-application-id': 'MV7E2A3YQL',
    'x-algolia-api-key': 'Mzg2ZjM2ZmVmNzhiMmVhZjhhNjQ5ZDAzNGQ5NjE2MTQ1MDQ2ZDAwODBlMjY2YjFkNWFkOTUyOTZkNTRhY2M4MmZpbHRlcnM9JTI4c3RhdHVzJTNBYXBwcm92ZWQlMjkrQU5EK3B1Ymxpc2hkYXRlKyUzQysxNjM1NTAzMzI5K0FORCslMjhleHBpcnlkYXRlKyUzRSsxNjM1NTAzMzI5K09SK2V4cGlyeWRhdGUrJTNEKy0xJTI5',
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    s.headers['Referer'] = 'https://www.bmstores.co.uk/stores?location=KA8+9BF'
    
    page = 0

    # The index returns 10 hits per page; 24 stores fit in pages 0-2,
    # so iterating up to page 3 covers everything with a margin.
    while page <= 3:
        payload = {"requests":[{"indexName":"prod_bmstores_stores","params":f"query=&hitsPerPage=10&page={page}&attributesToRetrieve=*&highlightPreTag=__ais-highlight__&highlightPostTag=__%2Fais-highlight__&getRankingInfo=true&aroundLatLng=55.47888%2C-4.59464&aroundRadius=50000&clickAnalytics=false&facets=%5B%22ranges%22%5D&tagFilters="}]}
        res = s.post(link, params=params, json=payload)
        for item in res.json()['results']:
            for container in item['hits']:
                store_name = container['storename']
                # The API returns relative store paths; build absolute links.
                detail_link = urljoin(base, container['url'])
                print(store_name, detail_link)

        page += 1