BeautifulSoup - All href links don't appear to be extracting

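I'm trying to extract all the href links inside the divs whose class is ['address']. Every time I run the code I only get the first 5 and nothing more, even though I know there should be 9.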

Web page: https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch

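I have read various threads on this and modified my code countless times, including switching through all the parsers (html.parser, html5lib, lxml, xml, lxml-xml), but nothing seems to work. Any idea what is causing it to stop after the 5th iteration? I'm still quite new to Python, so I apologize if this is a rookie mistake I've overlooked. Any help would be greatly appreciated, even sarcastic answers :)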

I have used very similar code on the following pages and had no problems scraping the hrefs: https://www.walgreens.com/storelistings/storesbystate.jsp?requestType=locator and https://www.walgreens.com/storelistings/storesbycity.jsp?requestType=locator&state=AK

My code is below:

import requests
from bs4 import BeautifulSoup


local_rg = requests.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = local_rg.content
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

My results (the first 5):

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680

But there should be 9:

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
  6. /locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
  7. /locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
  8. /locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
  9. /locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681

Try using selenium instead of requests to get the page source. Here is how you can do it:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')

The rest of the code is the same. Here is the complete code:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')

local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')

for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

Output:

/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
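
As a side note, the loop above matches divs by comparing against the string form of the class list, so it only hits divs whose class attribute is exactly `address`. A slightly more idiomatic sketch is to let BeautifulSoup do the filtering with a CSS selector. The helper name `extract_address_links` and the sample markup are made up for illustration, and the real page's markup may differ. Note that `div.address` also matches divs that carry extra classes, which is a little broader than the original check.

```python
from bs4 import BeautifulSoup


def extract_address_links(html):
    """Return the href of every <a> nested inside a <div class="address">."""
    # html.parser ships with Python, so no extra parser needs to be installed.
    soup = BeautifulSoup(html, 'html.parser')
    # 'div.address a[href]' selects anchors that have an href attribute and
    # sit anywhere inside a div whose class list includes "address".
    return [a['href'] for a in soup.select('div.address a[href]')]


# Hypothetical markup, only here to show the shape of the result.
sample_html = """
<div class="address"><a href="/locator/store-a/id=1">Store A</a></div>
<div class="other"><a href="/not-a-store">Skip me</a></div>
<div class="address wide"><a href="/locator/store-b/id=2">Store B</a></div>
"""

print(extract_address_links(sample_html))
```

With selenium, the same helper could be called as `extract_address_links(driver.page_source)` before the driver is closed.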

The page uses Ajax to load the store information from an external URL. You can load it with the requests/json modules:

import re
import json
import requests


url = 'https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch'
ajax_url = 'https://www.walgreens.com/locator/v1/stores/search?requestor=search'
m = re.search(r'"lat":([\d.-]+),"lng":([\d.-]+)', requests.get(url).text)

params = {
    'lat': m.group(1),
    'lng': m.group(2)
}

data = requests.post(ajax_url, json=params).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for result in data['results']:
    print(result['store']['address']['street'])
    print('https://www.walgreens.com' + result['storeSeoUrl'])
    print('-' * 80)

Prints:

1470 W NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
--------------------------------------------------------------------------------
725 E NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
--------------------------------------------------------------------------------
4353 LAKE OTIS PARKWAY
https://www.walgreens.com/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
--------------------------------------------------------------------------------
7600 DEBARR RD
https://www.walgreens.com/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
--------------------------------------------------------------------------------
2197 W DIMOND BLVD
https://www.walgreens.com/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
--------------------------------------------------------------------------------
2550 E 88TH AVE
https://www.walgreens.com/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
--------------------------------------------------------------------------------
12405 BRANDON ST
https://www.walgreens.com/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
--------------------------------------------------------------------------------
12051 OLD GLENN HWY
https://www.walgreens.com/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
--------------------------------------------------------------------------------
1721 E PARKS HWY
https://www.walgreens.com/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
--------------------------------------------------------------------------------
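
One caveat with the snippet above: it assumes the regular expression always finds the coordinates. If the page layout changes, `m` would be `None` and `m.group(1)` would raise `AttributeError`. Below is a minimal sketch of a guard, plus a helper that turns the JSON response into absolute URLs. The names `find_lat_lng` and `store_urls` are hypothetical, and the sample page fragment and payload are invented; they only mimic the keys used above (`results`, `storeSeoUrl`).

```python
import re
from urllib.parse import urljoin

BASE_URL = 'https://www.walgreens.com'
# Same pattern as in the snippet above, compiled once.
LAT_LNG_RE = re.compile(r'"lat":([\d.-]+),"lng":([\d.-]+)')


def find_lat_lng(page_text):
    """Return the first lat/lng pair found in the page source as a dict."""
    m = LAT_LNG_RE.search(page_text)
    if m is None:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError('no "lat"/"lng" pair found in the page source')
    return {'lat': m.group(1), 'lng': m.group(2)}


def store_urls(data):
    """Build absolute store URLs from a response shaped like the one above."""
    return [urljoin(BASE_URL, result['storeSeoUrl'])
            for result in data.get('results', [])]


# Invented inputs, only to show how the helpers behave.
sample_page = 'var loc = {"lat":61.2,"lng":-149.9};'
sample_data = {'results': [{'storeSeoUrl': '/locator/walgreens-example/id=0'}]}

print(find_lat_lng(sample_page))
print(store_urls(sample_data))
```

In the original snippet, `find_lat_lng(requests.get(url).text)` would take the place of the `re.search` call and the `params` dict.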