How to get that one href link inside the selected block with BeautifulSoup

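I'm trying to use BeautifulSoup (Python 3.7) to select a specific link inside a block. How can I select one particular link within the block I have already selected?

Here is what I have so far. I have used Selenium before, but I don't think it is necessary here yet.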

 from bs4 import BeautifulSoup
 import requests

 base_url = 'http://www.shop.pr'

 # Each shop name maps to its path on the site (the duplicate 'econo' key was dropped)
 shop_urls = {'econo': '/econo/shoppers',
              'pueblo': '/pueblo/shoppers',
              'costco': '/costco/shoppers'}

 selected_shop = 'econo'
 append_to_url = shop_urls.get(selected_shop)

 url = base_url + append_to_url

 page = requests.get(url)

 soup = BeautifulSoup(page.text , 'html.parser')

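 # prettify is a method; it must be called to get the formatted HTML string
 pretty_html = soup.prettify()

 # A context manager makes sure the file is flushed and closed
 with open('page.txt', 'w', encoding='utf-8') as file:
     file.write(pretty_html)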

 wrapper = soup.find("div", {"class": "wrapper"})
 sub_wrapper = wrapper.find('div' , {'class' : 'breadcrumb-holder' })

 print(sub_wrapper)

After digging into the output, I got this:

<div class="breadcrumb-holder">
<div data-react-class="SliderPageLink" data-react-props='{"baseLink":"/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view","page":1,"linkText":"VER PRODUCTOS","sliderSelector":"#shopper-terminal .catalog-view .slider","show":true,"back":false}'></div>
<ul class="breadcrumb">
<li>
<a href="/">Shoppers</a>
</li>
<li>
<a href="/econo/shoppers?clientid=1"><strong>Econo</strong>
</a></li>
</ul>
</div>

I then tried to get "/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view", but it returns None.

The data-react-props value you are trying to get looks like a valid Python dictionary. If so, I suggest using ast.literal_eval to convert it to a dict and then pick out whatever you need.

import ast 
# Your code here
drp = wrapper.find('div' , {'data-react-class': 'SliderPageLink'})['data-react-props']
drp_dict = ast.literal_eval(drp.replace(':true', ':True').replace(':false', ':False'))
base_link = drp_dict['baseLink'] # Your link here

Using ast.literal_eval appears to be safe, as its documentation states:

Help on function literal_eval in module ast:

literal_eval(node_or_string)
    Safely evaluate an expression node or a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
    and None.

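However, the string may need some changes first. For example, true and false are not Python expressions (Python spells them True and False), which is why the code above replaces them before evaluating.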
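To illustrate that caveat, here is a small sketch that uses only the standard library. The `props` string is a shortened, made-up stand-in for the real attribute value, and `try_literal_eval` is a hypothetical helper name. It shows `ast.literal_eval` rejecting the lowercase booleans, while `json.loads` parses the same text without any string replacement.

```python
import ast
import json

# Shortened stand-in for the data-react-props value (illustrative only).
props = '{"baseLink": "/econo/shoppers/example", "page": 1, "show": true, "back": false}'


def try_literal_eval(text):
    """Return the parsed object, or None if the text is not a Python literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


# To Python, true/false are bare names rather than literals, so this fails.
print(try_literal_eval(props))

# json.loads understands true/false/null natively.
parsed = json.loads(props)
print(parsed["baseLink"], parsed["show"])
```

Since the attribute holds JSON, parsing it as JSON avoids having to guess which substrings need rewriting.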

If I understand correctly what you are looking for, this should work.

First,

import json

Then, add the following after the wrapper part of your code:

target = sub_wrapper.find('div')
td = json.loads(target['data-react-props'])
print(td['baseLink'])

Output:

/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view
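For completeness, here is a self-contained sketch of the same idea that runs without a network request. The HTML is a trimmed copy of the snippet from the question, `extract_base_link` is a hypothetical helper name, and joining the result with the base URL via `urljoin` is an addition that is not part of the answer above.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question.
HTML = """
<div class="wrapper">
<div class="breadcrumb-holder">
<div data-react-class="SliderPageLink" data-react-props='{"baseLink":"/econo/shoppers/donde-mejor-se-compra-20190711/4878/product-list-view","page":1,"linkText":"VER PRODUCTOS","show":true,"back":false}'></div>
<ul class="breadcrumb">
<li><a href="/">Shoppers</a></li>
</ul>
</div>
</div>
"""


def extract_base_link(html):
    """Return the baseLink stored in data-react-props, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: the SliderPageLink div inside the breadcrumb holder.
    target = soup.select_one(
        'div.breadcrumb-holder div[data-react-class="SliderPageLink"]'
    )
    if target is None or not target.has_attr("data-react-props"):
        return None
    # The attribute value is JSON, so parse it as JSON.
    props = json.loads(target["data-react-props"])
    return props.get("baseLink")


base_link = extract_base_link(HTML)
print(base_link)

# Turn the site-relative path into an absolute URL.
print(urljoin("http://www.shop.pr", base_link))
```

Note that the link is stored inside a JSON attribute rather than in an `<a href>`, which may be why a search over the anchor tags in this block does not find it.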