Conditioning the soup selection on a web scrape (Python/BeautifulSoup)

I have the following markup for a single item in a product list:

    <div class="nice_product_item">
    <div class="npi_name">
       <h2>
           <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html"> 
           <span style="color:red">Stoc limitat!</span>  
           Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 + 12 MP, Wi-Fi, 5G, iOS (Negru)
            </a>        
       </h2>
    </div>

    <div class="price_block_list">
        <span class="old_price">&nbsp;999,00 Lei&nbsp;</span>
        <span class="price_discount">-12%</span>
        <span class="cheaper_by">mai ieftin cu 120,00 lei</span>
        <span class="real_price">879,00 Lei</span>
        <span class="evo-credit">evoCREDIT</span></div>
    </div>
</div>

Some products have the price_discount span, while others do not:

<span class="price_discount">-12%</span>

I use the following code to scrape the product names:

texts = []

for a in soup.select("div.npi_name a[href]"):
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string
    texts.append(text.strip())

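I don't know what condition I need in order to get only the names of the discounted products.

Note: it has to work on a list of products.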

One way to handle this is to select only the items that have a discount:

soup.select('div.nice_product_item:has(.price_discount):has(a[href])')

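Iterate over the ResultSet, pick the information you need, and store it in a structured form such as a list of dicts, so you can process it later, for example by loading it into a DataFrame and saving it as CSV, JSON, ...

Example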
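The storing and saving step could look like the following sketch. The records are hypothetical placeholders standing in for scraped values, and the output is serialized to strings rather than files; passing a path such as `products.csv` (an assumed file name) to `to_csv` would write a file instead.

```python
import pandas as pd

# Hypothetical records, shaped like the dicts collected while iterating.
data = [
    {'label': 'Phone A', 'price': '879,00 Lei', 'discount': '-12%'},
    {'label': 'Phone C', 'price': '450,00 Lei', 'discount': '-5%'},
]

df = pd.DataFrame(data)

# With no path given, both methods return the serialized text.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient='records')

print(csv_text)
print(json_text)
```

Prices such as `879,00 Lei` contain a comma, so the CSV writer quotes those fields automatically.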

from bs4 import BeautifulSoup
import pandas as pd

html = '''
<div class="nice_product_item">
    <div class="npi_name">
       <h2>
           <a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html"> 
           <span style="color:red">Stoc limitat!</span>  
           Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 + 12 MP, Wi-Fi, 5G, iOS (Negru)
            </a>        
       </h2>
    </div>

    <div class="price_block_list">
        <span class="old_price">&nbsp;999,00 Lei&nbsp;</span>
        <span class="price_discount">-12%</span>
        <span class="cheaper_by">mai ieftin cu 120,00 lei</span>
        <span class="real_price">879,00 Lei</span>
        <span class="evo-credit">evoCREDIT</span></div>
    </div>
</div>
'''

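# Name the parser explicitly to avoid the "no parser was explicitly specified" warning.
soup = BeautifulSoup(html, 'html.parser')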

data = []

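# Keep only the items that contain both a discount span and a link.
# The walrus operator (:=) used below requires Python 3.8 or later.
for e in soup.select('div.nice_product_item:has(.price_discount):has(a[href])'):
    data.append({
        'url': e.a['href'],
        # The product name is the last text fragment of the link,
        # after any "Stoc limitat!" span.
        'label': s[-1] if (s := list(e.a.stripped_strings)) else None,
        'price': s.text if (s := e.select_one('span.real_price')) else None,
        'discount': s.text if (s := e.select_one('span.price_discount')) else None,
        'other': 'edit for elements you need'
    })

pd.DataFrame(data)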

Output

| url | label | price | discount | other |
| --- | --- | --- | --- | --- |
| /solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html | Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 + 12 MP, Wi-Fi, 5G, iOS (Negru) | 879,00 Lei | -12% | edit for elements you need |
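
The sample above holds a single discounted item, so it does not show the filter at work. As a minimal sketch, assuming a simplified two-item list (the markup and product names below are made up for illustration), the same `:has()` condition can also be prepended to the selector from the question, so that its loop stays nearly unchanged:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one discounted item, one without a discount.
html = '''
<div class="nice_product_item">
  <div class="npi_name"><h2><a href="/phone-a.html"><span style="color:red">Stoc limitat!</span> Phone A </a></h2></div>
  <div class="price_block_list">
    <span class="price_discount">-12%</span>
    <span class="real_price">879,00 Lei</span>
  </div>
</div>
<div class="nice_product_item">
  <div class="npi_name"><h2><a href="/phone-b.html"> Phone B </a></h2></div>
  <div class="price_block_list">
    <span class="real_price">500,00 Lei</span>
  </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

texts = []
# Only links inside items that contain a .price_discount element are matched.
for a in soup.select('div.nice_product_item:has(.price_discount) div.npi_name a[href]'):
    # The name is the last text fragment, whether or not a "Stoc limitat!" span precedes it.
    strings = list(a.stripped_strings)
    if strings:
        texts.append(strings[-1])

print(texts)  # ['Phone A']
```

Taking the last entry of `stripped_strings` replaces the `if a.span` branch from the question, since it handles links with and without the extra span in the same way.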