Conditioning the soup selection on a web scrape. Python/BeautifulSoup
I have the following code for one item of a product list:
<div class="nice_product_item">
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 + 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
<div class="price_block_list">
<span class="old_price"> 999,00 Lei </span>
<span class="price_discount">-12%</span>
<span class="cheaper_by">mai ieftin cu 120,00 lei</span>
<span class="real_price">879,00 Lei</span>
<span class="evo-credit">evoCREDIT</span></div>
</div>
</div>
Some products have the price_discount span, while others do not:
<span class="price_discount">-12%</span>
I use the following code to scrape the product names:
texts = []
for a in soup.select("div.npi_name a[href]"):
    if a.span:
        text = a.span.next_sibling
    else:
        text = a.string
    texts.append(text.strip())
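I don't know what condition I need in order to get only the names of the discounted products.
Note: it has to work for a list of products.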
One way to process the data is to select all items that have a discount:
soup.select('div.nice_product_item:has(.price_discount):has(a[href])')
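To illustrate how the `:has()` condition behaves on a list, here is a minimal sketch. The two-item markup, the names `sample_html`, `sample_soup` and `discounted_names`, and the product labels are assumptions made up for illustration; only the class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical two-item listing: only the first item carries a discount.
sample_html = """
<div class="nice_product_item">
  <div class="npi_name"><h2><a href="/phone-a.html"><span style="color:red">Stoc limitat!</span> Phone A</a></h2></div>
  <div class="price_block_list">
    <span class="price_discount">-12%</span>
    <span class="real_price">879,00 Lei</span>
  </div>
</div>
<div class="nice_product_item">
  <div class="npi_name"><h2><a href="/phone-b.html">Phone B</a></h2></div>
  <div class="price_block_list">
    <span class="real_price">500,00 Lei</span>
  </div>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

discounted_names = []
# :has(.price_discount) keeps only containers with a discount span somewhere inside.
for item in sample_soup.select("div.nice_product_item:has(.price_discount)"):
    link = item.select_one("div.npi_name a[href]")
    if link is None:
        continue
    # The last stripped string is the product name; a leading badge, if any, comes first.
    strings = list(link.stripped_strings)
    if strings:
        discounted_names.append(strings[-1])

print(discounted_names)  # ['Phone A']
```

The item without a price_discount span is never matched, so no extra `if` is needed inside the loop.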
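Iterate over the ResultSet, pick out the information you need, and store it in a structured form such as a list of dicts, so that you can process it later, e.g. load it into a DataFrame and save it to csv, json, ...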
Example
from bs4 import BeautifulSoup
import pandas as pd
html = '''
<div class="nice_product_item">
<div class="npi_name">
<h2>
<a href="/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html">
<span style="color:red">Stoc limitat!</span>
Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 + 12 MP, Wi-Fi, 5G, iOS (Negru)
</a>
</h2>
</div>
<div class="price_block_list">
<span class="old_price"> 999,00 Lei </span>
<span class="price_discount">-12%</span>
<span class="cheaper_by">mai ieftin cu 120,00 lei</span>
<span class="real_price">879,00 Lei</span>
<span class="evo-credit">evoCREDIT</span></div>
</div>
</div>
'''
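# Name the parser explicitly so results do not depend on which parsers are installed.
soup = BeautifulSoup(html, 'html.parser')
data = []
# Only product containers that hold both a discount span and a link are matched.
for e in soup.select('div.nice_product_item:has(.price_discount):has(a[href])'):
    # The walrus operator (:=) requires Python 3.8+.
    data.append({
        'url': e.a['href'],
        # The last stripped string skips a leading badge such as "Stoc limitat!".
        'label': s[-1] if (s := list(e.a.stripped_strings)) else None,
        'price': s.text if (s := e.select_one('span.real_price')) else None,
        'discount': s.text if (s := e.select_one('span.price_discount')) else None,
        'other': 'edit for elements you need'
    })
pd.DataFrame(data)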
Output
url | label | price | discount | other |
---|---|---|---|---|
/solutii-mobile-telefoane-mobile/apple-telefon-mobil-apple-iphone-13-super-retina-xdr-oled-6.1-256gb-flash-camera-duala-12-12-mp-wi-fi-5g-ios-negru-3824456.html | Telefon Mobil Apple iPhone 13, Super Retina XDR OLED 6.1", 256GB Flash, Camera Duala 12 + 12 MP, Wi-Fi, 5G, iOS (Negru) | 879,00 Lei | -12% | edit for elements you need |
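The answer mentions saving the result to csv or json without showing it. A minimal sketch of that step might look like the following; the rows, the names `rows`, `csv_text` and `json_text`, and the commented-out file names are placeholders, not values from the real site.

```python
import json
import pandas as pd

# Hypothetical rows shaped like the dicts collected while iterating over the ResultSet.
rows = [
    {"url": "/phone-a.html", "label": "Phone A", "price": "879,00 Lei", "discount": "-12%"},
    {"url": "/phone-c.html", "label": "Phone C", "price": "450,00 Lei", "discount": "-5%"},
]

df = pd.DataFrame(rows)

# With no path given, to_csv and to_json return the serialized text.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records", force_ascii=False)

# Passing a path writes a file instead; these file names are placeholders.
# df.to_csv("discounted_products.csv", index=False)
# df.to_json("discounted_products.json", orient="records", force_ascii=False)

print(csv_text)
print(json.loads(json_text))
```

Prices such as "879,00 Lei" contain a comma, so pandas quotes those fields in the CSV output; the values stay strings until you parse them yourself.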