如何使用 Beautiful Soup 提取这本词典
How to extract this dictionary using Beautiful Soup
我想获取字典的可变最后一个元素(粘贴在下面),它在另一个字典“offers”中,但我不知道如何提取它。
html = s.get(url=url, headers=headers, verify=False, timeout=15)
soup = BeautifulSoup(html.text, 'html.parser')
products = soup.find_all('script', {'type': "application/ld+json"})
{"@context":"http://schema.org","@type":"Product","aggregateRating":{"@type":"AggregateRating","bestRating":5,"ratingValue":"4.8","ratingCount":11,"worstRating":3,"reviewCount":5},"brand":{"@type":"Brand","name":"New Balance"},"color":"white/red/biały","image":["https://img01.ztat.net/3"],"itemCondition":"http://schema.org/NewCondition","manufacturer":"New Balance","name":"550 UNISEX - Sneakersy niskie - white/red","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"489","priceCurrency":"PLN","sku":"NE215O06U-A110001000","url":"/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html"},{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"489","priceCurrency":"PLN","sku":"NE215O06U-A110002000","url":"/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html"} (...)
如前所述,通过 BeautifulSoup
提取内容 将 string
解码为 json.loads()
:
import json
products = '{"@context":"http://schema.org","@type":"Product","aggregateRating":{"@type":"AggregateRating","bestRating":5,"ratingValue":"4.8","ratingCount":11,"worstRating":3,"reviewCount":5},"brand":{"@type":"Brand","name":"New Balance"},"color":"white/red/biały","image":["https://img01.ztat.net/3"],"itemCondition":"http://schema.org/NewCondition","manufacturer":"New Balance","name":"550 UNISEX - Sneakersy niskie - white/red","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"489","priceCurrency":"PLN","sku":"NE215O06U-A110001000","url":"/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html"},{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"489","priceCurrency":"PLN","sku":"NE215O06U-A110002000","url":"/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html"}]}'
products = json.loads(products)
获取报价中的最后一个元素(dict):
products['offers'][-1]
输出:
{'@type': 'Offer',
'availability': 'http://schema.org/OutOfStock',
'price': '489',
'priceCurrency': 'PLN',
'sku': 'NE215O06U-A110002000',
'url': '/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html'}
例子
在您的特殊情况下,您还必须先 replace('"','"')
:
from bs4 import BeautifulSoup
import requests, json
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
"X-Amzn-Trace-Id": "Root=1-61acac03-6279b8a6274777eb44d81aae",
"X-Client-Data": "CJW2yQEIpLbJAQjEtskBCKmdygEIuevKAQjr8ssBCOaEzAEItoXMAQjLicwBCKyOzAEI3I7MARiOnssB" }
html = requests.get('https://www.zalando.de/new-balance-550-unisex-sneaker-low-whitered-ne215o06u-a11.html', headers=headers)
soup = BeautifulSoup(html.content, 'lxml')
jsonData = json.loads(soup.select_one('script[type="application/ld+json"]').text.replace('"','"'))
jsonData['offers'][-1]
我想获取字典的可变最后一个元素(粘贴在下面),它在另一个字典“offers”中,但我不知道如何提取它。
html = s.get(url=url, headers=headers, verify=False, timeout=15)
soup = BeautifulSoup(html.text, 'html.parser')
products = soup.find_all('script', {'type': "application/ld+json"})
{"@context":"http://schema.org","@type":"Product","aggregateRating":{"@type":"AggregateRating","bestRating":5,"ratingValue":"4.8","ratingCount":11,"worstRating":3,"reviewCount":5},"brand":{"@type":"Brand","name":"New Balance"},"color":"white/red/biały","image":["https://img01.ztat.net/3"],"itemCondition":"http://schema.org/NewCondition","manufacturer":"New Balance","name":"550 UNISEX - Sneakersy niskie - white/red","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"489","priceCurrency":"PLN","sku":"NE215O06U-A110001000","url":"/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html"},{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"489","priceCurrency":"PLN","sku":"NE215O06U-A110002000","url":"/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html"} (...)
如前所述,通过 BeautifulSoup
提取内容 将 string
解码为 json.loads()
:
import json
products = '{"@context":"http://schema.org","@type":"Product","aggregateRating":{"@type":"AggregateRating","bestRating":5,"ratingValue":"4.8","ratingCount":11,"worstRating":3,"reviewCount":5},"brand":{"@type":"Brand","name":"New Balance"},"color":"white/red/biały","image":["https://img01.ztat.net/3"],"itemCondition":"http://schema.org/NewCondition","manufacturer":"New Balance","name":"550 UNISEX - Sneakersy niskie - white/red","offers":[{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"489","priceCurrency":"PLN","sku":"NE215O06U-A110001000","url":"/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html"},{"@type":"Offer","availability":"http://schema.org/OutOfStock","price":"489","priceCurrency":"PLN","sku":"NE215O06U-A110002000","url":"/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html"}]}'
products = json.loads(products)
获取报价中的最后一个元素(dict):
products['offers'][-1]
输出:
{'@type': 'Offer',
'availability': 'http://schema.org/OutOfStock',
'price': '489',
'priceCurrency': 'PLN',
'sku': 'NE215O06U-A110002000',
'url': '/new-balance-550-unisex-sneakersy-niskie-whitered-ne215o06u-a11.html'}
例子
在您的特殊情况下,您还必须先 replace('"','"')
:
from bs4 import BeautifulSoup
import requests, json
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
"X-Amzn-Trace-Id": "Root=1-61acac03-6279b8a6274777eb44d81aae",
"X-Client-Data": "CJW2yQEIpLbJAQjEtskBCKmdygEIuevKAQjr8ssBCOaEzAEItoXMAQjLicwBCKyOzAEI3I7MARiOnssB" }
html = requests.get('https://www.zalando.de/new-balance-550-unisex-sneaker-low-whitered-ne215o06u-a11.html', headers=headers)
soup = BeautifulSoup(html.content, 'lxml')
jsonData = json.loads(soup.select_one('script[type="application/ld+json"]').text.replace('"','"'))
jsonData['offers'][-1]