Scrapy extracting data from json response
I am trying to extract data from a JSON response using Scrapy. The goal is to get the products listed in the response.
import scrapy
import json


class DepopSpider(scrapy.Spider):
    name = 'depop'
    allowed_domains = ["depop.com"]
    start_urls = ['https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance']

    def parse(self, response):
        data = json.loads(response.body)
        yield from data['meta']['products']
I get the following error:
ERROR: Spider error processing <GET https://webapi.depop.com/api/v2/search/products/?brands=1596&itemsPerPage=24&country=gb&currency=GBP&sort=relevance> (referer: None)
If you want to work with the response of a JSON request, you can try this:
import requests

url = "https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"

# A plain GET with no payload or custom headers, as in the original request
response = requests.get(url)
print(response.text)
Your output will then look like this:
{
  "meta": {
    "resultCount": 20,
    "cursor": "MnwyMHwxNjQwMDA1ODc3",
    "hasMore": false,
    "totalCount": 20
  },
  "products": [
    {
      "id": 215371070,
      "slug": "kicksbrothers-exclusive-genuine-blue-inc",
      "status": "ONSALE",
      "hasVideo": false,
      "price": {
        "priceAmount": "22.98",
        "currencyName": "GBP",
        "nationalShippingCost": "4.99",
        "internationalShippingCost": "10.00"
      },
      "preview": {
        "150": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P2.jpg",
        "210": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P4.jpg",
        "320": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P5.jpg",
        "480": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P6.jpg",
        "640": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P1.jpg",
        "960": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P7.jpg",
        "1280": "https://pictures.depop.com/b0/24241961/1015682639_ea92c00979b64a298f7b9cce465bfb5f/P8.jpg"
      },
      "variantSetId": 93,
      "variants": {
        "7": 1
      },
      "isLiked": false
    },
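The structure of this output suggests why the spider in the question fails: "products" sits next to "meta" at the top level rather than inside it, so data['meta']['products'] would raise a KeyError. The following is a minimal sketch of that, assuming a hand-trimmed copy of the response as a stand-in payload (the sample string is an illustration, not a live API call):

```python
import json

# Trimmed stand-in for the API response shown above (illustrative only).
sample = '''
{
  "meta": {"resultCount": 20, "hasMore": false, "totalCount": 20},
  "products": [
    {"id": 215371070,
     "slug": "kicksbrothers-exclusive-genuine-blue-inc",
     "price": {"priceAmount": "22.98", "currencyName": "GBP"}}
  ]
}
'''

data = json.loads(sample)

# "products" is not a child of "meta", so the lookup from the question fails.
try:
    data['meta']['products']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Reading the list from the top level works.
products = data['products']
print(lookup_failed, len(products), products[0]['id'])
```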
How to parse the JSON response:
import requests
import json


def get_requests():
    # Fetch the raw JSON text from the search endpoint
    url = "https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance"
    response = requests.get(url)
    return response.text


# x holds the response body returned by get_requests()
x = get_requests()
data_json = json.loads(x)

# Each product carries both its id and its price
for product in data_json['products']:
    print(product['id'])
    print(product['price']['priceAmount'])
Output:
215371070
22.98
256715789
8.00
202721541
5.00
202722546
5.00
274328291
24.00
221641139
10.00
245419941
30.00
192541316
8.00
147762409
14.00
158406248
9.99
234693030
20.00
213377081
10.00
228630951
10.00
203627182
16.00
159958157
7.99
151413456
27.20
250985338
8.00
185488012
15.00
154423470
20.00
193888222
10.00
This loops over the JSON response and prints the values of the "id" and "price" keys.
Here is a minimal working example using Scrapy and JSON.
Script:
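If the values need to be kept rather than printed, the same loop can collect them into a list. This is only a sketch: summarize_products is a hypothetical helper, the payload is hand-written from the first two results above, and converting the price strings to Decimal is an assumption about what is useful downstream.

```python
from decimal import Decimal


def summarize_products(payload):
    """Return a list of {'id', 'price'} dicts from a payload shaped like the response above."""
    rows = []
    for product in payload.get('products', []):
        price = product.get('price', {}).get('priceAmount')
        rows.append({
            'id': product.get('id'),
            # Prices arrive as strings; Decimal keeps them exact.
            'price': Decimal(price) if price is not None else None,
        })
    return rows


# Hypothetical two-product payload using values from the output above.
payload = {
    'products': [
        {'id': 215371070, 'price': {'priceAmount': '22.98'}},
        {'id': 256715789, 'price': {'priceAmount': '8.00'}},
    ]
}
rows = summarize_products(payload)
total = sum(row['price'] for row in rows)
print(rows, total)
```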
import scrapy
import json


class DepopSpider(scrapy.Spider):
    name = 'depop'

    def start_requests(self):
        yield scrapy.Request(
            url='https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance',
            method='GET',
            callback=self.parse,
        )

    def parse(self, response):
        # "products" is a top-level key of the JSON response
        resp = response.json()['products']
        # print(resp)
        # json_data = json.dumps(resp)
        # with open('data.json', 'w') as f:
        #     f.write(json_data)
        for item in resp:
            yield {
                'Name': item['slug'],
                'price': item['price']['priceAmount'],
            }
Output:
{'Name': 'kicksbrothers-exclusive-genuine-blue-inc', 'price': '22.98'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'isabellaimogen-crew-clothing-full-length-slim', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'elliewarwick97-vintage-anchor-blue-shirt-size', 'price': '5.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'elliewarwick97-vintage-anchor-blue-brand-1990s', 'price': '5.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'tommkent-high-waisted-vintage-jeans-washed', 'price': '24.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'megsharp-super-cute-flowery-anchor-blue', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'moniulka2607-sweat-wear-for-man-shorts', 'price': '30.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'quynheu-free-uk-shipping-anchor-blue-07e1', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'bradymonster-oversized-stone-washed-shirt-from', 'price': '14.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'bonebear-vintage-funky-mens-large-shirt', 'price': '9.99'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'katy_potaty-vintage-anchor-blue-mom-jeanstrousers', 'price': '20.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'urielbongco-washed-up-denim-jacket-preloved', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'reubz16--thick-thermal-heavy-t-shirt', 'price': '10.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'reubz16--vintage-egypt-tourist-tee', 'price': '16.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'kristoferjohnson-blue-harbour-mens-tailored-fit', 'price': '7.99'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'ravsonline-blue-willis-pure-indigo-cotton', 'price': '27.20'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
{'Name': 'shikhalamode-anchor-blue-low-rise-denim', 'price': '8.00'}
2021-12-20 20:37:26 [scrapy.core.scraper] DEBUG: Scraped from <200 https://webapi.depop.com/api/v2/search/products/?brands=1645&itemsPerPage=24&country=gb&currency=GBP&sort=relevance>
... and so on
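The item-building logic of the spider can also be pulled into a plain generator, which makes it possible to check it offline without running a crawl. This is a sketch under assumptions: iter_items is a hypothetical helper name, the payload is hand-written from the output above, and skipping entries that lack a slug or price is a design choice not taken from the original answer.

```python
def iter_items(payload):
    """Yield one item dict per product, skipping entries without a slug or price."""
    for product in payload.get('products', []):
        slug = product.get('slug')
        price = product.get('price', {}).get('priceAmount')
        if slug is None or price is None:
            continue  # incomplete entry; nothing useful to emit
        yield {'Name': slug, 'price': price}


# Offline check with a hand-written payload (first value taken from the output above).
payload = {
    'products': [
        {'slug': 'kicksbrothers-exclusive-genuine-blue-inc',
         'price': {'priceAmount': '22.98'}},
        {'slug': 'no-price-listed'},  # hypothetical incomplete entry
    ]
}
items = list(iter_items(payload))
print(items)
```

Inside the spider, parse could then be reduced to yield from iter_items(response.json()). Note that response.json() is available from Scrapy 2.2 onward; on older versions json.loads(response.text) does the same job.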