Scrapy shell response.css returns empty output

I want to scrape product data with Scrapy. This is the product link: https://www.ingco.com/products/103803

I used this code to check the response:

In [2]: response.css('div.d-flex::text').get()

In [3]: response.css('div.d-flex::text').extract()
Out[3]: []

In [4]: response.css('div.d-flex::text').extract
Out[4]: <bound method SelectorList.getall of []>

In [5]: response.css('div.d-flex::text').extract()
Out[5]: []

In [6]: response.css('div.d-flex::text').extract();

In [7]: response.css('div.d-flex').extract();


But it returns nothing. Please check what I am doing wrong.

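If you look at the page's actual HTML source (Ctrl+U in most browsers), you will see that it does not contain the information you are trying to scrape.
The product details are loaded by JavaScript from an API URL (https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803).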

The data is in JSON format and the API appears to be public, so your job should be quite simple.
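To handle more than one product, the API URL could be derived from the product page URL. The following is a minimal sketch, assuming that the trailing number in the page path (103803 here) is always the value the API expects as `id`; that matches this one example, but it has not been checked for other products. The helper name `api_url_for` is hypothetical.

```python
from urllib.parse import urlencode, urlparse

# Endpoint taken from the API URL quoted in this answer.
API_ENDPOINT = "https://webcenterapi.ingco.com/website-product/product-info-detail"


def api_url_for(product_page_url):
    """Build the JSON API URL for a page URL such as .../products/103803.

    Assumption: the last path segment of the page URL is the numeric
    product id that the API takes as its `id` query parameter.
    """
    path = urlparse(product_page_url).path.rstrip("/")
    product_id = path.rsplit("/", 1)[-1]
    if not product_id.isdigit():
        raise ValueError(f"no numeric product id in {product_page_url!r}")
    return f"{API_ENDPOINT}?{urlencode({'id': product_id})}"


print(api_url_for("https://www.ingco.com/products/103803"))
# https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803
```

The resulting string can be passed to `fetch()` in the Scrapy shell or to `scrapy.Request` in a spider.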

Use this URL, https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803, to extract the data by loading it through the JSON API:
In [3]: url ="https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803"

In [4]: r = scrapy.Request(url)

In [5]: fetch(r)
2021-01-11 13:42:14 [scrapy.core.engine] INFO: Spider opened
2021-01-11 13:42:15 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://webcenterapi.ingco.com/website-product/product-info-detail?id=103803> (referer: None)

In [6]: import json

In [7]: jsonresponse = json.loads(response.text)

In [8]: jsonresponse['data']
Out[8]: 
{'id': 103803,
 'productNo': 'HPWR14008',
 'productName': 'High pressure washer',
 'keyData1': '220-240V~50/60Hz',
 'keyData2': 'Pure copper wire brush motor',
 'keyData3': 'Input power:1400W',
 'parameter': 'Voltage: 220-240V~50/60Hz<br>Carbon brush motor<br>Pure copper wire<br>Input power:1400W<br>Max Pressure:130Bar (1900PSI)<br>Flow rate:5.5L/min<br>Auto stop system<br>1 set water spray gun (AMSG028 )<br>5m high pressure hose( AHPH5028)<br>Packed by color box',
 'isIndustry': 1,
 'categoryId': 11,
 'categoryName': 'Garden tools',
 'video': [{'video': 'https://www.ingco.com/userfiles/32959185488b4b11936e318b589f1edc/flash/video/20181210/HPWR14008.mp4',
   'videoType': 1}],
 'picture': ['https://www.ingco.com/userfiles/1/images/photo/20200730/HPWR14008.jpg'],
 'relevant': [],
 'annex': []}

In [9]: jsonresponse['data']['productNo']
Out[9]: 'HPWR14008'
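Outside the shell, the same steps can be wrapped in a small function. This is only a sketch: the name `parse_product` and the output keys are made up for illustration, the sample payload is a trimmed copy of the output above, and it assumes the `parameter` field always uses `<br>` as its separator.

```python
import json


def parse_product(body_text):
    """Turn the API response body (a JSON string) into a flat dict.

    Assumption: the payload has the shape shown in the shell session,
    with the product fields nested under the "data" key.
    """
    data = json.loads(body_text)["data"]
    # "parameter" is one string with <br> separators; split it into a list.
    raw_specs = data.get("parameter") or ""
    specs = [part.strip() for part in raw_specs.split("<br>") if part.strip()]
    return {
        "product_no": data["productNo"],
        "name": data["productName"],
        "category": data.get("categoryName"),
        "specs": specs,
        "pictures": data.get("picture") or [],
    }


# Trimmed sample built from the response shown above.
sample = json.dumps({
    "data": {
        "id": 103803,
        "productNo": "HPWR14008",
        "productName": "High pressure washer",
        "parameter": "Voltage: 220-240V~50/60Hz<br>Carbon brush motor<br>Input power:1400W",
        "categoryName": "Garden tools",
        "picture": ["https://www.ingco.com/userfiles/1/images/photo/20200730/HPWR14008.jpg"],
    }
})

item = parse_product(sample)
print(item["product_no"])  # HPWR14008
print(item["specs"])       # ['Voltage: 220-240V~50/60Hz', 'Carbon brush motor', 'Input power:1400W']
```

In a spider callback this would be called as `parse_product(response.text)`. On Scrapy 2.2 or later, `response.json()` can replace the `json.loads(response.text)` step.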