How to collect all of the anchor hrefs using Scrapy?

I tried to find this in the Scrapy shell:

$ scrapy shell "https://www.trendyol.com/trendyol-man/antrasit-basic-erkek-bisiklet-yaka-oversize-kisa-kollu-t-shirt-tmnss21ts0811-p-90831387"
>>> response.css("div.slick-track").getall()

The output shows everything except the anchor elements. I need all of the image hrefs. Please help me solve this.

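As Fazlul said, the data is generated dynamically (more specifically, only the images and reviews), so it is not in the HTML that Scrapy downloads. Using the Network tab in Chrome DevTools, you can easily find this API: https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/68379869. With that, you are good to go.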

Code

import json

import scrapy
from scrapy import Request


class Trendyol(scrapy.Spider):
    name = 'test'
    domain_name = "https://www.trendyol.com"

    def start_requests(self):
        # Request the JSON API directly instead of the rendered product page.
        url = "https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/68379869"

        yield Request(url=url, callback=self.parse)

    def parse(self, response):
        json_text = json.loads(response.body)
        data = json_text.get('result').get("slicingAttributes")[0].get("attributes")
        for i in data:
            # Prefix the site domain to the path returned by the API.
            full_url = self.domain_name + i['contents'][0]['url']
            print(full_url)

You can get the images this way in the Scrapy shell. The site uses an API to fetch its data.

$ scrapy shell "https://public.trendyol.com/discovery-web-productgw-service/api/productGroup/68379869"
>>> import json
>>> raw_images = json.loads(response.text)
>>> raw_images = raw_images['result']["slicingAttributes"][0]["attributes"]
>>> ["https://cdn.dsmcdn.com" + image['contents'][0]['imageUrl'] for image in raw_images]

Output:

['https://cdn.dsmcdn.com/ty62/product/media/images/20210128/20/58099823/135399582/5/5_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty129/product/media/images/20210616/9/101392400/186966992/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty63/product/media/images/20210128/20/58099823/135399574/4/4_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty106/product/media/images/20210426/18/83152826/164609399/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty62/product/media/images/20210128/20/58099823/135399570/4/4_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty63/product/media/images/20210128/20/58099823/135399586/5/5_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty106/product/media/images/20210426/18/83152826/164609404/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty76/product/media/images/20210323/17/74722131/151899173/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty62/product/media/images/20210129/19/58452645/135399594/3/3_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty62/product/media/images/20210128/20/58099823/135399598/4/4_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty48/product/media/images/20210329/20/76027592/151899177/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty107/product/media/images/20210426/18/83152826/164609413/2/2_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty69/product/media/images/20210323/17/74722131/151899169/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty64/product/media/images/20210128/20/58099823/135399578/4/4_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty62/product/media/images/20210128/20/58099823/135399590/4/4_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty105/product/media/images/20210426/18/83152826/164609408/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty85/product/media/images/20210312/17/70978132/149257621/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty131/product/media/images/20210616/9/101407912/186952893/1/1_org_zoom.jpeg',
 'https://cdn.dsmcdn.com/ty135/product/media/images/20210628/8/104826549/186953020/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty64/product/media/images/20210128/20/58099823/135399562/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty68/product/media/images/20210214/13/62120233/138664105/1/1_org_zoom.jpg',
 'https://cdn.dsmcdn.com/ty105/product/media/images/20210421/10/81841504/135399565/1/1_org_zoom.jpg']
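
If you want to reuse that extraction outside the shell, here is a minimal sketch of the same lookup as a plain function. It assumes the payload keeps the shape used above (`result` → `slicingAttributes` → `attributes` → `contents` → `imageUrl`). The function name `extract_image_urls`, the `CDN_BASE` constant, and the sample data are hypothetical, made up for illustration. Unlike the one-liner above, it walks every entry rather than only index 0 and skips entries with no `imageUrl`, so it may return more URLs than the shell session did.

```python
CDN_BASE = "https://cdn.dsmcdn.com"  # assumption: the prefix used in the shell session


def extract_image_urls(payload, base=CDN_BASE):
    """Collect full image URLs from a productGroup-style dict.

    Missing or empty keys are treated as empty lists instead of raising.
    """
    urls = []
    result = payload.get("result") or {}
    for group in result.get("slicingAttributes") or []:
        for attribute in group.get("attributes") or []:
            for content in attribute.get("contents") or []:
                image_path = content.get("imageUrl")
                if image_path:
                    urls.append(base + image_path)
    return urls


if __name__ == "__main__":
    # Hypothetical sample shaped like the fields used above; not real API data.
    sample = {
        "result": {
            "slicingAttributes": [
                {"attributes": [{"contents": [{"imageUrl": "/example/1_org_zoom.jpg"}]}]}
            ]
        }
    }
    for url in extract_image_urls(sample):
        print(url)
```

Inside a spider's `parse` method you could call it as `extract_image_urls(json.loads(response.text))` and yield each URL instead of printing it.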