Scrapy: how to yield from multiple sources

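A few days ago I asked this question:

From it I learned how to pass values from website 1 to website 2. That lets me yield information from two sites, but it does not solve the problem when I have 10 different sites.

I could keep passing values from one function to the next, but that seems silly. A more efficient approach would be to receive the information in the parse function and yield it from there. Here is pseudocode for what I want to achieve.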

import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com', 'third.com']
    start_urls = ['http://first.com/']

    # Pseudocode: this is the behaviour I want, not working Scrapy code.
    def parse(self, response):
        name = response.xpath(...)
        price1 = scrapy.Request('http://second.com/', callback=self.parse_check)
        price2 = scrapy.Request('http://third.com/', callback=self.parse_check2)
        yield (name, price1, price2)

    def parse_check(self, response):
        price = response.xpath(...)
        return price

    def parse_check2(self, response):
        price = response.xpath(...)
        return price

Take a look at scrapy-inline-requests; it may be what you are looking for. Your example would become something like this:

import scrapy
from inline_requests import inline_requests

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com', 'third.com']
    start_urls = ['http://first.com/']

    @inline_requests
    def parse(self, response):
        name = response.xpath(...)

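        # Each `yield scrapy.Request(...)` hands the request to the decorator,
        # and the matching response comes back as the value of the yield.
        response1 = yield scrapy.Request('http://second.com/')
        price1 = response1.xpath(...)
        response2 = yield scrapy.Request('http://third.com/')
        price2 = response2.xpath(...)

        # dict() needs keyword arguments; positional values raise a TypeError.
        yield dict(name=name, price1=price1, price2=price2)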
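
To make the `response = yield scrapy.Request(...)` style less mysterious, here is a rough, library-free sketch of the generator mechanism it relies on. This is not how scrapy-inline-requests is actually implemented, and it does not use Scrapy at all: `FakeRequest`, `FakeResponse`, `PAGES`, `fake_download` and `drive` are hypothetical stand-ins invented for this illustration. The only real feature shown is Python's `generator.send()`, which resumes a generator and makes the sent value the result of its pending `yield`.

```python
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request object."""
    url: str


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response object."""
    url: str
    body: str


# Canned page bodies standing in for real downloads (made-up values).
PAGES = {
    "http://second.com/": "10.50",
    "http://third.com/": "9.99",
}


def fake_download(request):
    """Pretend to fetch the request by looking it up in PAGES."""
    return FakeResponse(request.url, PAGES[request.url])


def drive(generator):
    """Run a callback generator to completion.

    Every yielded FakeRequest is answered by sending its response back
    into the generator; anything else is collected as an output item.
    """
    items = []
    to_send = None  # the first send() must be None to start the generator
    while True:
        try:
            yielded = generator.send(to_send)
        except StopIteration:
            return items
        if isinstance(yielded, FakeRequest):
            to_send = fake_download(yielded)
        else:
            items.append(yielded)
            to_send = None


def parse(name):
    """Callback written in the inline style: requests in, one item out."""
    response1 = yield FakeRequest("http://second.com/")
    price1 = response1.body
    response2 = yield FakeRequest("http://third.com/")
    price2 = response2.body
    yield {"name": name, "price1": price1, "price2": price2}


result = drive(parse("widget"))
print(result)
```

One consequence visible in the sketch: the request for `third.com` is only created after the response from `second.com` has been sent back in, so the lookups run one after another rather than in parallel. With 10 sites that ordering may matter for total crawl time.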