Scrapy: how to yield items from multiple sources
I asked this question a few days ago:

I learned how to pass a value from site 1 to site 2. That lets me yield information from two sites, but it doesn't scale once I have 10 different sites. I could keep passing values from function to function, but that seems clumsy. A cleaner approach would be to gather the information back in the parse function and yield it from there.

Here is pseudocode of what I want to achieve:
import scrapy

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com', 'third.com']
    start_urls = ['http://first.com/']

    def parse(self, response):
        name = response.xpath(...)
        price1 = scrapy.Request('http://second.com/', callback=self.parse_check)
        price2 = scrapy.Request('http://third.com/', callback=self.parse_check2)
        yield (name, price1, price2)

    def parse_check(self, response):
        price = response.xpath(...)
        return price

    def parse_check2(self, response):
        price = response.xpath(...)
        return price
Take a look at scrapy-inline-requests; it may be what you're looking for. Your example would become something like this:
import scrapy
from inline_requests import inline_requests

class GotoSpider(scrapy.Spider):
    name = 'goto'
    allowed_domains = ['first.com', 'second.com', 'third.com']
    start_urls = ['http://first.com/']

    @inline_requests
    def parse(self, response):
        name = response.xpath(...)
        response1 = yield scrapy.Request('http://second.com/')
        price1 = response1.xpath(...)
        response2 = yield scrapy.Request('http://third.com/')
        price2 = response2.xpath(...)
        yield {'name': name, 'price1': price1, 'price2': price2}
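To see why the decorated callback can simply `yield` a request and get the response back on the next line, here is a simplified, pure-Python sketch of the mechanism behind scrapy-inline-requests: a wrapper drives the callback generator and "sends" each downloaded response back into it. `FakeRequest`, `fetch`, and `drive` are illustrative stand-ins, not Scrapy or scrapy-inline-requests APIs.

```python
class FakeRequest:
    """Stand-in for scrapy.Request: just records a URL."""
    def __init__(self, url):
        self.url = url

def fetch(request):
    """Stand-in for Scrapy's downloader: returns a fake response body."""
    return f"<html>response for {request.url}</html>"

def drive(callback_generator):
    """Run a generator-based callback: whenever it yields a request,
    download it and send the response back in; collect anything else
    it yields as a scraped item."""
    items = []
    try:
        yielded = next(callback_generator)
        while True:
            if isinstance(yielded, FakeRequest):
                # Resume the generator, delivering the response as the
                # value of its `yield` expression.
                yielded = callback_generator.send(fetch(yielded))
            else:
                items.append(yielded)
                yielded = next(callback_generator)
    except StopIteration:
        pass
    return items

def parse():
    # Written exactly like the inline_requests style above.
    response1 = yield FakeRequest('http://second.com/')
    response2 = yield FakeRequest('http://third.com/')
    yield {'price1': response1, 'price2': response2}

items = drive(parse())
print(items)
```

The key design point is Python's `generator.send()`: each `yield scrapy.Request(...)` pauses the callback, and the decorator resumes it with the response once the download finishes, so sequential-looking code stays asynchronous underneath.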