在 Scrapy 中,在一个蜘蛛中有多个解析方法是否正确?
Is it correct in Scrapy to have multiple parse methods in one spider?
在Scrapy中一个蜘蛛有多个解析方法是否正确?
看起来像:
import scrapy
class FooSpider(scrapy.Spider):
name = 'foo'
start_urls = ['https://example.com']
def parse(self, response):
...
yield {'foo': foo}
def parse(self, response):
...
yield {'bar': bar}
不,但是您可以创建不同的方法并从 start_requests
调用它们。
import scrapy
class FooSpider(scrapy.Spider):
name = 'POC'
start_urls = ['https://scrapingclub.com/exercise/detail_basic/']
def start_requests(self):
for url in self.start_urls:
yield scrapy.Request(url=url, callback=self.get_title, dont_filter=True)
yield scrapy.Request(url=url, callback=self.get_price, dont_filter=True)
def get_title(self, response):
yield {'title': response.xpath('//h3/text()').get()}
def get_price(self, response):
yield {'price': response.xpath('//div[@class="card-body"]/h4/text()').get()}
在Scrapy中一个蜘蛛有多个解析方法是否正确?
看起来像:
import scrapy
class FooSpider(scrapy.Spider):
name = 'foo'
start_urls = ['https://example.com']
def parse(self, response):
...
yield {'foo': foo}
def parse(self, response):
...
yield {'bar': bar}
不,但是您可以创建不同的方法并从 start_requests
调用它们。
import scrapy
class FooSpider(scrapy.Spider):
name = 'POC'
start_urls = ['https://scrapingclub.com/exercise/detail_basic/']
def start_requests(self):
for url in self.start_urls:
yield scrapy.Request(url=url, callback=self.get_title, dont_filter=True)
yield scrapy.Request(url=url, callback=self.get_price, dont_filter=True)
def get_title(self, response):
yield {'title': response.xpath('//h3/text()').get()}
def get_price(self, response):
yield {'price': response.xpath('//div[@class="card-body"]/h4/text()').get()}