Crawling single pages with scrapy.Spider works, but crawling the entire website with CrawlSpider does not

Question

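I need some help here. My code works when I crawl a single page with scrapy.Spider. However, as soon as I switch to CrawlSpider to crawl the entire website, it does not seem to work at all.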

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor


    class QuotesSpider(CrawlSpider):
        name = "quotes"
        allowed_domains = ['reifen.check24.de']
        start_urls = [
            'https://reifen.check24.de/pkw-sommerreifen/toyo-proxes-cf2-205-55r16-91h-2276003?label=ppc',
            'https://reifen.check24.de/pkw-sommerreifen/michelin-pilot-sport-4-205-55zr16-91w-213777?label=pc'
        ]

        rules = (
            Rule(LinkExtractor(deny= ('cart')), callback='parse_item', follow=True),
        )

        def parse(self, response):
            for quote in response.xpath('/html/body/div[2]/div/section/div/div/div[1]'):
                yield {
                    'brand': quote.xpath('//tbody//tr[1]//td[2]//text()').get(),
                    'pattern': quote.xpath('//tbody//tr[3]//td[2]//text()').get(),
                    'size': quote.xpath('//tbody//tr[6]//td[2]//text()').get(),
                    'RR': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[1]/span/text()').get(),
                    'WL': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[2]/span/text()').get(),
                    'noise': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[3]/span/text()').get(),
                }

Am I missing something?

Answer 1

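You made a small mistake. The rule's callback is parse_item, but the only method your spider defines is parse:

    rules = (
        Rule(LinkExtractor(deny= ('cart')), callback='parse_item', follow=True),
    )

The callback name has to match the method name, so it should be:

    rules = (
        Rule(LinkExtractor(deny= ('cart')), callback='parse', follow=True),
    )

Be aware that the Scrapy documentation warns against using parse as a rule callback, because CrawlSpider uses parse for its own logic. Leaving the rule unchanged and renaming the method to parse_item is the safer way to make the two names agree.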
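
To illustrate why the mismatch fails silently, here is a minimal sketch in plain Python. It assumes, as I understand Scrapy's behaviour (it is not stated in this thread), that a string callback is resolved by looking up an attribute of that name on the spider and that a missing name simply yields no callback. Rule, resolve_callback, scrape, MismatchedSpider and MatchingSpider are hypothetical stand-ins written for this example, not Scrapy's own classes or functions.

```python
# Simplified stand-ins, NOT Scrapy's classes: they only model the name lookup.
class Rule:
    def __init__(self, callback):
        self.callback = callback


def resolve_callback(spider, callback):
    """Return the method a rule would call, or None if the name is missing."""
    if callable(callback):
        return callback
    # Assumption: a string callback is looked up by name on the spider.
    return getattr(spider, callback, None)


class MismatchedSpider:
    # Same mismatch as in the question: the rule says parse_item,
    # but the only method defined is parse.
    rules = (Rule(callback="parse_item"),)

    def parse(self, response):
        return {"page": response}


class MatchingSpider:
    # Rule and method agree on the name parse_item.
    rules = (Rule(callback="parse_item"),)

    def parse_item(self, response):
        return {"page": response}


def scrape(spider, response):
    """Run every rule callback that resolves; skip the ones that do not."""
    items = []
    for rule in spider.rules:
        callback = resolve_callback(spider, rule.callback)
        if callback is not None:
            items.append(callback(response))
    return items
```

With these stand-ins, scrape(MismatchedSpider(), "page-1") returns an empty list, while scrape(MatchingSpider(), "page-1") returns one item. That mirrors a crawl that follows links without raising an error yet never produces any data.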
