Crawling a single page with scrapy.Spider works, but crawling the entire website with CrawlSpider does not

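I need some help here. My code works when I scrape a single page with scrapy.Spider. However, as soon as I switch to CrawlSpider to crawl the entire website, it does not seem to work at all.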

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor


    class QuotesSpider(CrawlSpider):
        name = "quotes"
        allowed_domains = ['reifen.check24.de']
        start_urls = [
            'https://reifen.check24.de/pkw-sommerreifen/toyo-proxes-cf2-205-55r16-91h-2276003?label=ppc',
            'https://reifen.check24.de/pkw-sommerreifen/michelin-pilot-sport-4-205-55zr16-91w-213777?label=pc'
        ]

        rules = (
            Rule(LinkExtractor(deny= ('cart')), callback='parse_item', follow=True),
        )

        def parse(self, response):
            for quote in response.xpath('/html/body/div[2]/div/section/div/div/div[1]'):
                yield {
                    'brand': quote.xpath('//tbody//tr[1]//td[2]//text()').get(),
                    'pattern': quote.xpath('//tbody//tr[3]//td[2]//text()').get(),
                    'size': quote.xpath('//tbody//tr[6]//td[2]//text()').get(),
                    'RR': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[1]/span/text()').get(),
                    'WL': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[2]/span/text()').get(),
                    'noise': quote.xpath('div[1]/div[1]/div/div[1]/div[2]/span/span/span/div/div/div[3]/span/text()').get(),
                }

Am I missing something?

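You made a small mistake. The rule names parse_item as its callback, but the spider defines no method with that name, only parse:

    rules = (
        Rule(LinkExtractor(deny= ('cart')), callback='parse_item', follow=True),
    )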
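
A rough way to see why this fails quietly instead of raising an error: a callback given as a string has to be looked up by name on the spider. The sketch below uses only the standard library and is a simplified stand-in, not Scrapy's actual code. FakeRule, resolve_callback, BrokenSpider and FixedSpider are hypothetical names, and it assumes the lookup behaves like getattr with a None default.

```python
class FakeRule:
    """Hypothetical stand-in for a crawl rule; it only stores the callback."""

    def __init__(self, callback=None):
        self.callback = callback


def resolve_callback(spider, rule):
    """Resolve a callback given by name; a missing method gives None, not an error."""
    if isinstance(rule.callback, str):
        return getattr(spider, rule.callback, None)
    return rule.callback  # already a callable, or None


class BrokenSpider:
    # The rule asks for 'parse_item', but only 'parse' is defined.
    rule = FakeRule(callback='parse_item')

    def parse(self, response):
        return {'page': response}


class FixedSpider:
    # The rule and the method now agree on the name.
    rule = FakeRule(callback='parse_item')

    def parse_item(self, response):
        return {'page': response}


broken = resolve_callback(BrokenSpider(), BrokenSpider.rule)
fixed = resolve_callback(FixedSpider(), FixedSpider.rule)

print(broken)                   # None: nothing would be called for matched pages
print(fixed('some response'))   # {'page': 'some response'}
```

Under that assumption, the broken spider still follows links but has nothing to call on each page, which would match the symptom of a crawl that yields no items.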

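The callback has to name a method that actually exists on the spider, so with the code as posted it should be:

    rules = (
        Rule(LinkExtractor(deny= ('cart')), callback='parse', follow=True),
    )

Be aware that the Scrapy documentation advises against using parse as a rule callback, because CrawlSpider uses parse itself to implement its crawling logic. Renaming the method to parse_item and leaving callback='parse_item' in the rule is therefore the safer way to make the two names match.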