Why does my Scrapy script scrape only the first page and not the others?

Question

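I'm trying to scrape some information from this website: http://quotes.toscrape.com/

But I can't find a way to scrape all the pages. The script only scrapes the first page, and I don't understand what I'm doing wrong.

Here is my script so far: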

import scrapy

from ..items import QuotetutorialItem

class QuoteSpider(scrapy.Spider):
    name = 'quotes'
    page_number = 2
    start_urls = ['http://quotes.toscrape.com/page/1/']

    def parse(self, response):

        items = QuotetutorialItem()

        all_div_quotes = response.css('div.quote')

        for quotes in all_div_quotes:   

            title = quotes.css('span.text::text').extract()
            author = quotes.css('.author::text').extract()
            tags = quotes.css('.tag::text').extract()

            items['title'] = title
            items['author'] = author
            items['tags'] = tags

            yield items

        next_page = 'http://quotes.toscrape.com/page/'+ str(QuoteSpider.page_number) + '/'


        if QuoteSpider.page_number < 11:
            QuoteSpider.page_number += 1
            yield response.follow(next_page, callback = self.parse)

Then I run scrapy crawl quote in the terminal, and it only gives me the information from the first page.

Any ideas?

Thank you!

Answer 1

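I think your code is fine. When I run it, it extracts the information from all 10 pages. Please add

items['url'] = response.url

to your parse function, inside the loop before yield items, and then check again whether all 10 pages are being extracted.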

python

scrapy

web-scraping

python-3.x