Scrapy error when following links: AttributeError: 'HtmlResponse' object has no attribute 'follow_all'
I'm a Python and Scrapy beginner, currently trying to get the rankings for each language/game combination from https://www.twitchmetrics.net/channels/viewership
However, I can't get link following to work. I always get an 'HtmlResponse' object has no attribute 'follow_all' error.
def parse(self, response):
    all_channels = response.xpath('//h5')
    language_page_links = response.xpath(
        '//div[@class="mb-4"][1]//a//@href').getall()

    for i, channel in enumerate(all_channels, start=1):
        il = ItemLoader(item=LeaderboardItem(), selector=channel)
        il.add_xpath('channel_id', './text()')
        il.add_value('rank_mostwatched_all_all', i)
        yield il.load_item()

    yield from response.follow_all(language_page_links, self.parse)
In the last line I will eventually use a different parse callback once link following works. I also tried the example spider from the Scrapy documentation and got exactly the same error:
class AuthorSpider(scrapy.Spider):
    name = 'author'

    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        author_page_links = response.css('.author + a')
        yield from response.follow_all(author_page_links, self.parse_author)

        pagination_links = response.css('li.next a')
        yield from response.follow_all(pagination_links, self.parse)

    def parse_author(self, response):
        def extract_with_css(query):
            return response.css(query).get(default='').strip()

        yield {
            'name': extract_with_css('h3.author-title::text'),
            'birthdate': extract_with_css('.author-born-date::text'),
            'bio': extract_with_css('.author-description::text'),
        }
What am I missing here?
The documentation shows that follow_all is a new method, only available since version 2.0.
You probably need to upgrade Scrapy:

pip install --upgrade scrapy
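To check which version you have installed, run the scrapy version command (or import scrapy; print(scrapy.__version__)). If upgrading is not an option right away, response.follow, which has been available since Scrapy 1.4 and takes a single URL or selector, can stand in for follow_all with a plain loop. A rough sketch reusing the names from your first snippet:

def parse(self, response):
    language_page_links = response.xpath(
        '//div[@class="mb-4"][1]//a//@href').getall()
    # Pre-2.0 fallback: follow each link individually instead of using follow_all.
    for href in language_page_links:
        yield response.follow(href, callback=self.parse)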