Scrapy error when following links: AttributeError: 'HtmlResponse' object has no attribute 'follow_all'
I'm a Python and Scrapy beginner, currently trying to get the rankings for each language/game combination from https://www.twitchmetrics.net/channels/viewership
However, I can't get link following to work. I always get an 'HtmlResponse' object has no attribute 'follow_all' error.
def parse(self, response):
    all_channels = response.xpath('//h5')
    language_page_links = response.xpath(
        '//div[@class="mb-4"][1]//a//@href').getall()

    for i, channel in enumerate(all_channels, start=1):
        il = ItemLoader(item=LeaderboardItem(), selector=channel)
        il.add_xpath('channel_id', './text()')
        il.add_value('rank_mostwatched_all_all', i)
        yield il.load_item()

    yield from response.follow_all(language_page_links, self.parse)
In the last line I will eventually use a different parse callback once link following works. I also tried the example spider from the Scrapy documentation and got exactly the same error:
class AuthorSpider(scrapy.Spider):
    name = 'author'

    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        author_page_links = response.css('.author + a')
        yield from response.follow_all(author_page_links, self.parse_author)

        pagination_links = response.css('li.next a')
        yield from response.follow_all(pagination_links, self.parse)

    def parse_author(self, response):
        def extract_with_css(query):
            return response.css(query).get(default='').strip()

        yield {
            'name': extract_with_css('h3.author-title::text'),
            'birthdate': extract_with_css('.author-born-date::text'),
            'bio': extract_with_css('.author-description::text'),
        }
What am I missing here?
The documentation shows that follow_all is a new method, only available since version 2.0.
You probably need to upgrade Scrapy:

pip install --upgrade scrapy
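To check which version you have installed, run the scrapy version command (or import scrapy; print(scrapy.__version__)). If upgrading is not an option right away, response.follow, which has been available since Scrapy 1.4 and takes a single URL or selector, can stand in for follow_all with a plain loop. A rough sketch reusing the names from your first snippet:

def parse(self, response):
    language_page_links = response.xpath(
        '//div[@class="mb-4"][1]//a//@href').getall()
    # Pre-2.0 fallback: follow each link individually instead of using follow_all.
    for href in language_page_links:
        yield response.follow(href, callback=self.parse)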