TypeError: set_user_agent() takes 2 positional arguments but 3 were given
I'm following a tutorial on spoofing headers, but after adding the set-user-agent function, the terminal shows an error.
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BestMoviesSpider(CrawlSpider):
    name = 'best_movies'
    allowed_domains = ['imdb.com']
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'

    def start_requests(self):
        yield scrapy.Request(url='https://www.imdb.com/search/title/?genres=drama&groups=top_250&sort=user_rating,desc',
                             headers={
                                 'User_Agent': self.user_agent
                             })

    rules = (
        Rule(LinkExtractor(restrict_xpaths="//h3[@class='lister-item-header']/a"), callback='parse_item',
             follow=True, process_request='set_user_agent'),
        Rule(LinkExtractor(restrict_xpaths="(//a[@class='lister-page-next next-page'])[2]"),
             process_request='set_user_agent'),
    )

    def set_user_agent(self, request):
        request.headers['User-Agent'] = self.user_agent
        return request
Error:
TypeError: set_user_agent() takes 2 positional arguments but 3 were given
You are using set_user_agent as the process_request method in your rules. The documentation says:
process_request is a callable (or a string, in which case a method from the spider object with that name will be used) which will be called for every Request extracted by this rule. This callable should take said request as first argument and the Response from which the request originated as second argument. It must return a Request object or None (to filter out the request). (https://docs.scrapy.org/en/latest/topics/spiders.html)
Therefore, you need to add the response as a second parameter of the set_user_agent method:
def set_user_agent(self, request, response):
    request.headers['User-Agent'] = self.user_agent
    return request
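The mismatch can be reproduced without Scrapy at all. The sketch below assumes only the calling convention from the docs quoted above: CrawlSpider invokes the rule's process_request callable with two positional arguments (the extracted request and the response it originated from), so a method that accepts only the request raises exactly the TypeError from your traceback:

```python
# DemoSpider is a stand-in for your CrawlSpider subclass; the string
# arguments stand in for the Request and Response objects Scrapy passes.
class DemoSpider:
    user_agent = 'Mozilla/5.0'

    def set_user_agent(self, request):  # broken: accepts only the request
        return request

    def set_user_agent_fixed(self, request, response):  # corrected signature
        return request

spider = DemoSpider()

# Scrapy calls the method with (request, response), so the one-argument
# version fails: bound method = self + request = 2 positional args,
# but 3 are given (self, request, response).
err = None
try:
    spider.set_user_agent("request", "response")
except TypeError as e:
    err = e
print(err)  # ...takes 2 positional arguments but 3 were given

# The corrected signature accepts both arguments and returns the request:
print(spider.set_user_agent_fixed("request", "response"))
```

The method only needs to accept the response; it can ignore it, as your set_user_agent does.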