How to change a header for only a specific request in a Scrapy spider?
I am trying to build a web crawler using Scrapy. I want to change the User-Agent for a single request in the spider. I tried the code below, but the User-Agent is not being updated during the crawl.
def start_requests(self):
    request = Request(
        "url",
        callback=self.parse_search,
        meta={'xpaths': self.xpaths},
        headers={
            "User-Agent": "Googlebot-Image/1.0"
        }
    )
    return [request]
Your code works as written (see my code below). However, some middleware on your side may be changing your User-Agent header:
import scrapy


class UserAgentSpider(scrapy.Spider):
    name = 'useragent_spider'
    user_agents = [
        {'title': 'Galaxy S9', 'value': 'Mozilla/5.0 (Linux; Android 8.0.0; SM-G960F Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36'},
        {'title': 'iPhone', 'value': 'Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/69.0.3497.105 Mobile/15E148 Safari/605.1'},
        {'title': 'Edge', 'value': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246'},
    ]

    def start_requests(self):
        # One request per user agent, each with its own User-Agent header.
        for user_agent in self.user_agents:
            yield scrapy.Request(
                url="https://www.myip.com/",
                headers={
                    'user-agent': user_agent['value'],
                },
                # Pass the title to the callback as a keyword argument.
                cb_kwargs={
                    'user_agent': user_agent['title']
                },
                callback=self.parse,
                # Same URL every time, so skip the duplicate filter.
                dont_filter=True,
            )

    def parse(self, response, user_agent):
        # Save each page so the results can be compared.
        # The Samples/ directory must already exist.
        with open(f"Samples/{user_agent}.htm", 'wb') as f:
            f.write(response.body)
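To illustrate why a middleware could matter here, below is a simplified model in plain Python. It is not Scrapy's actual code: a dict stands in for the request headers. As far as I know, Scrapy's built-in UserAgentMiddleware only fills in User-Agent when the request does not already have one (it calls setdefault on the headers), so a header set on the request survives. A custom or third-party middleware that assigns the header unconditionally would overwrite it. The function names and the `SomeRotatingAgent/1.0` and `DefaultBot/1.0` values are made up for this sketch.

```python
# Simplified model only: a plain dict stands in for request.headers.


def default_user_agent_middleware(headers, default_ua="DefaultBot/1.0"):
    """Models setdefault behaviour: fill in User-Agent only if it is missing."""
    headers.setdefault("User-Agent", default_ua)
    return headers


def overriding_middleware(headers, forced_ua="SomeRotatingAgent/1.0"):
    """Hypothetical middleware that assigns User-Agent unconditionally."""
    headers["User-Agent"] = forced_ua
    return headers


per_request = {"User-Agent": "Googlebot-Image/1.0"}

# Copy the dict each time so the two cases do not affect each other.
kept = default_user_agent_middleware(dict(per_request))
clobbered = overriding_middleware(dict(per_request))
filled_in = default_user_agent_middleware({})

print(kept["User-Agent"])
print(clobbered["User-Agent"])
print(filled_in["User-Agent"])
```

If something like the second function is enabled in your project's DOWNLOADER_MIDDLEWARES setting, that would explain the behaviour you see. Checking response.request.headers inside the callback should show which User-Agent was actually sent.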