How to change request url before making request in scrapy?
I need to modify my request URL before the response is downloaded, but I can't get it to change. Even after modifying the request URL with request.replace(url=new_url), process_response still prints the unmodified URL. Here is the middleware code:
def process_request(self, request, spider):
    original_url = request.url
    new_url = original_url + "hello%20world"
    print(request.url)   # This prints the original request url
    request = request.replace(url=new_url)
    print(request.url)   # This prints the modified url

def process_response(self, request, response, spider):
    print(request.url)   # This prints the original request url
    print(response.url)  # This prints the original request url
    return response
Can anyone tell me what I'm missing here?
Since you are modifying the request object inside process_request(), you need to return it. Request.replace() does not mutate the request in place; it creates a new Request object, and returning that new object is what tells Scrapy to process it instead of the original:
def process_request(self, request, spider):
    # Avoid an infinite loop: if the URL already contains the suffix,
    # return None so the request continues through the chain unchanged.
    if "hello%20world" in request.url:
        return None

    new_url = request.url + "hello%20world"
    # replace() returns a new Request; returning it makes Scrapy
    # reschedule the modified request in place of the original one.
    return request.replace(url=new_url)
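Once the rewritten request has been scheduled and downloaded, process_response will receive the request with the modified URL. Note that the middleware also has to be enabled in the project settings for any of this to run. A minimal sketch, assuming the class is called UrlSuffixMiddleware and lives in myproject.middlewares (both names are placeholders, adjust them to your project):

# settings.py -- module and class names below are assumptions; adjust to your project
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.UrlSuffixMiddleware": 543,  # 543 is the conventional example priority
}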