How to change request url before making request in scrapy?
I need to modify my request URL before the response is downloaded, but I can't get it to change. Even after modifying the request URL with request.replace(url=new_url), process_response still prints the unmodified URL. Here is the middleware code:
def process_request(self, request, spider):
    original_url = request.url
    new_url = original_url + "hello%20world"
    print(request.url)   # This prints the original request url
    request = request.replace(url=new_url)
    print(request.url)   # This prints the modified url

def process_response(self, request, response, spider):
    print(request.url)   # This prints the original request url
    print(response.url)  # This prints the original request url
    return response
Can anyone tell me what I'm missing here?
Since you are modifying the request object inside process_request(), you need to return it. Request.replace() does not mutate the request in place; it creates a new Request object, and returning that new object is what tells Scrapy to process it instead of the original:
def process_request(self, request, spider):
    # Avoid an infinite loop: if the URL already contains the suffix,
    # return None so the request continues through the chain unchanged.
    if "hello%20world" in request.url:
        return None

    new_url = request.url + "hello%20world"
    # replace() returns a new Request; returning it makes Scrapy
    # reschedule the modified request in place of the original one.
    return request.replace(url=new_url)
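Once the rewritten request has been scheduled and downloaded, process_response will receive the request with the modified URL. Note that the middleware also has to be enabled in the project settings for any of this to run. A minimal sketch, assuming the class is called UrlSuffixMiddleware and lives in myproject.middlewares (both names are placeholders, adjust them to your project):

# settings.py -- module and class names below are assumptions; adjust to your project
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.UrlSuffixMiddleware": 543,  # 543 is the conventional example priority
}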