Exceeded 30 redirects after adding `Host` to headers [python requests]

Fetching a URL with `Host` set in the headers raises the exception `Exceeded 30 redirects`.
This is very strange and I can't figure it out.
Here is the test code:

>>> url = 'http://bbs.duchang8.com/forum-29-1.html'
>>> r = requests.get(url)
>>> print r.status_code
200
>>> headers = {
...     'Host': 'bbs.duchang8.com',
... }
>>> r = requests.get(url, headers=headers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 594, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 114, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

Short answer:

Do not override the Host: header.

Alternatively, override it with the host that the client is being redirected to.
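One way to read the second option is to follow the redirects by hand and rebuild the Host: header from each new URL. The following is only a minimal sketch, written for Python 3; the helper name `get_with_matching_host` and the `max_hops` limit are made up for illustration, and `session` is assumed to behave like a `requests.Session`.

```python
from urllib.parse import urljoin, urlsplit


def get_with_matching_host(session, url, max_hops=10):
    """Follow redirects manually, keeping Host: in sync with the URL.

    Hypothetical helper: `session` is assumed to be a requests.Session
    (or anything with a compatible .get method).
    """
    for _ in range(max_hops):
        # Derive the Host: header from the URL we are about to request.
        headers = {"Host": urlsplit(url).netloc}
        resp = session.get(url, headers=headers, allow_redirects=False)
        if not resp.is_redirect:
            return resp
        # The Location value may be relative, so resolve it against the current URL.
        url = urljoin(url, resp.headers["location"])
    raise RuntimeError("gave up after %d redirects" % max_hops)


# Usage (needs network access and the requests package):
# import requests
# resp = get_with_matching_host(requests.Session(),
#                               "http://bbs.duchang8.com/forum-29-1.html")
```

Since this just reproduces the Host: value that would be sent anyway, simply leaving the header out is usually the simpler fix.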

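Long answer:

By explicitly setting the Host: header you are telling requests to use that header in all subsequent requests, including any that are reissued because of a redirect response from the server.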
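The carry-over can be seen offline by copying a prepared request and pointing it at a new URL, which is roughly what requests does when it follows a redirect. This is a simplified sketch in Python 3, not the library's actual redirect code; nothing is sent over the network.

```python
import requests

# Prepare the original request with an explicit Host: header.
original = requests.Request(
    "GET",
    "http://bbs.duchang8.com/forum-29-1.html",
    headers={"Host": "bbs.duchang8.com"},
).prepare()

# Approximation of a redirect: copy the request and change only the URL.
redirected = original.copy()
redirected.prepare_url("http://www.duchang8.com/forum-29-1.html", None)

print(redirected.url)              # the new location
print(redirected.headers["Host"])  # still the value supplied by the caller
```

The URL changes but the explicit header does not, so the new request names one host in its URL and another in Host:.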

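In this case the requests client is redirected to http://www.duchang8.com/forum-29-1.html, a location hosted by a different server: www.duchang8.com rather than bbs.duchang8.com. Although both host names resolve to the same IP address, the remote HTTP server treats them differently.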

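The end result is that requests keeps using the Host: header you supplied instead of the correct value returned by the server. Subsequent requests to the new location are then rejected (by way of another redirect) because the URL/server host and the Host: header do not match.
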
>>> import requests
>>> url = 'http://bbs.duchang8.com/forum-29-1.html'
>>> r = requests.get(url)
>>> r
<Response [200]>
>>> r.history
[<Response [301]>]
>>> r.history[0].headers
{'content-length': '178', 'server': 'nginx', 'connection': 'keep-alive', 'location': 'http://www.duchang8.com/forum-29-1.html', 'date': 'Mon, 03 Aug 2015 12:20:31 GMT', 'content-type': 'text/html'}

Here we see that the client was redirected by an HTTP 301 response to http://www.duchang8.com/forum-29-1.html, as given in the location: header.

Using curl you can see what happens if you try to supply a different Host: header when fetching the new location:

$ curl -v -L -H 'Host: bbs.duchang8.com' http://www.duchang8.com/forum-29-1.html
*   Trying 61.160.249.39...
* Connected to www.duchang8.com (61.160.249.39) port 80 (#0)
> GET /forum-29-1.html HTTP/1.1
> User-Agent: curl/7.40.0
> Accept: */*
> Host: bbs.duchang8.com
> 
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Mon, 03 Aug 2015 12:27:33 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: keep-alive
< Location: http://www.duchang8.com/forum-29-1.html
< 
* Ignoring the response-body
* Connection #0 to host www.duchang8.com left intact
* Issue another request to this URL: 'http://www.duchang8.com/forum-29-1.html'
* Found bundle for host www.duchang8.com: 0x21b54c0
* Re-using existing connection! (#0) with host www.duchang8.com
* Connected to www.duchang8.com (61.160.249.39) port 80 (#0)
> GET /forum-29-1.html HTTP/1.1
> User-Agent: curl/7.40.0
> Accept: */*
> Host: bbs.duchang8.com
> 
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Mon, 03 Aug 2015 12:27:33 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: keep-alive
< Location: http://www.duchang8.com/forum-29-1.html
<
# and so on, and so on....

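It ends up in a redirect loop. requests sees the same sequence of requests and responses, eventually decides that it is never going to end, and aborts the request.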
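The loop can be reproduced without any network access using a toy model. In the sketch below, `fake_server` is a made-up stand-in that mimics only the behaviour visible in the transcripts above (a 301 to www.duchang8.com unless the Host: header already says www.duchang8.com), and `fetch` is a hypothetical client loop, not the requests implementation.

```python
from urllib.parse import urlsplit

MAX_REDIRECTS = 30  # same limit as reported in the traceback


def fake_server(host_header):
    """Stand-in for the remote server: decides purely on the Host: header."""
    if host_header == "www.duchang8.com":
        return 200, None
    return 301, "http://www.duchang8.com/forum-29-1.html"


def fetch(url, pinned_host=None):
    """Follow redirects; if pinned_host is given, send it on every hop."""
    for hop in range(MAX_REDIRECTS + 1):
        host = pinned_host or urlsplit(url).netloc
        status, location = fake_server(host)
        if status == 200:
            return status, hop  # hop = number of redirects followed
        url = location
    raise RuntimeError("Exceeded %d redirects." % MAX_REDIRECTS)


# Without a pinned Host: header, one redirect is followed and the fetch succeeds.
print(fetch("http://bbs.duchang8.com/forum-29-1.html"))

# With the header pinned, every hop gets the same 301 and the limit is hit.
try:
    fetch("http://bbs.duchang8.com/forum-29-1.html", pinned_host="bbs.duchang8.com")
except RuntimeError as exc:
    print(exc)
```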