Exceeded 30 redirects after adding `Host` to headers [python requests]
Fetching a URL with `Host` set in the headers raises the exception `Exceeded 30 redirects`.
It's so strange, and I can't figure it out.
Here is the test code:
>>> url = 'http://bbs.duchang8.com/forum-29-1.html'
>>> r = requests.get(url)
>>> print r.status_code
200
>>> headers = {
... 'Host': 'bbs.duchang8.com',
... }
>>> r = requests.get(url, headers=headers)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 594, in send
history = [resp for resp in gen] if allow_redirects else []
File "/data/www/article_fetcher/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 114, in resolve_redirects
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
Short answer:
Don't override the `Host:` header.
Alternatively, override it with the host that the client is redirected to.
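If the header really has to be set explicitly, one option is to follow the redirects by hand and recompute `Host` from the URL on every hop. The following is only a minimal sketch under some assumptions: it is written for Python 3, the helper name `fetch_with_matching_host` is made up for this illustration, and `get` is expected to be `requests.get` or any callable with the same shape. In most cases simply leaving the header out has the same effect.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_with_matching_host(url, get, max_redirects=30):
    """Follow redirects manually, sending a Host header that matches each hop.

    `get` is a callable shaped like requests.get (hypothetical helper,
    not part of the requests library).
    """
    for _ in range(max_redirects + 1):
        host = urlsplit(url).netloc  # host (and port, if any) of the current URL
        # Turn off automatic redirects so the stale Host value is never reused.
        resp = get(url, headers={"Host": host}, allow_redirects=False)
        location = resp.headers.get("location")
        if resp.status_code not in REDIRECT_CODES or not location:
            return resp
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError("Exceeded %d redirects." % max_redirects)
```

It would be called as `fetch_with_matching_host('http://bbs.duchang8.com/forum-29-1.html', requests.get)`.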
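Long answer
By explicitly setting the `Host` header you are telling requests to use that header in all subsequent requests, including any that are reissued because of a redirect response from the server.
In this case the requests client is redirected to a location hosted by a different server, http://www.duchang8.com/forum-29-1.html; that is, www.duchang8.com versus bbs.duchang8.com. Although both host names resolve to the same IP address, the remote HTTP server handles them differently.
The end result is that requests keeps using the `Host:` header that you supplied instead of the correct value returned by the server. The subsequent request to the new location is then rejected (with another redirect) because of the mismatch between the URL/server host and the `Host:` header.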
>>> import requests
>>> url = 'http://bbs.duchang8.com/forum-29-1.html'
>>> r = requests.get(url)
>>> r
<Response [200]>
>>> r.history
[<Response [301]>]
>>> r.history[0].headers
{'content-length': '178', 'server': 'nginx', 'connection': 'keep-alive', 'location': 'http://www.duchang8.com/forum-29-1.html', 'date': 'Mon, 03 Aug 2015 12:20:31 GMT', 'content-type': 'text/html'}
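Here we see that the client was redirected to http://www.duchang8.com/forum-29-1.html by an HTTP 301 response with a `location:` header.
Using curl you can see what happens if you try to supply a different `Host:` header when fetching the new location: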
$ curl -v -L -H 'Host: bbs.duchang8.com' http://www.duchang8.com/forum-29-1.html
* Trying 61.160.249.39...
* Connected to www.duchang8.com (61.160.249.39) port 80 (#0)
> GET /forum-29-1.html HTTP/1.1
> User-Agent: curl/7.40.0
> Accept: */*
> Host: bbs.duchang8.com
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Mon, 03 Aug 2015 12:27:33 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: keep-alive
< Location: http://www.duchang8.com/forum-29-1.html
<
* Ignoring the response-body
* Connection #0 to host www.duchang8.com left intact
* Issue another request to this URL: 'http://www.duchang8.com/forum-29-1.html'
* Found bundle for host www.duchang8.com: 0x21b54c0
* Re-using existing connection! (#0) with host www.duchang8.com
* Connected to www.duchang8.com (61.160.249.39) port 80 (#0)
> GET /forum-29-1.html HTTP/1.1
> User-Agent: curl/7.40.0
> Accept: */*
> Host: bbs.duchang8.com
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Mon, 03 Aug 2015 12:27:33 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: keep-alive
< Location: http://www.duchang8.com/forum-29-1.html
<
# and so on, and so on....
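It ends up in a redirect loop. requests sees the same sequence of requests and responses, and will eventually decide that it is never going to end and abort the request.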
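The loop can also be reproduced without touching the network. The following is a toy model, not the real server or the real requests internals: `toy_server` and `follow` are hypothetical names, and the server rule (redirect to www.duchang8.com unless the `Host` header already names it) is an assumption inferred from the responses shown above. It is written for Python 3.

```python
from urllib.parse import urlsplit

CANONICAL_HOST = "www.duchang8.com"


def toy_server(host_header, path):
    """Stand-in for the remote nginx: redirect unless Host is the canonical name."""
    if host_header != CANONICAL_HOST:
        return 301, "http://%s%s" % (CANONICAL_HOST, path)
    return 200, None


def follow(url, forced_host=None, max_redirects=30):
    """Return the number of redirects followed before a 200 response."""
    redirects = 0
    while True:
        parts = urlsplit(url)
        # A forced Host header is re-sent on every hop; otherwise it tracks the URL.
        host = forced_host or parts.netloc
        status, location = toy_server(host, parts.path)
        if status == 200:
            return redirects
        if redirects >= max_redirects:
            raise RuntimeError("Exceeded %d redirects." % max_redirects)
        redirects += 1
        url = location


print(follow("http://bbs.duchang8.com/forum-29-1.html"))  # 1
try:
    follow("http://bbs.duchang8.com/forum-29-1.html", forced_host="bbs.duchang8.com")
except RuntimeError as exc:
    print(exc)  # Exceeded 30 redirects.
```

Without a forced header the client reaches the page after a single redirect. With `Host` pinned to bbs.duchang8.com, every hop gets the same 301 until the redirect limit is hit.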