HTTP 404 status code (Not Found) shown as 302
I am trying to retrieve the HTTP status codes for a list of URLs in Python using the following code:
    import requests

    try:
        r = requests.head(testpoint_url)
        # prints the int of the status code.
        print(testpoint_url + " : " + str(r.status_code))
    except requests.ConnectionError:
        print("failed to connect")
Surprisingly, for some URLs I get a 302 status code, whereas if you open the same URL in a browser you see a 404!
What is going on? How can I get the real status code (e.g. 404)?
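302 is an HTTP redirect. A web browser follows the redirect to the URL reported in the Location response header. When that next URL is requested, it returns its own response code, which can be 404.
Your Python code does not follow the redirect, which explains why it gets the original 302.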
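To make the 302-then-404 chain concrete, here is a minimal offline sketch that uses only the Python standard library rather than Requests. The toy server, its /old and /gone paths, and the helper names are all hypothetical and exist only for illustration. http.client never follows redirects on its own, so the first request shows what the script in the question sees, and the second request shows where a browser ends up.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectThenMissing(BaseHTTPRequestHandler):
    """Toy handler (hypothetical): /old redirects with 302 to /gone, which answers 404."""

    def do_HEAD(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/gone")
        else:
            self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def head(port, path):
    """Send a single HEAD request; http.client does not follow redirects."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    status, location = resp.status, resp.getheader("Location")
    conn.close()
    return status, location


def demo():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), RedirectThenMissing)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        first_status, location = head(port, "/old")  # what the script sees
        final_status, _ = head(port, location)       # what the browser ends up with
    finally:
        server.shutdown()
        server.server_close()
    return first_status, location, final_status


if __name__ == "__main__":
    print(demo())
```

The first status is the redirect itself; only the follow-up request to the Location target reveals the 404.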
According to the Requests documentation:
By default Requests will perform location redirection for all verbs except HEAD.
We can use the history property of the Response object to track redirection.
The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.
...
If you’re using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:
>>> r = requests.get('https://github.com/', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]
If you’re using HEAD, you can enable redirection as well:
>>> r = requests.head('https://github.com/', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]
So, in your code, change this:
    r = requests.head(testpoint_url)
to this:
    r = requests.head(testpoint_url, allow_redirects=True)
Then r.status_code will be the final status code after all redirects (404 in this case).
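As a rough sketch of how this could be wrapped up for a list of URLs, the helper below returns the final status code together with the status codes of the intermediate redirects taken from r.history. The function name final_status, the timeout value, and the choice to return None on failure are assumptions made for illustration and are not part of the original question or answer.

```python
import requests


def final_status(url, timeout=10):
    """Return (final status code, [status codes of intermediate redirects]).

    Returns None if the request could not be completed. The helper name,
    the timeout and the None convention are illustrative assumptions.
    """
    try:
        r = requests.head(url, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        # Base class for Requests errors: connection failures, timeouts, invalid URLs.
        return None
    return r.status_code, [hop.status_code for hop in r.history]
```

For a URL that redirects once and then ends at a missing page, a call such as final_status(testpoint_url) would be expected to return something like (404, [302]). Catching requests.RequestException instead of only requests.ConnectionError also covers timeouts and malformed URLs.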