HTTP 404 status code (Not Found) shown as 302

I am trying to retrieve the HTTP status codes for a list of URLs in Python, using the following code:

import requests

try:
    # HEAD fetches only the headers, which is enough to read the status code.
    r = requests.head(testpoint_url)
    print(testpoint_url + " : " + str(r.status_code))  # status_code is an int
except requests.ConnectionError:
    print("failed to connect")

Surprisingly, for some URLs I get a 302 status code, whereas if you open the same URL in a browser it shows a 404!

What is going on? How can I get the real status code (e.g. 404)?

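302 is an HTTP redirect. A web browser follows the redirect to the URL reported in the Location response header. When that next URL is requested, it returns its own response code, which may well be 404.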

Your Python code does not follow redirects, which explains why it gets the original 302.
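
To see the redirect for yourself, a minimal sketch along these lines can report the first hop: it sends a single HEAD request without following redirects and returns the status code together with the Location header. The helper name describe_hop and the timeout value are illustrative assumptions, not part of the original code.

```python
import requests


def describe_hop(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, location), where location is the value of the
    Location response header, or None if the server did not send one.
    describe_hop is a hypothetical helper name used for illustration.
    """
    # allow_redirects=False is already the default for requests.head;
    # it is spelled out here to make the behaviour explicit.
    r = requests.head(url, allow_redirects=False, timeout=timeout)
    return r.status_code, r.headers.get("Location")
```

For a URL that behaves as described in the question, describe_hop(testpoint_url) would be expected to return 302 plus the address the browser ends up requesting.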

According to the Requests documentation:

Redirection and History

By default Requests will perform location redirection for all verbs except HEAD.

We can use the history property of the Response object to track redirection.

The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.

...

If you’re using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:

>>> r = requests.get('https://github.com/', allow_redirects=False)

>>> r.status_code
301

>>> r.history
[]

If you’re using HEAD, you can enable redirection as well:

>>> r = requests.head('https://github.com/', allow_redirects=True)

>>> r.url
'https://github.com/'

>>> r.history
[<Response [301]>]

So, in your code, change this:

r = requests.head(testpoint_url)

to this:

r = requests.head(testpoint_url, allow_redirects=True)

Then r.status_code will be the final status code after all redirects have been followed (i.e. 404).
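
Putting this together for a list of URLs, a sketch under a few assumptions might look like the following. The function names final_status and check_urls, the timeout value, and the output format are illustrative choices rather than anything from the original code. It also reads r.history, so the intermediate 302 stays visible next to the final code.

```python
import requests


def final_status(url, timeout=10):
    """Follow redirects with HEAD and return (final_code, redirect_codes).

    redirect_codes lists the status codes of the intermediate responses,
    oldest first, taken from Response.history.
    """
    r = requests.head(url, allow_redirects=True, timeout=timeout)
    return r.status_code, [hop.status_code for hop in r.history]


def check_urls(urls):
    """Print the final status code and the redirect chain for each URL."""
    for url in urls:
        try:
            code, hops = final_status(url)
            print(f"{url} : {code} (redirects: {hops})")
        except requests.RequestException as exc:
            # RequestException is the base class for connection errors,
            # timeouts and redirect loops raised by Requests.
            print(f"{url} : request failed ({exc})")
```

Catching requests.RequestException rather than only requests.ConnectionError is a design choice here: once redirects are followed, timeouts and redirect loops become possible as well, and they would otherwise stop the loop.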