Getting status code 404 even though page exists

Question

I have tried scraping this particular link with both Java and Python, but I keep getting a 404 status code even though the page exists.

import requests
from bs4 import BeautifulSoup
from lxml import html
from collections import defaultdict

url = 'https://www.slacker.com/station/pop-remix'

def main():
    page = requests.get(url)
    print(page.status_code)
    print()

if __name__ == "__main__":
    main()

Answer 1

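This looks like some kind of error on the server side, but it doesn't match what we usually expect from a 404 (that is, the page can't be found at all). When I ran your code I also got a 404 response. But when I then called: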

page.content

it did dump the content of the page. So I went to the link in my browser and opened the developer tools. In the console I could see the following error:

Failed to load resource: the server responded with a status of 404 (Not Found)

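This appears even though the page renders in the browser. My guess is that the request wasn't fully satisfied (some part of the page failed to load), so the server decided to send a 404 status despite being able to serve you plenty of other data.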
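To illustrate that a 404 status and a full response body are not mutually exclusive, here is a minimal, self-contained sketch using only the standard library. It is not a reproduction of what slacker.com does: the `NotFoundWithBody` handler, the HTML it returns, and the `fetch` and `demo` helpers are all hypothetical names made up for this example. The sketch starts a throwaway local server that always answers 404 with a body, then reads both the status and the body on the client side, which is the same thing `page.status_code` and `page.content` show in your `requests` code.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener


class NotFoundWithBody(BaseHTTPRequestHandler):
    """Hypothetical handler: always answers 404 but still sends an HTML body."""

    def do_GET(self):
        body = b"<html><body><h1>Pop Remix</h1></body></html>"
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url):
    """Return (status_code, body), keeping the body even on error statuses."""
    # Ignore any system proxy so the local demo server is reached directly.
    opener = build_opener(ProxyHandler({}))
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read()
    except HTTPError as err:
        # urllib raises on 4xx/5xx, but the exception is still a readable response.
        body = err.read()
        err.close()
        return err.code, body


def demo():
    """Start the local server, fetch one page from it, and shut it down."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundWithBody)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(f"http://127.0.0.1:{server.server_port}/station/pop-remix")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = demo()
    print(status, len(body))
```

So for scraping purposes, it may be enough to ignore the status code for this particular URL and parse the body anyway, as long as you check that the body actually contains the data you expect.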

html

python

beautifulsoup

http-status-code-404

python-requests