BeautifulSoup only outputs data sometimes?

Question

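I'm scraping the links to all the posts on this subreddit (specifically, the top posts of the past 24 hours). But when I run my program, it sometimes outputs all the data and other times outputs nothing at all, with exactly the same code. It works about 1 run in 5.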

import re

import requests
from bs4 import BeautifulSoup

# URL of the subreddit
test = requests.get('https://www.reddit.com/r/TikTokCringe/top/')
# the HTML of the response
html = test.text
# make a soup of the HTML
soup = BeautifulSoup(html, 'html.parser')
# find_all returns every <a> element whose href contains '/r/TikTokCringe/comments'; keep the first 30
for href in soup.find_all('a', {"href": re.compile('/r/TikTokCringe/comments/*')})[:30]:
    # I'm looping through every element because I eventually want just the links;
    # for now I'm just trying to print every element
    print(href)

Answer 1

You are getting HTTP error 429 (Too Many Requests). Try slowing down or setting the User-Agent HTTP header:

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0"
}

# URL of subreddit
test = requests.get("https://reddit.com/r/TikTokCringe/top/", headers=headers)

...
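
To make the "slow down" suggestion concrete, here is a minimal sketch of retrying on a 429 response with exponential backoff. The helper name get_with_backoff and its parameters (max_attempts, base_delay, sleep) are hypothetical and not part of requests; the sketch also assumes the server may send a numeric Retry-After header, which it uses when present. The get callable is passed in so that requests.get (or a stand-in) can be supplied.

```python
import time


def get_with_backoff(get, url, headers=None, max_attempts=5,
                     base_delay=2.0, sleep=time.sleep):
    """Call get(url, headers=headers), retrying while the status is 429.

    Hypothetical helper: `get` is expected to behave like requests.get,
    returning an object with `.status_code` and `.headers`.
    """
    response = None
    for attempt in range(max_attempts):
        response = get(url, headers=headers)
        if response.status_code != 429:
            break  # success or a different error: stop retrying
        if attempt < max_attempts - 1:
            # Use a numeric Retry-After header if present (assumption),
            # otherwise back off exponentially: 2s, 4s, 8s, ...
            retry_after = str(response.headers.get("Retry-After", ""))
            if retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    return response
```

With requests imported, the call would look like test = get_with_backoff(requests.get, "https://reddit.com/r/TikTokCringe/top/", headers=headers). Checking test.status_code (or calling test.raise_for_status()) before parsing also makes the "sometimes nothing" case visible instead of silently producing an empty result.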

Also: consider using their JSON format (add .json to the end of the URL):

data = requests.get(
    "https://reddit.com/r/TikTokCringe/top/.json", headers=headers
).json()

print(data)
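
Since the original goal was just the post links, the JSON can be reduced to a list of URLs. The sketch below assumes the usual Reddit listing layout, where posts sit under data["data"]["children"] and each child carries a "permalink" field in its own "data" dict; that layout is an assumption here and should be checked against the actual response. The function name extract_post_links is hypothetical.

```python
def extract_post_links(listing, limit=30):
    """Return up to `limit` absolute post URLs from a Reddit listing dict.

    Assumes the layout listing["data"]["children"][i]["data"]["permalink"];
    entries without a permalink are skipped.
    """
    children = listing.get("data", {}).get("children", [])
    links = []
    for child in children[:limit]:
        permalink = child.get("data", {}).get("permalink")
        if permalink:
            links.append("https://www.reddit.com" + permalink)
    return links
```

For example, print(extract_post_links(data)) would replace the find_all loop from the question, with no HTML parsing involved.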

Tags: python, beautifulsoup