Why is my code looping through only the first webpage using BeautifulSoup?
I've just been messing around with BeautifulSoup and testing it on different websites after learning about it recently. Right now I'm trying to iterate through multiple pages rather than just the first one. I can append or write the information I grab from any specific page I want, but of course I'd love to automate it.
Here is my current code, where I'm trying to make it run through the first five pages. At the moment it only goes through the first webpage and writes the same information I'm looking for to my Excel file, five times. Inside my nested for loop I have some print statements just to see whether it's working on the console before I even look at the file.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import unicodecsv as csv

f = open("on_sale_games.csv", "w", encoding='utf-8')

headers = "Game name, Original price, Final price, Percent off\n"
f.write(headers)

for i in range(5):
    my_url = 'https://store.steampowered.com/specials#p={}&tab=TopSellers'.format(i+1)

    uClient = uReq(my_url)  # open up the url and download the page.
    page_html = uClient.read()  # reading the html page and storing the info into page_html.
    uClient.close()  # closing the page.

    page_soup = soup(page_html, 'html.parser')  # html parsing

    containers = page_soup.findAll("a", {"class": "tab_item"})

    for container in containers:
        name_stuff = container.findAll("div", {"class": "tab_item_name"})
        name = name_stuff[0].text
        print("Game name:", name)

        original_price = container.findAll("div", {"class": "discount_original_price"})
        original = original_price[0].text
        print("Original price:", original)

        discounted_price = container.findAll("div", {"class": "discount_final_price"})
        final = discounted_price[0].text
        print("Discounted price:", final)

        discount_pct = container.findAll("div", {"class": "discount_pct"})
        pct = discount_pct[0].text
        print("Percent off:", pct)

        f.write(name.replace(':', '').replace("™", " ") + ',' + original + ',' + final + ',' + pct + '\n')

f.close()
Checking the requests the browser makes, I noticed that the data is fetched by a request made in the background, and the result comes back as JSON, so you can work from that. (Note that the part of your original URL after the # is a fragment; it is never sent to the server, which is why every request returned the same first page.)
import json

for i in range(5):
    my_url = 'https://store.steampowered.com/contenthub/querypaginated/specials/NewReleases/render/?query=&start={}'.format(i*15)

    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()

    # the JSON response carries the rendered HTML fragment under "results_html"
    data = json.loads(page_html)["results_html"]

    page_soup = soup(data, 'html.parser')

    # Rest of the code
It works like an API that returns 15 items per page, so start goes 0, 15, 30, and so on.
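Putting the two pieces together, a minimal end-to-end sketch could look like the following. The endpoint and the results_html field come from the answer above, and the CSS classes (tab_item, tab_item_name, discount_original_price, discount_final_price, discount_pct) are the ones used in the question's code; whether Steam still serves exactly this markup is an assumption, so treat it as a starting point rather than a verified script:

import json
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

with open("on_sale_games.csv", "w", encoding="utf-8") as f:
    f.write("Game name, Original price, Final price, Percent off\n")

    for i in range(5):
        # The endpoint pages in steps of 15, so start = 0, 15, 30, ...
        my_url = ('https://store.steampowered.com/contenthub/querypaginated/'
                  'specials/NewReleases/render/?query=&start={}'.format(i * 15))
        uClient = uReq(my_url)
        page_html = uClient.read()
        uClient.close()

        # The JSON payload carries the rendered HTML fragment under "results_html".
        data = json.loads(page_html)["results_html"]
        page_soup = soup(data, 'html.parser')

        for container in page_soup.find_all("a", {"class": "tab_item"}):
            name = container.find("div", {"class": "tab_item_name"})
            original = container.find("div", {"class": "discount_original_price"})
            final = container.find("div", {"class": "discount_final_price"})
            pct = container.find("div", {"class": "discount_pct"})
            if not all((name, original, final, pct)):
                continue  # skip entries missing any of the discount markup
            f.write(name.text.replace(':', '').replace("™", " ") + ',' +
                    original.text + ',' + final.text + ',' + pct.text + '\n')

Writing rows through the csv module instead of manual string concatenation would also avoid problems when a game name contains a comma.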