Why is my code looping through only the first webpage using BeautifulSoup?
I've just been messing around with BeautifulSoup and testing it on different websites after learning about it recently. Right now I'm trying to iterate through multiple pages rather than just the first one. I can append or write the information I grab from any specific page I want, but of course I'd love to automate it.
Here is my current code, where I'm trying to make it run through the first five pages. At the moment it only goes through the first webpage and writes the same information I'm looking for to my Excel file, five times. Inside my nested for loop I have some print statements just to see whether it's working on the console before I even look at the file.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import unicodecsv as csv

f = open("on_sale_games.csv", "w", encoding='utf-8')

headers = "Game name, Original price, Final price, Percent off\n"
f.write(headers)

for i in range(5):
    my_url = 'https://store.steampowered.com/specials#p={}&tab=TopSellers'.format(i+1)

    uClient = uReq(my_url)  # open up the url and download the page.
    page_html = uClient.read()  # reading the html page and storing the info into page_html.
    uClient.close()  # closing the page.

    page_soup = soup(page_html, 'html.parser')  # html parsing

    containers = page_soup.findAll("a", {"class": "tab_item"})

    for container in containers:
        name_stuff = container.findAll("div", {"class": "tab_item_name"})
        name = name_stuff[0].text
        print("Game name:", name)

        original_price = container.findAll("div", {"class": "discount_original_price"})
        original = original_price[0].text
        print("Original price:", original)

        discounted_price = container.findAll("div", {"class": "discount_final_price"})
        final = discounted_price[0].text
        print("Discounted price:", final)

        discount_pct = container.findAll("div", {"class": "discount_pct"})
        pct = discount_pct[0].text
        print("Percent off:", pct)

        f.write(name.replace(':', '').replace("™", " ") + ',' + original + ',' + final + ',' + pct + '\n')

f.close()
Checking the requests the browser makes, I noticed that the data is fetched by a request made in the background, and the result comes back as JSON, so you can work from that. (Note that the part of your original URL after the # is a fragment; it is never sent to the server, which is why every request returned the same first page.)
import json

for i in range(5):
    my_url = 'https://store.steampowered.com/contenthub/querypaginated/specials/NewReleases/render/?query=&start={}'.format(i*15)

    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()

    # the JSON response carries the rendered HTML fragment under "results_html"
    data = json.loads(page_html)["results_html"]

    page_soup = soup(data, 'html.parser')

    # Rest of the code
It works like an API that returns 15 items per page, so start goes 0, 15, 30, and so on.
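Putting the two pieces together, a minimal end-to-end sketch could look like the following. The endpoint and the results_html field come from the answer above, and the CSS classes (tab_item, tab_item_name, discount_original_price, discount_final_price, discount_pct) are the ones used in the question's code; whether Steam still serves exactly this markup is an assumption, so treat it as a starting point rather than a verified script:

import json
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

with open("on_sale_games.csv", "w", encoding="utf-8") as f:
    f.write("Game name, Original price, Final price, Percent off\n")

    for i in range(5):
        # The endpoint pages in steps of 15, so start = 0, 15, 30, ...
        my_url = ('https://store.steampowered.com/contenthub/querypaginated/'
                  'specials/NewReleases/render/?query=&start={}'.format(i * 15))
        uClient = uReq(my_url)
        page_html = uClient.read()
        uClient.close()

        # The JSON payload carries the rendered HTML fragment under "results_html".
        data = json.loads(page_html)["results_html"]
        page_soup = soup(data, 'html.parser')

        for container in page_soup.find_all("a", {"class": "tab_item"}):
            name = container.find("div", {"class": "tab_item_name"})
            original = container.find("div", {"class": "discount_original_price"})
            final = container.find("div", {"class": "discount_final_price"})
            pct = container.find("div", {"class": "discount_pct"})
            if not all((name, original, final, pct)):
                continue  # skip entries missing any of the discount markup
            f.write(name.text.replace(':', '').replace("™", " ") + ',' +
                    original.text + ',' + final.text + ',' + pct.text + '\n')

Writing rows through the csv module instead of manual string concatenation would also avoid problems when a game name contains a comma.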