Steam market parsing
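I have a link that ends with "_price_asc", which should sort the results by price in ascending order. When I open this link in a browser, the sorting works fine.
But when I try to parse the item links with bs4, I get items with seemingly random prices, i.e. the ascending sort is not applied.
What am I doing wrong?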
from urllib.request import urlopen
from bs4 import BeautifulSoup
link = 'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc'
total_links = ''
page = urlopen(link)
bs_page = BeautifulSoup(page.read(), features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")
for g in range(10):
    total_links += str(objects[g]["href"]) + '\n'
print(total_links)
This happens because of how the following link is built:
https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc
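The link ends with "#p1_price_asc". The part after the "#" is a fragment, a marker for a position or state within a page; the original answer linked to a full explanation. Basically, the "#" part of a URL is not sent to the server. It is usually handled in the browser by a JavaScript function.
Because you are downloading the page with: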
page = urlopen(link)
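the JavaScript function that performs the sorting is never called. I strongly recommend the linked explanation of the "#" part, since it explains this better than I can.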
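To make this concrete, here is a minimal sketch using only the standard library. The URL is shortened for illustration (an assumption, not the full link from the question), and `request_target` is just a name chosen here for what an HTTP client would put on the request line.

```python
from urllib.parse import urlsplit

# Shortened version of the question's link, for illustration only
url = "https://steamcommunity.com/market/search?q=&appid=730#p1_price_asc"
parts = urlsplit(url)

# An HTTP request carries the path and the query, but never the fragment
request_target = parts.path + ("?" + parts.query if parts.query else "")

print(request_target)  # /market/search?q=&appid=730
print(parts.fragment)  # p1_price_asc
```

The server therefore sees the same request whether or not "#p1_price_asc" is present, which is why the downloaded HTML is unsorted.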
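Now, as for how to get what you want, you have two options:
- Use the selenium library, since it can emulate a browser
- Keep using what you have and sort the data yourself (this is trivial, and you will learn more)
Personally I would recommend option 2, since learning selenium can be a bit of a hassle and is often not worth it, in my opinion.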
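This page uses JavaScript to fetch the sorted data, but BeautifulSoup/urllib can't run JavaScript.
However, using DevTools in Firefox/Chrome (tab: Network, filter: XHR) I found that the JavaScript reads JSON data from another URL, and that JSON contains HTML with the sorted data. So you can use this URL together with BeautifulSoup to get the sorted data.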
from urllib.request import urlopen
from bs4 import BeautifulSoup
import json
# new url
link = 'https://steamcommunity.com/market/search/render/?query=&start=0&count=10&search_descriptions=0&sort_column=price&sort_dir=asc&appid=730&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife'
page = urlopen(link)
data = json.loads(page.read().decode())
html = data['results_html']
bs_page = BeautifulSoup(html, features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")
# collect (price, link) pairs in the order the server returned them
rows = []
for g in objects:
    link = g["href"]
    price = g.find('span', {'data-price': True}).text
    rows.append((price, link))
print("\n".join(f"{price} | {link}" for price, link in rows))
Result:
.43 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Urban%20Masked%20%28Field-Tested%29
.70 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Field-Tested%29
.00 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Minimal%20Wear%29
.52 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Scorched%20%28Battle-Scarred%29
.48 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Safari%20Mesh%20%28Field-Tested%29
.32 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Forest%20DDPAT%20%28Battle-Scarred%29
.90 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Night%20Stripe%20%28Well-Worn%29
.52 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Forest%20DDPAT%20%28Field-Tested%29
.99 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Boreal%20Forest%20%28Field-Tested%29
.08 USD | https://steamcommunity.com/market/listings/730/%E2%98%85%20Navaja%20Knife%20%7C%20Scorched%20%28Field-Tested%29
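The render URL used above can also be assembled from its parts, which makes the sort order or the page easier to change. This is only a sketch: `build_render_url` is a hypothetical helper (not part of any Steam library), it includes only a subset of the original query parameters, and treating `start`/`count` as an offset and a page size is an assumption based on their names.

```python
from urllib.parse import urlencode

BASE = "https://steamcommunity.com/market/search/render/"

def build_render_url(start=0, count=10, sort_column="price", sort_dir="asc"):
    """Hypothetical helper: rebuild the render URL from its query parameters."""
    params = [
        ("query", ""),
        ("start", start),            # assumed: offset of the first result
        ("count", count),            # assumed: number of results per request
        ("search_descriptions", 0),
        ("sort_column", sort_column),
        ("sort_dir", sort_dir),
        ("appid", 730),
        ("category_730_Type[]", "tag_CSGO_Type_Knife"),
    ]
    # urlencode percent-encodes the "[]" in the key as %5B%5D
    return BASE + "?" + urlencode(params)

print(build_render_url())
print(build_render_url(start=10))
```

The returned string can be passed to `urlopen` in the same way as the hard-coded link.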
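By the way: this was my first version, which reads from the old URL and sorts in Python. But it can only sort the data on the first page. For a better result it would have to read all pages, which would take a lot of time.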
from urllib.request import urlopen
from bs4 import BeautifulSoup
link = 'https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Type%5B%5D=tag_CSGO_Type_Knife&appid=730#p1_price_asc'
page = urlopen(link)
bs_page = BeautifulSoup(page.read(), features="html.parser")
objects = bs_page.findAll(class_="market_listing_row_link")
data = []
for g in objects:
    link = g["href"]
    # data-price appears to hold the price in cents; convert to int so sorting is numeric
    price = g.find('span', {'data-price': True})['data-price']
    price = int(price)
    data.append((price, link))
data = sorted(data)
print("\n".join(f"${price/100:.2f} USD | {link}" for price, link in data))
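Converting `data-price` to `int` before sorting matters, because formatted price strings are compared character by character. A small sketch with invented values (the rows and link names are made up for illustration, and `format_cents` is a hypothetical helper):

```python
def format_cents(cents):
    """Hypothetical helper: render an integer number of cents as dollars."""
    return f"${cents // 100}.{cents % 100:02d} USD"

# Invented sample rows: (price in cents, link)
rows = [(1052, "link-b"), (943, "link-a"), (10000, "link-c")]

by_number = sorted(rows)                                      # numeric order
by_text = sorted(rows, key=lambda row: format_cents(row[0]))  # string order

print([link for _, link in by_number])  # ['link-a', 'link-b', 'link-c']
print([link for _, link in by_text])    # ['link-b', 'link-c', 'link-a']
```

Sorting the text puts "$10.52 USD" and "$100.00 USD" before "$9.43 USD", so the numeric tuple sort is the one to use.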