How do I parse this URL with Beautiful Soup? What format is the URL?

Question

I want to scrape some data from this type of URL:

http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?start=0&count=5&currency=&language=english

I'm not sure what format this is: it contains some kind of HTML tags, but I don't know how to actually scrape this page (I use BeautifulSoup for my other URLs).

I hope you can help me.

Answer 1

The page you are loading is a JSON document, not an HTML page. Use the json library like this:

import requests
import json

response = requests.get('http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?start=0&count=5&currency=&language=english')

# Decode the JSON response body into a Python dictionary.
steam_json = json.loads(response.text)

# Extract whatever you want like this:
success_status = steam_json['success']
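
As a slightly more defensive variant of the snippet above, the sketch below separates decoding from downloading, so it can be tried on a saved response body without touching the network. The function name `parse_render_response` and the sample body are made up for illustration; only the `success` key comes from the answer above.

```python
import json


def parse_render_response(text):
    """Decode a JSON response body and make sure the request succeeded."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # The server may send an HTML error page instead of JSON.
        raise ValueError("response is not valid JSON") from exc
    if not isinstance(data, dict) or not data.get("success"):
        raise ValueError("response does not report success")
    return data


# Made-up sample body; a real one would come from requests.get(...).text.
sample_text = '{"success": true}'
steam_json = parse_render_response(sample_text)
print(steam_json["success"])
```

Failing early like this makes it obvious when the server returned something other than the expected JSON.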

Answer 2

You probably want to do this in Python (jsoup is a Java library similar to BeautifulSoup). The URL returns JSON. You first have to load it as a Python-native object; in this case the corresponding Python-native object is a dictionary. Use the json library:

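import json
import urllib.request  # Python 3 counterpart of Python 2's urllib2

# your_url is built in the snippet further down; user_agent is a User-Agent string of your choice.
request = urllib.request.Request(url=your_url)
request.add_header('User-Agent', user_agent)  # let's say you want to add headers like User-Agent etc.
response = urllib.request.urlopen(request)
dico = json.loads(response.read().decode('utf-8'))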

Then you have to explore the key-value pairs you are interested in, and parse the values that contain HTML the way you usually do with BeautifulSoup.
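
To make that exploration step concrete, here is a minimal sketch that lists each top-level key with the type of its value and flags strings that look like HTML. The helper name `describe_payload`, the crude tag-detecting regex, and the sample dictionary (including the `results_html` key) are assumptions for illustration, not the actual Steam response.

```python
import re

# Crude heuristic: an opening tag such as <div ...> or <span> somewhere in the string.
HTML_TAG = re.compile(r"<[a-zA-Z][^>]*>")


def describe_payload(payload):
    """Return (key, type name, looks_like_html) for each top-level entry."""
    summary = []
    for key, value in payload.items():
        looks_like_html = isinstance(value, str) and bool(HTML_TAG.search(value))
        summary.append((key, type(value).__name__, looks_like_html))
    return summary


# Made-up stand-in for the dictionary returned by json.loads.
dico = {
    "success": True,
    "results_html": '<div class="row"><span class="price">$1.00</span></div>',
}

for key, type_name, looks_like_html in describe_payload(dico):
    print(key, type_name, "HTML" if looks_like_html else "")
```

Any value flagged as HTML can then be passed to `BeautifulSoup(value, 'html.parser')` and queried the same way as a normal page.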

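Also note that the site you want to get data from may be hypermedia-driven (see HATEOAS), which is roughly an AJAX-style interface without a graphical front end. Whatever it is, it lets you be more precise (and therefore friendlier to the server) about the data you request, for example by setting the query parameters explicitly: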

url_base = "http://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29/render?" 
start = 0
count = 5
currency = ''
language = 'english'
your_url = url_base + "start={0}&count={1}&currency={2}&language={3}".format(start,count,currency,language)
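
String formatting works here because the values are simple. As a hedged alternative, the standard library's `urllib.parse` can do the percent-encoding, which is safer when a value contains spaces or symbols. The function name `build_render_url` is hypothetical, and the item name is an assumption obtained by decoding the URL in the question.

```python
from urllib.parse import quote, urlencode


def build_render_url(item_name, start=0, count=5, currency="", language="english"):
    """Build the listing URL, percent-encoding the item name and the query."""
    base = "http://steamcommunity.com/market/listings/730/"
    query = urlencode({
        "start": start,
        "count": count,
        "currency": currency,
        "language": language,
    })
    # quote() turns spaces into %20, "|" into %7C and parentheses into %28/%29.
    return base + quote(item_name) + "/render?" + query


your_url = build_render_url("AK-47 | Redline (Field-Tested)")
print(your_url)
```

Changing `start` and `count` then pages through the listings without touching the rest of the URL.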

Tags: html, python, beautifulsoup, web-crawler, python-3.x