How can I print data collected by BeautifulSoup?
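I have no prior Python experience, so this may be very basic.
I am trying to record the names, and later the prices, of all hockey sticks sold by the Canadian retailer SportChek.
So far, my code looks like this: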
# Import libraries
import requests
from bs4 import BeautifulSoup
# Collect the page
page = requests.get('https://www.sportchek.ca/categories/shop-by-sport/hockey/hockey-sticks.html?cid=search-hockey-sticks')
# Create BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')
# Pull all text from product-title-text class
stick_name_list = soup.find_all(class_='product-title-text')
# Pull all text from product-price-text
stick_price_list = soup.find_all(class_='product-price-text')
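I believe this code should collect the right data, but I am not sure how to display the variables.
Typing just the variable name (i.e. "stick_name_list") returns "[]", "print stick_name_list" complains that it needs parentheses, and "print 'stick_name_list'" is obviously not right.
Any guidance is appreciated.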
It looks like that website,
https://www.sportchek.ca/categories/shop-by-sport/hockey/hockey-sticks.html?cid=search-hockey-sticks
loads its product data with JavaScript, so when requests.get fetches the HTML there are no products in it to parse.
If you disable JavaScript in your browser, you will see that no HTML tags have the class product-title-text or product-price-text.
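To make the difference concrete, here is a minimal sketch that runs find_all against two small hand-written HTML strings. Both strings and the helper name extract_text are hypothetical, made up for illustration; they are not SportChek's real markup. The first string contains the classes, the second stands in for a page whose products are filled in later by JavaScript.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: products are already present in the raw HTML.
static_html = """
<div class="product-title-text">Example Stick A</div>
<div class="product-price-text">$99.99</div>
<div class="product-title-text">Example Stick B</div>
<div class="product-price-text">$149.99</div>
"""

# Hypothetical markup: an empty container that JavaScript would fill in later.
js_rendered_html = '<div id="product-grid"></div>'


def extract_text(html, css_class):
    """Return the stripped text of every tag that carries the given class."""
    soup = BeautifulSoup(html, 'html.parser')
    return [tag.get_text(strip=True) for tag in soup.find_all(class_=css_class)]


names = extract_text(static_html, 'product-title-text')
prices = extract_text(static_html, 'product-price-text')

# find_all returns a list of tags, so print the text of each one in a loop.
for name, price in zip(names, prices):
    print(name, price)

# No tag in the raw HTML has the class, so the result is an empty list.
print(extract_text(js_rendered_html, 'product-title-text'))
```

The empty list in the second case is the same "[]" the question reports: print works, there is simply nothing in the downloaded HTML to match.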
More information here:
Using python Requests with javascript pages
I suggest you check whether you can parse any JSON that may be present on the web page. More information here:
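You can use the same URL that the page itself uses to update its content; you can find it in the browser's Network tab. It returns JSON, which you can filter on type == 'product' to get the hockey sticks. You can change the count parameter in the URL query string to return more results.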
import requests
import pandas as pd
# Fetch the JSON the page itself requests when it updates the product list
data = requests.get('https://www.sportchek.ca/services/sportchek/search-and-promote/products?x1=c.category-level-1&q1=Gear&x2=c.category-level-2&q2=Hockey&x3=c.category-level-3&q3=Hockey+Sticks&preselectedCategoriesNumber=3&preselectedBrandsNumber=0&page=1&count=100').json()
# Keep only entries whose type is 'product'
rows = [(item['title'], item['price']) for item in data['products'] if item['type'] == 'product']
# Either unpack the rows into two tuples...
titles, prices = zip(*rows)
# ...or load them into a DataFrame
df = pd.DataFrame(rows, columns=['title', 'price'])
print(df.head())
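Changing the count parameter by hand in a long URL is easy to get wrong, so here is a small standard-library sketch that rewrites one query parameter and leaves the others alone. The helper name with_query_param is my own, and the URL is a shortened version of the one above, kept short for readability. Whether the endpoint honours a given count value is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, key, value):
    """Return url with query parameter `key` set to `value`.

    Other parameters keep their order; the parameter is appended if absent.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    updated = [(k, str(value)) if k == key else (k, v) for k, v in pairs]
    if all(k != key for k, _ in pairs):
        updated.append((key, str(value)))
    return urlunsplit(parts._replace(query=urlencode(updated)))


# Shortened version of the search URL, for illustration only.
base = ('https://www.sportchek.ca/services/sportchek/search-and-promote/products'
        '?q3=Hockey+Sticks&page=1&count=100')

print(with_query_param(base, 'count', 200))
```

The same helper can be used to step through the page parameter instead of count.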
As others have said, you can get the JSON directly (rather than having to parse the page):
import requests
import math
import pandas as pd
url = 'https://www.sportchek.ca/services/sportchek/search-and-promote/products'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
payload = {
    'x1': 'c.category-level-1',
    'q1': 'Gear',
    'x2': 'c.category-level-2',
    'q2': 'Hockey',
    'x3': 'c.category-level-3',
    'q3': 'Hockey Sticks',
    'preselectedCategoriesNumber': '3',
    'preselectedBrandsNumber': '0',
    'page': '1',
    'count': '200'}
# Fetch the first page to learn the total number of results
jsonData = requests.get(url, headers=headers, params=payload).json()
total_products = jsonData['resultCount']['total']
total_pages = math.ceil(total_products / 200)
# Fetch the remaining pages and append their products to the first page's list
for page in range(2, total_pages+1):
    payload = {
        'x1': 'c.category-level-1',
        'q1': 'Gear',
        'x2': 'c.category-level-2',
        'q2': 'Hockey',
        'x3': 'c.category-level-3',
        'q3': 'Hockey Sticks',
        'preselectedCategoriesNumber': '3',
        'preselectedBrandsNumber': '0',
        'page': page,
        'count': '200'}
    products = requests.get(url, headers=headers, params=payload).json()['products']
    jsonData['products'] = jsonData['products'] + products
    print ('Processed page: %s' %page)
# Flatten the product dicts into a table (pd.json_normalize, pandas 1.0+, replaces the old pandas.io.json import)
df = pd.json_normalize(jsonData['products'])
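The page arithmetic in that loop can be checked on its own without touching the network. This is only a sketch: remaining_pages is a hypothetical helper name, and 438 is used as an illustrative total because it is the row count in the output further down; the real resultCount value may differ.

```python
import math


def remaining_pages(total_products, page_size):
    """Page numbers still to fetch once page 1 has already been requested."""
    total_pages = math.ceil(total_products / page_size)
    return list(range(2, total_pages + 1))


# 438 results at 200 per page need three pages, so pages 2 and 3 remain.
print(remaining_pages(438, 200))

# A total that fits on one page leaves nothing more to fetch.
print(remaining_pages(150, 200))
```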
You can manipulate the table any way you like, or work with the JSON directly. I simply converted it to a table.
Output:
print (df[['title', 'price']])
title price
0 Bauer Supreme 1S Griptac Senior Hockey Stick -... 339.99
1 Warrior Covert QRL SE Grip Senior Hockey Stick 329.99
2 Bauer Vapor X600 Lite Griptac Senior Hockey Stick 69.99
3 Gift Cards NaN
4 Bauer Supreme 1S Clear Senior Hockey Stick - G... 339.99
5 Bauer Vapor 1X Lite Griptac Senior Hockey Stick 339.99
6 Bauer NEXUS 1N Griptac Gen II Senior Hockey Stick 254.97
7 Flash Sale NaN
8 Sher-Wood Project 9 Sticks NaN
9 Bauer Supreme 2S Team Griptac Senior Hockey Stick 159.99
10 Bauer Supreme S160 Griptac Junior Hockey Stick... 44.97
11 Bauer Nexus 2N Pro Senior Hockey Stick 319.99
12 Warrior Alpha QX Grip Intermediate Hockey Stick 184.88
13 TRUE XC5 ACF Grip Junior Hockey Stick 79.99
14 Warrior Covert QRE ST2 Grip Senior Hockey Stick 89.99
15 Mother's Day Gift Guide NaN
16 Bauer Supreme S190 Griptac Senior Hockey Stick... 156.97
17 Bauer Vapor X700 Lite Griptac Senior Hockey Stick 119.99
18 Bauer Supreme 2S Pro Griptac Senior Hockey Stick 319.99
19 Bauer Nexus 2N Pro Junior Hockey Stick 199.99
20 Bauer Vapor 1X Lite Griptac Intermediate Hocke... 319.99
21 Bauer Supreme 1S Griptac Intermediate Hockey S... 223.97
22 Bauer Nexus 2N Pro Intermediate Hockey Stick 299.99
23 TRUE XC9 ACF Grip Junior 30 Hockey Stick 119.99
24 TRUE XC9 ACF Youth 20 Hockey Stick 99.99
25 Bauer Nexus 2N Senior Hockey Stick 224.99
26 Bauer Supreme 1S Youth Hockey Stick - Gen II 69.97
27 TRUE XC9 ACF Grip Gen II Senior Hockey Stick 319.99
28 Bauer Supreme 2S Pro Griptac Junior Hockey Stick 199.99
29 Bauer NEXUS N7000 Griptac Gen II Intermediate ... 89.97
.. ... ...
408 Warrior Covert QRL Grip Senior Hockey Stick 159.97
409 Bauer Vapor X800 Griptac Gen II Senior Hockey ... 109.97
410 Graf G95 Revolt Grip Senior Hockey Stick - GP0... 109.88
411 CCM Ribcor 47K Grip Senior Hockey Stick 79.97
412 Sher-Wood BPM 060 Grip Senior Hockey Stick 51.97
413 CCM RBZ Revolution Grip Senior Hockey Stick 149.88
414 CCM Premier R1.5 Senior Goalie Stick - Crawfor... 89.97
415 Bauer Vapor 1X Senior Goalie Stick - P31 25" 289.99
416 Sher-Wood GS350 Senior Goalie Stick 24" - PP41 96.97
417 Sher-Wood GS350 Senior Goalie Stick - PP41 27" 96.97
418 Bauer Vapor X900 Senior Goalie Stick - P31 26" 199.99
419 Sher-Wood GS150 Senior Goalie Stick - 24" 74.97
420 Sher-Wood GS150 Senior Goalie Stick - 25" 74.97
421 CCM 1060 Senior Goalie Stick - Price 27" 89.88
422 Sher-Wood GS150 Senior Goalie Stick - 26" 74.97
423 Sher-Wood GS150 Senior Goalie Stick - 27" 74.97
424 CCM Premier R1.9 Senior Goalie Stick - Crawfor... 119.97
425 Sher-Wood BPM 090 Grip Intermediate Hockey Stick 81.97
426 Warrior Covert QRL5 Grip Intermediate Hockey S... 63.97
427 Warrior Covert DT1 LT Grip Intermediate Hockey... 111.88
428 Warrior Covert Super Dolomite Grip Intermediat... 189.88
429 Warrior Dynasty HD1 Intermediate Stick - Grip ... 123.88
430 Easton Stealth CX Grip Intermediate Hockey Sti... 159.88
431 Easton Synergy 20 Intermediate Stick - Grip - ... 34.88
432 Sherwood T120 Intermediate Grip Hockey Stick -... 99.97
433 GRAF G75 Intermediate 70 Flex Hockey Stick - GP22 99.88
434 Bauer Vapor X700 Griptac Gen II Intermediate H... 79.97
435 Easton Synergy HTX Intermediate Stick - Grip -... 115.88
436 Sherwood T120 Intermediate Grip Hockey Stick -... 99.97
437 Sher-Wood BPM 060 Grip Intermediate Hockey Stick 51.97
[438 rows x 2 columns]
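The rows with a NaN price above (Gift Cards, Flash Sale and so on) are not hockey sticks. One way to drop them, following the type == 'product' suggestion from the earlier answer, is to filter the list of dicts before building the table. The sketch below uses a few hand-made records: the titles and prices are copied from the output, but the 'promo' type value and the helper name only_products are assumptions for illustration.

```python
# Hand-made sample shaped like the 'products' list; the 'promo' value is hypothetical.
records = [
    {'type': 'product', 'title': 'Warrior Covert QRL SE Grip Senior Hockey Stick', 'price': 329.99},
    {'type': 'promo', 'title': 'Gift Cards'},
    {'type': 'product', 'title': 'Bauer Nexus 2N Senior Hockey Stick', 'price': 224.99},
]


def only_products(items):
    """Return (title, price) pairs for entries whose type is 'product'."""
    return [
        (item['title'], item.get('price'))
        for item in items
        if item.get('type') == 'product'
    ]


for title, price in only_products(records):
    print(title, price)
```

Using item.get means an entry without a type or price key is skipped or given None instead of raising KeyError.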