Aliexpress web scraper not repeating
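I'm a complete beginner, and I have a question about my code for scraping products from AliExpress.
The problem is that I only get 1 result instead of all of them.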
from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://dutch.alibaba.com/products/uhf_rfid_label.html?IndexArea=product_en&page=1').text
soup = BeautifulSoup(html_text, 'lxml')
producten = soup.find_all('div', class_='organic-list app-organic-search__list')
for product in producten:
    product_naam = product.find('p', class_='elements-title-normal__content large').text
    jaren_actief = product.find('span', class_='seller-tag__year flex-no-shrink').text
    print(f'''Product naam: {product_naam} ''')
How can I get the information for all products?
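The product information (price, name, and so on) is embedded as JavaScript inside the HTML page; the HTML itself renders only 8 products. You can use the re and json modules to parse it. For example: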
import re
import json
import requests

html_text = requests.get(
    "https://dutch.alibaba.com/products/uhf_rfid_label.html?IndexArea=product_en&page=1"
).text

# The product data is assigned to a JavaScript variable inside the page
data = re.search(r"window\.__page__data__config = (\{.*\})", html_text).group(1)
data = json.loads(data)

# uncomment to print all data:
# print(json.dumps(data, indent=4))

for offer in data["props"]["offerResultData"]["offerList"]:
    print(
        "{:<20} {}".format(
            offer["tradePrice"]["price"], offer["information"]["puretitle"]
        )
    )
Prints:
US $0.02-$0.10 Aangepaste Tag Sticker Labels Lange Bereik Goedkope Passieve Papier Roll Uhf Rfid Label
US $0.07-$0.12 Long Range Goedkope Passieve Papier Roll Uhf Rfid Chip Label Tag Sticker
US 8.00-8.90 Hopeland 15 meter uhf rfid scanner r2000 rfid reader device uhf scanner handheld terminal multi tag uhf rfid scanner
US $0.03-$0.06 Factory Outlet Uhf Rfid Sticker/Label Met Chip
US 8.00-8.90 Hopeland Draagbare Uhf Rfid Terminal ISO18000 6C Multi-Tag Management Uhf Rfid Handheld Reader 2D Barcode Uhf Rfid Terminal
US $0.03-$0.12 Gratis Monster Waterdichte Nfc 213 Long Range Passieve Uhf Rfid Tag/ Label/ Sticker
US $0.06 Printable Uhf Rfid Adhesive Label/Rfid Sticker Tag/Rfid Tag Voor Boeken
US $0.04-$0.13 Aangepaste Tags Alien H3 9662 H9 9640/M4E Chip Long Range Passieve Uhf Rfid Tag/ Label/ Sticker
US $0.04-$0.12 Gratis Sample Lange Range Passieve Uhf Rfid Tag/ Label/ Sticker
US $0.03-$0.06 Aangepaste Tags Long Range Uhf Rfid Inlay/Natte Inlay/Label/Sticker
US $0.08-$0.20 Full Color Afdrukken Hf/Uhf Passieve Papier Roll Smart Nfc Rfid Label/Sticker/Tag
US $0.08-$0.30 Rfid Uhf H3 9662 9654 Chip Inlay/Label/Sticker Tag (Asset Warehousing Tracking)
US $0.06-$0.08 50*50Mm Uhf Bibliotheek Boek Documenten Rfid Tag Sticker Label
US $0.09-$0.15 Alien H3 9662, Alien H3 9654, Alien H4 UHF RFID Inlay/Sticker/Label
US $0.06-$0.15 Gratis Sample Lange Bereik H3 Passieve Uhf Herbruikbare Rfid Sticker Tag Label Voor Asset Tracking
US $0.06-$0.13 Chenxin Apparel Management Custom Afdrukken Uhf Rfid Tag Rfid Kledingstuk Wassen Zorg Etiketten Voor Kleding
US $0.23-$0.25 Rfid Uhf Electronic Label Washing Cloth Washing Label Heat Resistant Rfid Label Flexible Clothing
US $0.09-$0.12 Hot Selling Passief Printable Inlay Sticker Tag Uhf Rfid Label Voor Magazijn Retail
US $0.03-$0.09 Global UHF RFID Label U7 RFID Tag Voor Bril Frames
US $0.06-$0.50 Factory price UHF RFID label/tag adhesive
US $0.06 LX-C90G Rfid Voorruit Tag Passieve Long Range Uhf Rfid Sticker Label Voor Auto Tol Tracking Voertuig Registratie Of Parking
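One caveat about the regex approach: a greedy pattern such as `(\{.*\})` runs to the last closing brace on the line, so if the page ever puts another object after the config on the same line, `json.loads` will fail with an "Extra data" error. A possible way around that, sketched below, is to locate the assignment with `re` and let `json.JSONDecoder().raw_decode` read exactly one JSON value from that position. The HTML string, the `Label A`/`Label B` titles, and the helper names `extract_page_config` and `iter_offers` are made up for illustration; only the key names are taken from the code above, and the real page may be laid out differently.

```python
import json
import re

# Made-up stand-in for the downloaded page (the real page is much larger).
sample_html = (
    '<script>window.__page__data__config = {"props": {"offerResultData": '
    '{"offerList": ['
    '{"tradePrice": {"price": "US $0.02-$0.10"}, "information": {"puretitle": "Label A"}}, '
    '{"tradePrice": {"price": "US $0.06"}, "information": {"puretitle": "Label B"}}'
    ']}}}; window.other = {"x": 1};</script>'
)


def extract_page_config(html_text):
    """Return the object assigned to window.__page__data__config, or None."""
    match = re.search(r"window\.__page__data__config\s*=\s*", html_text)
    if match is None:
        return None
    # raw_decode parses a single JSON value starting at the given index
    # and ignores whatever text follows it.
    config, _end = json.JSONDecoder().raw_decode(html_text, match.end())
    return config


def iter_offers(config):
    """Yield (price, title) pairs; missing intermediate keys give no rows."""
    offers = config.get("props", {}).get("offerResultData", {}).get("offerList", [])
    for offer in offers:
        yield offer["tradePrice"]["price"], offer["information"]["puretitle"]


config = extract_page_config(sample_html)
rows = list(iter_offers(config)) if config else []
for price, title in rows:
    print("{:<20} {}".format(price, title))
```

Returning `None` when the variable is not found makes it easier to notice when the site changes its page structure, instead of failing with an `AttributeError` on `.group(1)`.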
I will assume the information you want is the product's title and price, as @Andrej_Kesley suggested in his answer.
If you parse the page as plain HTML, Beautiful Soup can only find 8 products, as shown here:
from bs4 import BeautifulSoup
import requests

url = 'https://dutch.alibaba.com/products/uhf_rfid_label.html?IndexArea=product_en&page=1'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

# Collect all title links and all price spans that are present in the static HTML
products = soup.find_all('a', class_='elements-title-normal')
prices = soup.find_all('span', class_='elements-offer-price-normal__price')

# zip pairs them by position and stops at the shorter list
for price, product in zip(prices, products):
    print("{:<20} {}".format(price.text, product.text))
Output:
US$ 0,02-US$ 0,10 Aangepaste Tag Sticker Labels Lange Bereik Goedkope Passieve Papier Roll Uhf Rfid Label
US$ 0,08-US$ 0,12 Magazijn & Asset & Productie Lijnmanagement Lange Afstand Alien H3 Chip Uhf Rfid Papier Label
US$ 0,78-US$ 0,85 Hopeland hot selling UHF RFID Animal Ear Tag mini uhf rfid ear tag 860 960MHz 5m reading range rfid label tag uhf
US$ 0,03-US$ 0,06 Factory Outlet Uhf Rfid Sticker/Label Met Chip
US$ 0,07-US$ 0,12 Long Range Goedkope Passieve Papier Roll Uhf Rfid Chip Label Tag Sticker
US$ 0,02-US$ 0,04 Groothandel Asset Tracking R6 Chip Uhf Papier Label Rfid Uhf Inventaris Labels Uhf Sticker
US$ 0,04-US$ 0,13 Aangepaste Tags Alien H3 9662 H9 9640/M4E Chip Long Range Passieve Uhf Rfid Tag/ Label/ Sticker
US$ 0,08-US$ 0,20 Full Color Afdrukken Hf/Uhf Passieve Papier Roll Smart Nfc Rfid Label/Sticker/Tag
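As for why the code in the question prints only one product: `find_all` there matches the single list container rather than the product cards inside it, so the loop runs once and `find` returns only the first title it encounters. A minimal sketch on made-up HTML shows the difference. The class names `list`, `card`, `title`, and `price` are placeholders, not Alibaba's real markup.

```python
from bs4 import BeautifulSoup

# Made-up markup: one container holding three product cards,
# the last of which has no price.
sample_html = """
<div class="list">
  <div class="card"><p class="title">Label A</p><span class="price">US$ 0,02</span></div>
  <div class="card"><p class="title">Label B</p><span class="price">US$ 0,07</span></div>
  <div class="card"><p class="title">Label C</p></div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Looping over the container: one iteration, and find() returns the first title only.
containers = soup.find_all("div", class_="list")
first_titles = [c.find("p", class_="title").text for c in containers]
print(first_titles)

# Looping over the cards: one iteration per product.
rows = []
for card in soup.find_all("div", class_="card"):
    title = card.find("p", class_="title")
    price = card.find("span", class_="price")
    # Guard against missing elements instead of calling .text on None.
    rows.append((price.text if price else "n/a", title.text if title else "n/a"))

for price, title in rows:
    print("{:<20} {}".format(price, title))
```

Looking up the title and price inside each card also keeps them matched to the same product, which two separate page-wide lists cannot guarantee when one product lacks a price.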