Aliexpress web scraper not repeating

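I'm a complete beginner, and I have a question about my code for scraping products from Aliexpress.

The problem is that I only get 1 result instead of all of them.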

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://dutch.alibaba.com/products/uhf_rfid_label.html?IndexArea=product_en&page=1').text
soup = BeautifulSoup(html_text, 'lxml')
producten = soup.find_all('div', class_ ='organic-list app-organic-search__list')
for product in producten:
    product_naam = product.find('p', class_ = 'elements-title-normal__content large').text
    jaren_actief = product.find('span', class_ = 'seller-tag__year flex-no-shrink').text
    print(f'''Product naam: {product_naam} ''')

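How can I get the information for all products?

The product information (price, name, etc.) is embedded in the HTML page inside JavaScript (the HTML itself actually renders only 8 products). You can use the re/json modules to parse it. For example: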

import re
import json
import requests

html_text = requests.get(
    "https://dutch.alibaba.com/products/uhf_rfid_label.html?IndexArea=product_en&page=1"
).text

data = re.search(r"window\.__page__data__config = (\{.*\})", html_text).group(1)
data = json.loads(data)
# uncomment to print all data:
# print(json.dumps(data, indent=4))

for offer in data["props"]["offerResultData"]["offerList"]:
    print(
        "{:<20} {}".format(
            offer["tradePrice"]["price"], offer["information"]["puretitle"]
        )
    )

Prints:

US $0.02-$0.10       Aangepaste Tag Sticker Labels Lange Bereik Goedkope Passieve Papier Roll Uhf Rfid Label
US $0.07-$0.12       Long Range Goedkope Passieve Papier Roll Uhf Rfid Chip Label Tag Sticker
US 8.00-8.90   Hopeland 15 meter uhf rfid scanner r2000 rfid reader device uhf scanner handheld terminal multi tag uhf rfid scanner
US $0.03-$0.06       Factory Outlet Uhf Rfid Sticker/Label Met Chip
US 8.00-8.90   Hopeland Draagbare Uhf Rfid Terminal ISO18000 6C Multi-Tag Management Uhf Rfid Handheld Reader 2D Barcode Uhf Rfid Terminal
US $0.03-$0.12       Gratis Monster Waterdichte Nfc 213 Long Range Passieve Uhf Rfid Tag/ Label/ Sticker
US $0.06             Printable Uhf Rfid Adhesive Label/Rfid Sticker Tag/Rfid Tag Voor Boeken
US $0.04-$0.13       Aangepaste Tags Alien H3 9662 H9 9640/M4E Chip Long Range Passieve Uhf Rfid Tag/ Label/ Sticker
US $0.04-$0.12       Gratis Sample Lange Range Passieve Uhf Rfid Tag/ Label/ Sticker
US $0.03-$0.06       Aangepaste Tags Long Range Uhf Rfid Inlay/Natte Inlay/Label/Sticker
US $0.08-$0.20       Full Color Afdrukken Hf/Uhf Passieve Papier Roll Smart Nfc Rfid Label/Sticker/Tag
US $0.08-$0.30       Rfid Uhf H3 9662 9654 Chip Inlay/Label/Sticker Tag (Asset Warehousing Tracking)
US $0.06-$0.08       50*50Mm Uhf Bibliotheek Boek Documenten Rfid Tag Sticker Label
US $0.09-$0.15       Alien H3 9662, Alien H3 9654, Alien H4 UHF RFID Inlay/Sticker/Label
US $0.06-$0.15       Gratis Sample Lange Bereik H3 Passieve Uhf Herbruikbare Rfid Sticker Tag Label Voor Asset Tracking
US $0.06-$0.13       Chenxin Apparel Management Custom Afdrukken Uhf Rfid Tag Rfid Kledingstuk Wassen Zorg Etiketten Voor Kleding
US $0.23-$0.25       Rfid Uhf Electronic Label Washing Cloth Washing Label Heat Resistant Rfid Label Flexible Clothing
US $0.09-$0.12       Hot Selling Passief Printable Inlay Sticker Tag Uhf Rfid Label Voor Magazijn Retail
US $0.03-$0.09       Global UHF RFID Label U7 RFID Tag Voor Bril Frames
US $0.06-$0.50       Factory price UHF RFID label/tag adhesive
US $0.06             LX-C90G Rfid Voorruit Tag Passieve Long Range Uhf Rfid Sticker Label Voor Auto Tol Tracking Voertuig Registratie Of Parking
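
One caveat with the greedy pattern `(\{.*\})`: if more braces follow the config object on the same line, the match can run too far and `json.loads` will fail. Below is a minimal sketch of a more defensive variant that locates the assignment and lets `json.JSONDecoder().raw_decode` read exactly one JSON value from that position. `SAMPLE_HTML` and `extract_offers` are hypothetical names made up for illustration, and the sample data is a tiny stand-in, not the real page; the key names are the ones used in the code above.

```python
import json
import re

# Hypothetical stand-in for the downloaded page (the real one is far larger).
# A second object follows on the same line to show the over-capture case.
SAMPLE_HTML = (
    "<script>"
    'window.__page__data__config = {"props": {"offerResultData": {"offerList": ['
    '{"tradePrice": {"price": "US $0.02-$0.10"}, "information": {"puretitle": "Label A"}}, '
    '{"tradePrice": {"price": "US $0.06"}, "information": {"puretitle": "Label B"}}'
    ']}}}; window.somethingElse = {"unrelated": true};'
    "</script>"
)


def extract_offers(html_text):
    """Return a list of (price, title) pairs, or [] if the config is absent."""
    # Find where the JSON value starts, just after the "=" sign.
    match = re.search(r"window\.__page__data__config\s*=\s*", html_text)
    if match is None:
        return []
    # raw_decode parses a single JSON value beginning at the given index
    # and ignores whatever text comes after it.
    data, _end = json.JSONDecoder().raw_decode(html_text, match.end())
    offers = data.get("props", {}).get("offerResultData", {}).get("offerList", [])
    return [(o["tradePrice"]["price"], o["information"]["puretitle"]) for o in offers]


for price, title in extract_offers(SAMPLE_HTML):
    print("{:<20} {}".format(price, title))
```

To use it on the live page, pass `requests.get(url).text` instead of `SAMPLE_HTML`. This assumes the page still assigns its data to `window.__page__data__config`, which may change at any time.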

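I'll assume the information you want is the product title and price, as @Andrej_Kesley suggested in his answer.

If you parse the page as HTML, Beautiful Soup can only get 8 products, as shown below: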

from bs4 import BeautifulSoup
import requests

url = 'https://dutch.alibaba.com/products/uhf_rfid_label.html?IndexArea=product_en&page=1'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
products = soup.find_all('a', class_='elements-title-normal')
prices = soup.find_all('span', class_='elements-offer-price-normal__price')
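# Pair each price with its title; zip stops at the shorter list,
# so a product without a price cannot raise an IndexError.
for price, product in zip(prices, products):
    print("{:<20} {}".format(price.text, product.text))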

Output:

US$ 0,02-US$ 0,10    Aangepaste Tag Sticker Labels Lange Bereik Goedkope Passieve Papier Roll Uhf Rfid Label
US$ 0,08-US$ 0,12    Magazijn & Asset & Productie Lijnmanagement Lange Afstand Alien H3 Chip Uhf Rfid Papier Label
US$ 0,78-US$ 0,85    Hopeland hot selling UHF RFID Animal Ear Tag mini uhf rfid ear tag 860 960MHz 5m reading range rfid label tag uhf
US$ 0,03-US$ 0,06    Factory Outlet Uhf Rfid Sticker/Label Met Chip
US$ 0,07-US$ 0,12    Long Range Goedkope Passieve Papier Roll Uhf Rfid Chip Label Tag Sticker
US$ 0,02-US$ 0,04    Groothandel Asset Tracking R6 Chip Uhf Papier Label Rfid Uhf Inventaris Labels Uhf Sticker
US$ 0,04-US$ 0,13    Aangepaste Tags Alien H3 9662 H9 9640/M4E Chip Long Range Passieve Uhf Rfid Tag/ Label/ Sticker
US$ 0,08-US$ 0,20    Full Color Afdrukken Hf/Uhf Passieve Papier Roll Smart Nfc Rfid Label/Sticker/Tag