Scrape linked category links until there are no more categories

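The site https://mavin.io/category has multiple categories. Each category contains further subcategories, and so on. When a category has no more subcategories, it shows a product listing page like this one: https://mavin.io/search?q=&cat=33695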

I want to traverse all the categories and collect the product listing links (not the individual product links), like https://mavin.io/search?q=&cat=33695

How can I scrape these linked categories?

import requests
from lxml.html import fromstring

url = 'https://mavin.io/category'
r = requests.get(url)

You can create a recursive function that walks through all the categories until no more are found:

import requests
from bs4 import BeautifulSoup

url = "https://mavin.io/category"
s = requests.session()


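def recur(url, path=None):
    # path holds the category names followed to reach this page
    if path is None:
        path = []

    r = s.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # category tiles are links that wrap an <h4> title
    cat_links = soup.select(".item-image a:has(h4)")
    for a in cat_links:
        # descend into each subcategory, extending the path with its name
        yield from recur(
            "https://mavin.io" + a["href"], path + [a.h4.get_text(strip=True)]
        )

    # no category tiles: this is a product listing page, so report it
    if not cat_links:
        yield r.url, path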


for link, path in recur(url):
    print(link, path)

Prints:

https://mavin.io/search?q=&cat=33695 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Consoles & Parts']
https://mavin.io/search?q=&cat=63691 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Cup Holders']
https://mavin.io/search?q=&cat=40017 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Dash Parts']
https://mavin.io/search?q=&cat=33698 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Glove Boxes']
https://mavin.io/search?q=&cat=179848 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Interior Door Handles']
https://mavin.io/search?q=&cat=33696 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Interior Door Panels & Parts']
https://mavin.io/search?q=&cat=33700 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Pedals & Pads']
https://mavin.io/search?q=&cat=33701 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seats']
https://mavin.io/search?q=&cat=50458 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seat Belt Shoulder Pads']
https://mavin.io/search?q=&cat=33702 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seat Covers']
https://mavin.io/search?q=&cat=33703 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Shift Knobs & Boots']
https://mavin.io/search?q=&cat=33704 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Steering Wheels & Horns']
https://mavin.io/search?q=&cat=46102 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Sun Visors']
https://mavin.io/search?q=&cat=50459 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Switches & Controls']
https://mavin.io/search?q=&cat=33697 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Floor Mats & Carpets']
https://mavin.io/search?q=&cat=63690 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Cargo Nets, Trays & Liners']
https://mavin.io/search?q=&cat=33699 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Mirrors']
https://mavin.io/search?q=&cat=33705 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Trim']
https://mavin.io/search?q=&cat=40018 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Window Cranks & Parts']
https://mavin.io/search?q=&cat=33706 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Window Motors & Parts']
https://mavin.io/search?q=&cat=33651 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Exterior', 'Racks']
https://mavin.io/search?q=&cat=36475 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Exterior', 'Body Kits']

...and so on.
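
If the category tree is deep, or if pages might link back to one another, the same traversal can be written iteratively with an explicit stack and a set of visited URLs. The following is only a sketch under assumptions: `walk_leaves`, `get_children`, and `FAKE_SITE` are hypothetical names, and the demo runs on an in-memory dictionary rather than the real site. For live use, `get_children` would presumably fetch the page and return `(name, url)` pairs using the same `.item-image a:has(h4)` selector as above.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Child = Tuple[str, str]  # (category name, category URL)


def walk_leaves(
    start_url: str,
    get_children: Callable[[str], Iterable[Child]],
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (url, path) for every page that has no subcategories."""
    seen = {start_url}  # URLs already scheduled, guards against cycles
    stack: List[Tuple[str, List[str]]] = [(start_url, [])]
    while stack:
        url, path = stack.pop()
        children = list(get_children(url))
        if not children:
            # no subcategories: treat this page as a product listing
            yield url, path
            continue
        # push in reverse so children are visited in page order
        for name, child_url in reversed(children):
            if child_url not in seen:
                seen.add(child_url)
                stack.append((child_url, path + [name]))


# Hypothetical stand-in for the site: URL -> list of (name, url) children.
FAKE_SITE: Dict[str, List[Child]] = {
    "/category": [("Motors", "/category/1"), ("Toys", "/search?cat=9")],
    "/category/1": [("Interior", "/search?cat=2"), ("Exterior", "/search?cat=3")],
}


def fake_children(url: str) -> List[Child]:
    return FAKE_SITE.get(url, [])


leaves = list(walk_leaves("/category", fake_children))
for link, path in leaves:
    print(link, path)
```

Because `get_children` is passed in, the traversal can be checked without network access, and the `seen` set ensures each URL is fetched at most once.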