Scrape linked category links until there are no more categories
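The website https://mavin.io/category has multiple categories. Each category in turn has more subcategories, and so on. When a category has no further subcategories, it shows a product listing on a page like https://mavin.io/search?q=&cat=33695
I want to traverse all categories and collect the product-listing links (not the individual product links), such as https://mavin.io/search?q=&cat=33695
What is a good way to scrape those linked categories?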
import requests
from lxml.html import fromstring
url = 'https://mavin.io/category'
r = requests.get(url)
You can write a recursive function that walks through all the categories until a page has no more subcategories:
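import requests
from bs4 import BeautifulSoup

url = "https://mavin.io/category"
s = requests.Session()  # reuse one session (connection pool) for every request


def recur(url, path=None):
    """Yield (listing_url, category_path) for every leaf category under url."""
    if path is None:
        path = []
    r = s.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # Sub-category tiles: links inside .item-image that contain an <h4> title
    cat_links = soup.select(".item-image a:has(h4)")
    for a in cat_links:
        yield from recur(
            "https://mavin.io" + a["href"], path + [a.h4.get_text(strip=True)]
        )
    # No sub-categories: this is a leaf page, so report its final URL
    if not cat_links:
        yield r.url, path


for link, path in recur(url):
    print(link, path)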
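If you prefer to avoid recursion, or want protection against a category that links back to one already visited, the same walk can be done with an explicit stack. This is a hypothetical sketch, not part of the answer above: it separates crawling from fetching by taking a fetch_children callable (an assumed interface, not a requests or BeautifulSoup API) that returns the page's final URL plus its (href, name) sub-category pairs, so the traversal logic can be exercised without network access.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from urllib.parse import urljoin


def iter_leaf_categories(
    start_url: str,
    fetch_children: Callable[[str], tuple[str, list[tuple[str, str]]]],
) -> Iterator[tuple[str, list[str]]]:
    """Iterative depth-first walk over a category tree.

    fetch_children(url) is an assumed callable returning (final_url, children),
    where children is a list of (href, name) sub-category links; an empty list
    marks a leaf (product-listing) page. Yields (final_url, path) per leaf.
    """
    stack: list[tuple[str, list[str]]] = [(start_url, [])]
    seen: set[str] = set()  # URLs already fetched, so cycles are skipped
    while stack:
        url, path = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        final_url, children = fetch_children(url)
        if not children:
            yield final_url, path
            continue
        # Push in reverse so children are popped in page order, like recur().
        for href, name in reversed(children):
            # Resolve relative hrefs against the page URL instead of concatenating.
            stack.append((urljoin(final_url, href), path + [name]))
```

To use it against the site, fetch_children would do what recur does for each page: call s.get(url), parse r.content with BeautifulSoup, and return r.url together with [(a["href"], a.h4.get_text(strip=True)) for a in soup.select(".item-image a:has(h4)")]. A category tree only a few levels deep will not come near Python's default recursion limit of 1000, so the recursive version is fine here; the main gain of this variant is the seen set.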
Prints:
https://mavin.io/search?q=&cat=33695 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Consoles & Parts']
https://mavin.io/search?q=&cat=63691 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Cup Holders']
https://mavin.io/search?q=&cat=40017 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Dash Parts']
https://mavin.io/search?q=&cat=33698 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Glove Boxes']
https://mavin.io/search?q=&cat=179848 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Interior Door Handles']
https://mavin.io/search?q=&cat=33696 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Interior Door Panels & Parts']
https://mavin.io/search?q=&cat=33700 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Pedals & Pads']
https://mavin.io/search?q=&cat=33701 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seats']
https://mavin.io/search?q=&cat=50458 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seat Belt Shoulder Pads']
https://mavin.io/search?q=&cat=33702 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Seat Covers']
https://mavin.io/search?q=&cat=33703 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Shift Knobs & Boots']
https://mavin.io/search?q=&cat=33704 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Steering Wheels & Horns']
https://mavin.io/search?q=&cat=46102 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Sun Visors']
https://mavin.io/search?q=&cat=50459 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Switches & Controls']
https://mavin.io/search?q=&cat=33697 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Floor Mats & Carpets']
https://mavin.io/search?q=&cat=63690 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Cargo Nets, Trays & Liners']
https://mavin.io/search?q=&cat=33699 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Mirrors']
https://mavin.io/search?q=&cat=33705 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Trim']
https://mavin.io/search?q=&cat=40018 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Window Cranks & Parts']
https://mavin.io/search?q=&cat=33706 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Interior', 'Window Motors & Parts']
https://mavin.io/search?q=&cat=33651 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Exterior', 'Racks']
https://mavin.io/search?q=&cat=36475 ['eBay Motors', 'Parts & Accessories', 'Car & Truck Parts', 'Exterior', 'Body Kits']
...and so on.
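The listing links printed above differ only in their cat query parameter. If you need that category ID on its own (for example as a dictionary key), a minimal sketch with the standard library's urllib.parse could look like this; category_id and index_by_category are hypothetical helper names, and the IDs are kept as strings.

```python
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import parse_qs, urlparse


def category_id(listing_url: str) -> Optional[str]:
    """Return the value of the cat query parameter, or None if it is absent."""
    # parse_qs maps each name to a list of values and, by default, drops
    # blank ones such as the empty q= in these URLs.
    values = parse_qs(urlparse(listing_url).query).get("cat")
    return values[0] if values else None


def index_by_category(pairs: Iterable[Tuple[str, List[str]]]) -> Dict[Optional[str], str]:
    """Hypothetical helper: map each cat ID to its ' > '-joined category path."""
    return {category_id(url): " > ".join(path) for url, path in pairs}
```

Since recur(url) already yields (link, path) pairs, index_by_category(recur(url)) would build the whole mapping in one pass.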