How to scrape this website's items with Beautiful Soup?
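Hello, I am trying to scrape all the products on this website:
https://segari.id/
The URL is static, though, and Beautiful Soup does not find anything when I try to scrape the page.
Even if it did, how would I scroll the infinite list to the bottom to get every item? What is the recommended way to scrape all the items?
This is my current code: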
# user agent
from datetime import datetime

import pandas as pd
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent


def userAgent(URL):
    dateNow = datetime.now().date()
    ua = UserAgent()
    USER_AGENT = ua.random
    headers = {"User-Agent": str(USER_AGENT), "Accept-Encoding": "*", "Connection": "keep-alive"}
    print(USER_AGENT)
    resp = requests.get(URL, headers=headers)
    soup = BeautifulSoup(resp.content, "html.parser")
    if resp.status_code != 200:
        # Log the failing URL so it can be retried later
        print(f'error {resp.status_code}: {URL}')
        urlError = pd.DataFrame({'url': [URL], 'date': [dateNow]})
        responseCode = 500
        urlError.to_csv('errorUrl/errorUrl.csv', mode='a', index=False, header=False)
    else:
        responseCode = 200
    return soup, responseCode


# scraping component
URL = "https://segari.id/"
soup, responseCode = userAgent(URL)  # the function returns a (soup, code) tuple
title = soup.find_all('div', {"class": "ProductCard_productName__1fPfF"})
listTitle = []
for t in title:
    listTitle.append(t.get_text(strip=True))
df = pd.DataFrame(listTitle)  # renamed from `list` to avoid shadowing the built-in
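You can get the data with the requests module alone, because the products are loaded dynamically by JavaScript from an API that returns a JSON response to a GET request. The static HTML that Beautiful Soup parses does not contain them.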
import requests

# Query the JSON API that the page itself calls for its product list
data = requests.get('https://api-v2.segari.id/v1.1/products/price?agentId=311&size=40&page=0&paginationType=slice&deliveryDate=2022-05-12&deliveryServiceType=NEXT_DAY_DELIVERY&availableDeliveryDates=2022-05-12,2022-05-13,2022-05-14').json()
for item in data['data']['data']:
    title = item['productDTO']['name']
    print(title)
Output:
Ayam Broiler Probiotic Utuh
Ayam Broiler Utuh Premium
Ayam Kampung Utuh
Dada Ayam Boneless Frozen
Fillet Ayam
Paha Ayam Boneless Frozen
Kentang Dieng AB
Kentang Rendang
Kurma Ajwa Curah
Kurma Tunisia
Kurma Tunisia 500 gram
Lemon Import Imperfect
Pear Xiang Lie Imperfect
Ayam Giling
Bone Marrow Sum Sum Sapi Lokal
Daging Kerang Hijau
Daging Sapi Import Giling
Daging Sapi Import Gulai Value Pack
Ikan Gabus Laut Fillet
Tulang Kaldu Sapi Lokal
Bumbu Ayam Garam Qian Ji
Beras Putih 1 kg Cap Bunga Setra Ramos
Ikan Dori Fillet
ABC Squash Delight Syrup Orange 460 ml Bundle 3
Ampela Ayam
Anggur Crimson
Anggur HIjau Autumn Crisp
Anggur Hijau Calmeria
Anggur Merah Red Globe Premium
Apel Envy
Apel Fuji
Apel Fuji Rosy Blush
Apel Fuji Wang Shan
Ati Ayam
Ayam Broiler Potong 4
Ayyomi Telur Ayam Kampoeng
Ayyomi Telur Ayam Negeri
Ayyomi Telur Omega 3
Baby Buncis Kenya
Bawang Bombay
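The request above returns only the first 40 items (size=40, page=0). As for the infinite-scroll part of the question, a rough sketch of one way to collect everything is to keep increasing the page parameter until nothing comes back. This assumes that page=1, page=2, ... return the following slices and that the API answers with an empty list once the products run out; neither is confirmed here, so check the responses in your browser's network tab. The names fetch_page, collect_names and max_pages are made up for this example.

```python
import requests

API_URL = "https://api-v2.segari.id/v1.1/products/price"


def fetch_page(page, size=40):
    """Fetch one page of products; query parameters are copied from the request above."""
    params = {
        "agentId": 311,
        "size": size,
        "page": page,
        "paginationType": "slice",
        "deliveryDate": "2022-05-12",
        "deliveryServiceType": "NEXT_DAY_DELIVERY",
        "availableDeliveryDates": "2022-05-12,2022-05-13,2022-05-14",
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
    return resp.json()["data"]["data"]


def collect_names(fetch=fetch_page, max_pages=100):
    """Walk the pages in order and gather every product name."""
    names = []
    for page in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        items = fetch(page)
        if not items:  # assumption: an empty page means there is nothing left
            break
        names.extend(item["productDTO"]["name"] for item in items)
    return names


if __name__ == "__main__":
    for name in collect_names():
        print(name)
```

If you need the result as a table, passing the returned list to pd.DataFrame(names, columns=["name"]) would give you the DataFrame you were building in the question. The delivery dates are hard-coded from the answer and will likely need to be replaced with current ones.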