How to load data without clicking a button?
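I want to scrape the names of all startups from https://e27.co/startups/.
As you can see, 20 startup names are shown by default; to load more you have to click the "Load more" button. Each click loads 10 more startup names.
I wrote a Python script that clicks the "Load More" button until all (29000) startups are loaded. It takes a huge amount of time and memory.
How can I load this data without clicking?
I have heard something about calling AJAX requests, but I don't understand how to implement it.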
HTML code of the button:
<button class="button btn-load-more" data-start="0">Load More</button>
The data-start attribute increases by 10 with each click.
Button event code (JS):
startupList.elem.find('.btn-load-more').off('.click').click(function(){
    startupList.elem.find('.btn-load-more').addClass('hide');
    Global.loading();
    startupList.loadMoreIsClicked = true;
    // Advance the offset stored in data-start by the page size.
    var start = $(this).attr('data-start')*1;
    start += startupList.count;
    $(this).attr('data-start', start);
    startupList.searchAndFilterResult(start, startupList.getFormData("#startup_search"), false);
});
My Python code:
import time

# Written against the Selenium 3 API (find_element_by_*, chrome_options).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


class E27Scraper:  # hypothetical name; the class line was not shown in the post
    def __init__(self):
        opp = Options()
        opp.add_argument('--blink-settings=imagesEnabled=false')
        opp.add_argument('--headless')
        self.driver = webdriver.Chrome('./chromedriver', chrome_options=opp)

    def parse(self, e27_url="https://e27.co/startups/"):
        self.driver.get(e27_url)
        time.sleep(3)
        run_check, prev_value_list = True, [0, 0]
        button = self.driver.find_element_by_xpath("//button[@class='button btn-load-more']")
        while run_check:
            quantity_of_loaded_startups = len(self.driver.find_elements_by_xpath(
                "//div[@class='startup-block startup-list-item']"))
            print('Loading, {} startups loaded'.format(quantity_of_loaded_startups))
            prev_value_list.append(quantity_of_loaded_startups)
            timer = 0
            # Wait up to 60 seconds for the button to become visible again.
            while not button.is_displayed():
                time.sleep(0.1)
                timer += 0.1
                print(timer)
                if timer >= 60:  # accumulated floats never equal 60 exactly
                    run_check = False
                    break
            if not run_check:
                break  # timed out; do not click a hidden button
            button.click()
            # Stop once the item count has stayed the same for three rounds.
            if prev_value_list[-2] == prev_value_list[-1] and prev_value_list[-3] == prev_value_list[-1]:
                run_check = False
        company_names, e_urls = [], []
        for item in self.driver.find_elements_by_xpath("//div[@class='startup-block startup-list-item']"):
            name = item.find_element_by_css_selector('.company-name').text
            e27url = item.find_element_by_css_selector(".startuplink").get_attribute("href")
            yield {"Startup": name, "Url": e27url}
You can go to e27.co/startups and take a look yourself.
Thanks,
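You can access their API directly by finding out where the request goes when you press the Load More button. In this case, the request is sent to the following URL: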
https://e27.co/api/startups/?tab_name=recentlyupdated&start=10&length=10
So, by changing the length and start parameters, you can fetch more results. I wrote a simple script to get the startup names.
import requests

start_number = 0
# Request 100 entries starting at offset start_number.
r = requests.get('https://e27.co/api/startups/?tab_name=recentlyupdated&start={}&length=100'.format(start_number))
r = r.json()
for i in r['data']['list']:
    print(i['name'])
#outputs
RESYNC Technologies
Swizzle
Sports365
ShopClues
Symantec
SpoonJoy
SEOPRO India
Solarium
SHOPLINE
Structo
Coc Coc
CarDekho
Chillr
Culture Machine
CoAssets
CoinMKT
CimplyFive
Call Levels
CereBrahm Innovations
CouponzGuru
Aisle
adMingle
AppsFlyer
AppVirality
Ambient Digital
Airtel
Apptopia
Latize
Lefora
LINC 360
LogisticsIndonesia
LogicGateOne Corporation
Livspace
LivePhuket
LINE Ventures
National Tiles-Sydney
National Tiles-Brisbane
National Research Foundation
National Tiles
National Tiles-Adelaide
National University of Singapore School of Computing
National Tiles-Wagga Wagga
National Tiles-Springwood
National Tiles-Burleigh Heads
Nationkart
Natasha
Naturally Yours
Native5
Nativfy
NaturalMantra
Native Tongue
NewsHunt
Nimble Wireless
Nanarokom.com
NoBroker
News Corp
Naxos International
NecesCity
NextGen
Notey
Naspers Group
NAM TRIP TRAVEL
Navigat Group
Nanosatisfi
Naaptol
Single Thailand
sinhasoft
Sinergy
Singsys Pte. Ltd.
Simplilearn
SIFS India
Simprosys InfoMedia
SimiCommerce
SingPost
Singapore Press Holdings
SimplerCloud
SingSaver
Sinoze
Singapore infocomm Technology Federation
Native Tech
Novelship
AthenaDesk
ZERO BrandCard™
Open24.vn
iMyanmarHouse
Shufti Pro
MobME Wireless
Moolya Testing
Mofang Gongyu
Moff Inc.
Moonfrog Labs
myNoticePeriod
MaGIC
Momoe
Manthan
Metaps
Motorola Solutions
MatchMove
Mondano
MOL- Money Online
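To go beyond the first 100 names, the same request can be repeated with an increasing start value. The following is only a sketch under some assumptions: the endpoint, the query parameters and the data -> list -> name response shape are taken from the script above, while the function names, the stop-on-empty-page rule and the optional delay are my own guesses and not documented behaviour of the API.

```python
import time

API_URL = "https://e27.co/api/startups/"  # endpoint taken from the answer above


def fetch_page(start, length):
    """Fetch one page and return the list found under data -> list."""
    import requests  # imported lazily so the paging logic can be tried offline

    params = {"tab_name": "recentlyupdated", "start": start, "length": length}
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["data"]["list"]


def iter_startup_names(fetch=fetch_page, page_size=100, delay=0.0, max_pages=None):
    """Yield startup names page by page until an empty page comes back."""
    start = 0
    pages = 0
    while max_pages is None or pages < max_pages:
        items = fetch(start, page_size)
        if not items:  # assumption: an empty list marks the end of the data
            break
        for item in items:
            yield item["name"]
        # Advance by what was actually returned, in case the server
        # hands back fewer entries than requested.
        start += len(items)
        pages += 1
        if delay:
            time.sleep(delay)  # optional pause between requests
```

For example, list(iter_startup_names(max_pages=3, delay=0.5)) would request at most three pages of 100 names with a short pause between them. Passing a different fetch function makes it possible to try the loop without touching the network.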