How can I input a list of words in a loop that can be added to the url to get the results
I take the url as input: url = "https://www.amazon.in/s?k=headphones&page=1"
This works fine but stops at page 19.
Instead of breaking at page 19, I want the loop to take the next input as "https://www.amazon.in/s?k=" +
- "speakers&page=1"
- "earbuds&page=1"
and so on, and keep running.
from bs4 import BeautifulSoup as soup
import pandas as pd
import urllib.request

data = []

def getdata(url):
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
    req = urllib.request.Request(url, headers=header)
    amazon_html = urllib.request.urlopen(req).read()
    a_soup = soup(amazon_html, 'html.parser')
    for e in a_soup.select('div[data-component-type="s-search-result"]'):
        try:
            title = e.find('h2').text
        except AttributeError:
            # no <h2> in this result block
            title = None
        data.append({
            'title': title
        })
    return a_soup
def getnextpage(a_soup):
    page = a_soup.find('a', attrs={"class": 's-pagination-item s-pagination-next s-pagination-button s-pagination-separator'})
    page = page['href']
    url = 'http://www.amazon.in' + str(page)
    return url
while True:
    geturl = getdata(url)
    url = getnextpage(geturl)
    if not url:
        break
    print(url)

output = pd.DataFrame(data)
output
This code returns the right results, but instead of my giving it a new url each time, I want it to take a list of items that can be appended to the end of the url, fetching the results one keyword at a time so they can be added to the DataFrame.
Note: the search results stop at page 19.
Make a list of your keywords, iterate over it, and include the while loop in each iteration.
keywords = ['speakers', 'earbuds']

for k in keywords:
    url = 'https://www.amazon.in/s?k=' + k
    while True:
        geturl = getdata(url)
        url = getnextpage(geturl)
        if not url:
            break
        print(url)
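One caveat: plain string concatenation only works for single-word keywords. If a keyword contains spaces, it should be URL-encoded first, for example with urllib.parse.quote_plus from the standard library. A minimal sketch (the multi-word keyword here is a hypothetical example, not from the question):

from urllib.parse import quote_plus

keywords = ['speakers', 'bluetooth speakers']  # 'bluetooth speakers' is a hypothetical multi-word keyword

for k in keywords:
    # quote_plus turns spaces into '+' and escapes unsafe characters
    url = 'https://www.amazon.in/s?k=' + quote_plus(k)
    print(url)  # e.g. https://www.amazon.in/s?k=bluetooth+speakers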
Note that Amazon does not like this kind of automated access to its pages and can recognize the access pattern quickly. To lower the request frequency a little, you should at least include some delay with time.sleep(). Of course, it would be even better to use the official API.
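For instance, a small randomized pause between page requests could look like the sketch below (the 2-5 second range is an arbitrary choice, not a value Amazon documents):

import time
import random

for k in keywords:
    url = 'https://www.amazon.in/s?k=' + k
    while True:
        geturl = getdata(url)
        url = getnextpage(geturl)
        if not url:
            break
        # pause 2-5 seconds between requests to space out the access pattern
        time.sleep(random.uniform(2, 5))
        print(url)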
Example
from bs4 import BeautifulSoup as soup
import pandas as pd
import urllib.request

data = []

def getdata(url):
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
    req = urllib.request.Request(url, headers=header)
    amazon_html = urllib.request.urlopen(req).read()
    a_soup = soup(amazon_html, 'html.parser')
    for e in a_soup.select('div[data-component-type="s-search-result"]'):
        try:
            title = e.find('h2').text
        except AttributeError:
            # no <h2> in this result block
            title = None
        data.append({
            'title': title,
            'url': 'http://www.amazon.in' + e.h2.a['href']
        })
    return a_soup

def getnextpage(a_soup):
    try:
        # the "next" pagination link is absent on the last results page
        page = a_soup.find('a', attrs={"class": 's-pagination-item s-pagination-next s-pagination-button s-pagination-separator'})['href']
        url = 'http://www.amazon.in' + str(page)
    except TypeError:
        # find() returned None, so there is no next page
        url = None
    return url
keywords = ['speakers', 'earbuds']

for k in keywords:
    url = 'https://www.amazon.in/s?k=' + k
    while True:
        geturl = getdata(url)
        url = getnextpage(geturl)
        if not url:
            break
        print(url)
Output (print)
http://www.amazon.in/s?k=speakers&page=2&qid=1649420352&ref=sr_pg_1
...
http://www.amazon.in/s?k=speakers&page=20&qid=1649420373&ref=sr_pg_19
http://www.amazon.in/s?k=earbuds&page=2&qid=1649420375&ref=sr_pg_1
...
http://www.amazon.in/s?k=earbuds&page=20&qid=1649420394&ref=sr_pg_19
Output (pd.DataFrame(data))
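To build the DataFrame from the collected rows and keep the results, one option is a sketch like this (the filename results.csv is an arbitrary choice):

output = pd.DataFrame(data)
print(output.head())                        # inspect the first few scraped rows
output.to_csv('results.csv', index=False)   # persist the results to disk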