How to divide my output to individual youtube url's and put to the list in python3?

I want to build a simple scraper that collects YouTube channel links from another website, to create a contact list consisting of links to YouTube.

Libraries used: BeautifulSoup and requests.

I'm having trouble extracting just the URLs as individual items and putting them into a list.

This is my first Python program after "hello world", so I'm still a beginner.

I don't know what to do next.

#----------------------------------------------------
#Libs
#----------------------------------------------------
from bs4 import BeautifulSoup
import requests

#----------------------------------------------------
#variables
#----------------------------------------------------
page = ('http://ranking.vstars.pl/?side=96&&sort=month_change')

#----------------------------------------------------                 
#functions
#----------------------------------------------------
def scraper():

    x = 0

    target = requests.get(page)
    soup = BeautifulSoup(target.text, 'html.parser')

    for links in soup.find_all("td", "a", class_="href"):
        print(links, '\n')
        x += 1

    print("Number of links:", x)

#----------------------------------------------------  
#codes
#----------------------------------------------------
scraper()
Output:

<td class="href"><a href="https://www.youtube.com/channel/UCq-EgxhHVTFWVZcjFwsfnWA" rel="nofollow" target="_blank">YouTube</a></td> 

...

<td class="href"><a href="https://www.youtube.com/channel/UCpcG5MwAks-At2L-gbSppag" rel="nofollow" target="_blank">YouTube</a></td> 

Number of links: 81

Since you want the output as a list, I took the liberty of storing it in one:

Code:

#----------------------------------------------------
#Libs
#----------------------------------------------------
from bs4 import BeautifulSoup
import requests

#----------------------------------------------------
#variables
#----------------------------------------------------
page = ('http://ranking.vstars.pl/?side=96&&sort=month_change')

#----------------------------------------------------
#functions
#----------------------------------------------------
def scraper():

    x = 0

    target = requests.get(page)
    soup = BeautifulSoup(target.text, 'html.parser')

    all_links = []
    for links in soup.find_all("td", "a", class_="href"):
        all_links.append(links.contents[0].attrs['href'])
        x += 1

    print(all_links)
    print("Number of links:", x)

#----------------------------------------------------
#codes
#----------------------------------------------------
scraper()

Output:

['https://www.youtube.com/channel/UCq-EgxhHVTFWVZcjFwsfnWA', 'https://www.youtube.com/channel/UCPf-3giVvdU55kIBN2CbLRQ', ... ]

Number of links: 81
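As a side note, the loop-and-counter above can be condensed with a CSS selector. A minimal sketch against a small inline snippet (a stand-in modeled on the `<td class="href">` rows shown in the question's output, not the live page's actual markup):

```python
from bs4 import BeautifulSoup

# Inline stand-in HTML (an assumption based on the question's output,
# not fetched from the real page).
html = """
<td class="href"><a href="https://www.youtube.com/channel/AAA">YouTube</a></td>
<td class="href"><a href="https://www.youtube.com/channel/BBB">YouTube</a></td>
"""

soup = BeautifulSoup(html, "html.parser")

# "td.href a" selects every <a> nested inside a <td class="href">
all_links = [a["href"] for a in soup.select("td.href a")]

print(all_links)
print("Number of links:", len(all_links))
```

With `len(all_links)` there is no need for a separate `x` counter.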

Change the function:

def scraper():
    x = 0
    target = requests.get(page)
    soup = BeautifulSoup(target.text, 'html.parser')

    for td in soup.find_all("td", class_="href"):
        for links in td.find_all("a"):
            print(links['href'], '\n')
            x += 1

    print("Number of links:", x)
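Either way, once the URLs are collected into a list, the channel ID (the part after the last slash) can be split off if the contact list needs it. A sketch using two URLs taken from the output shown earlier:

```python
# Two example URLs copied from the output shown earlier.
urls = [
    "https://www.youtube.com/channel/UCq-EgxhHVTFWVZcjFwsfnWA",
    "https://www.youtube.com/channel/UCpcG5MwAks-At2L-gbSppag",
]

# rsplit("/", 1) splits once from the right, so index [1] is the channel ID
channel_ids = [u.rsplit("/", 1)[1] for u in urls]
print(channel_ids)
```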

Try this:

import re
import urllib.request

from bs4 import BeautifulSoup


def getLinks(url):
    x = 0
    html_page = urllib.request.urlopen(url)
    soup = BeautifulSoup(html_page, 'html.parser')
    links = []

    for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
        links.append(link.get('href'))
        x = x + 1

    print(links, x)

    return links

getLinks("http://google.com")
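One caveat: the pattern `^http://` only matches plain-http links, so https URLs (including YouTube channel links) would be skipped. A sketch of a widened pattern, run against a small inline snippet rather than the live page (the HTML below is made up for illustration):

```python
import re

from bs4 import BeautifulSoup

# Inline stand-in HTML: one https link, one http link, one relative link.
html = """
<a href="https://www.youtube.com/channel/XYZ">YouTube</a>
<a href="http://example.com">other</a>
<a href="/relative">relative</a>
"""

soup = BeautifulSoup(html, "html.parser")

# ^https?:// matches both http:// and https:// absolute URLs;
# the relative link is filtered out
links = [a.get("href")
         for a in soup.find_all("a", attrs={"href": re.compile(r"^https?://")})]

print(links)
```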