How do I split my output into individual YouTube URLs and put them in a list in Python 3?
I want to make a simple scraper that collects YouTube channel links from another website, to build a contact list made up of links pointing to youtube.com.
Libraries used: BeautifulSoup and requests.
I'm having trouble extracting just the URLs as individual objects and putting them into a list.
This is my first Python program after "hello world", so I'm still a beginner.
I don't know what to do next.
#----------------------------------------------------
#Libs
#----------------------------------------------------
from bs4 import BeautifulSoup
import requests
#----------------------------------------------------
#variables
#----------------------------------------------------
page = ('http://ranking.vstars.pl/?side=96&&sort=month_change')
#----------------------------------------------------
#functions
#----------------------------------------------------
def scraper():
    x = 0
    target = requests.get(page)
    soup = BeautifulSoup(target.text, 'html.parser')
    for links in soup.find_all("td", "a", class_="href"):
        print(links, '\n')
        x += 1
    print("Number of links:", x)
#----------------------------------------------------
#codes
#----------------------------------------------------
scraper()
Output:
<td class="href"><a href="https://www.youtube.com/channel/UCq-EgxhHVTFWVZcjFwsfnWA" rel="nofollow" target="_blank">YouTube</a></td>
...
<td class="href"><a href="https://www.youtube.com/channel/UCpcG5MwAks-At2L-gbSppag" rel="nofollow" target="_blank">YouTube</a></td>
Number of links: 81
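For reference, the `href` attribute of a parsed `<a>` tag can be read like a dictionary key, so the tags printed above can be reduced to bare URLs with a CSS selector. A minimal sketch on a static snippet of the same markup (the HTML string here is a stand-in for the fetched page, not live data):

```python
from bs4 import BeautifulSoup

# Static stand-in for two cells of the scraped table.
html = (
    '<td class="href"><a href="https://www.youtube.com/channel/UCq-EgxhHVTFWVZcjFwsfnWA">YouTube</a></td>'
    '<td class="href"><a href="https://www.youtube.com/channel/UCpcG5MwAks-At2L-gbSppag">YouTube</a></td>'
)
soup = BeautifulSoup(html, 'html.parser')

# select() takes a CSS selector: every <a> inside a <td class="href">.
urls = [a['href'] for a in soup.select('td.href a')]
print(urls)
print("Number of links:", len(urls))
```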
Since you want list-type output, I took the liberty of storing it in a list:
Code
#----------------------------------------------------
#Libs
#----------------------------------------------------
from bs4 import BeautifulSoup
import requests
#----------------------------------------------------
#variables
#----------------------------------------------------
page = ('http://ranking.vstars.pl/?side=96&&sort=month_change')
#----------------------------------------------------
#functions
#----------------------------------------------------
def scraper():
    x = 0
    target = requests.get(page)
    soup = BeautifulSoup(target.text, 'html.parser')
    all_links = []
    for links in soup.find_all("td", "a", class_="href"):
        all_links.append(links.contents[0].attrs['href'])
        x += 1
    print(all_links)
    print("Number of links:", x)
#----------------------------------------------------
#codes
#----------------------------------------------------
scraper()
Output
[u'https://www.youtube.com/channel/UCq-EgxhHVTFWVZcjFwsfnWA', u'https://www.youtube.com/channel/UCPf-3giVvdU55kIBN2CbLRQ', ... ]
('Number of links:', 81)
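The key step is `links.contents[0].attrs['href']`: each matched `<td>` has the `<a>` tag as its first (and only) child, so `.contents[0]` reaches the anchor and `.attrs['href']` reads its URL. A small sketch on one static cell (the HTML string is a stand-in, not fetched from the site):

```python
from bs4 import BeautifulSoup

# One cell as returned by find_all: the <a> is the td's only child.
cell = BeautifulSoup(
    '<td class="href"><a href="https://www.youtube.com/channel/UCq-EgxhHVTFWVZcjFwsfnWA">YouTube</a></td>',
    'html.parser'
).td

link = cell.contents[0]        # the <a> Tag, first child of the <td>
url = link.attrs['href']       # its href attribute as a plain string
print(url)
```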
Change the function:
def scraper():
    x = 0
    target = requests.get(page)
    soup = BeautifulSoup(target.text, 'html.parser')
    for td in soup.find_all("td", class_="href"):
        for links in td.find_all("a"):
            print(links['href'], '\n')
            x += 1
    print("Number of links:", x)
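The nested lookup first narrows to cells with `class="href"`, then walks only the anchors inside them, so links elsewhere on the page are never visited. A quick static check (the URLs here are placeholders, not real channel links):

```python
from bs4 import BeautifulSoup

# Two cells: only the first has class="href" and should be kept.
html = (
    '<td class="href"><a href="https://example.com/a">YouTube</a></td>'
    '<td class="other"><a href="https://example.com/b">skip me</a></td>'
)
soup = BeautifulSoup(html, 'html.parser')

hrefs = []
for td in soup.find_all("td", class_="href"):   # only the matching cells
    for a in td.find_all("a"):                  # then the anchors inside them
        hrefs.append(a['href'])
print(hrefs)
```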
Try this:
import re
import urllib.request
from bs4 import BeautifulSoup
def getLinks(url):
    x = 0
    html_page = urllib.request.urlopen(url)
    soup = BeautifulSoup(html_page, 'html.parser')
    links = []
    for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
        links.append(link.get('href'))
        x = x + 1
    print(links, x)
    return links

getLinks("http://google.com")
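One thing worth noting about this approach: the `re.compile("^http://")` filter matches only `href` values that start with `http://`, so `https://` URLs and relative links fall through. A small static check (the URLs are placeholders), which avoids hitting the network:

```python
import re
from bs4 import BeautifulSoup

html = (
    '<a href="http://example.com/page">absolute http</a>'
    '<a href="/relative">relative</a>'
    '<a href="https://secure.example.com">https</a>'
)
soup = BeautifulSoup(html, 'html.parser')

# Only hrefs beginning with "http://" match the compiled pattern.
links = [a.get('href')
         for a in soup.find_all('a', attrs={'href': re.compile("^http://")})]
print(links)
```

If you also want `https://` channel links (which the YouTube URLs above use), a pattern like `re.compile("^https?://")` would catch both schemes.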