由 colab 抓取的 Youtube
Yoututbe scraping by colab
我需要通过 Google Colab 中的 this list 等标签从 YouTube 抓取汽车类型视频:
Abarth
AC
Acura
Adam
Adler
AEC
Aero
Aixam
Albion
所以我已经尝试使用这两个代码在 google colab 中找到视频标签(例如标签='Peugeot'):
!pip install youtube-search-python
from youtubesearchpython import SearchVideos
search = SearchVideos("NoCopyrightSounds", offset = 1, mode = "json", max_results = 20)
print(search.result())
和
!pip install youtube-dl
!echo '' > ford_video_list.txt
!chmod 755 ford_video_list.txt
!youtube-dl --match-title 'ford' --add-metadata --write-thumbnail --list-thumbnails --mark-watched --write-info-json 'ford_video_description_json.txt' --write-description 'ford_video_description.txt' --cookies='Search-youtube-url-file.txt' --ignore-errors --skip-download --get-url -f bestvideo+bestaudio/best --default-search "ytsearch2000:" "Ford Festiva" >> ford_video_list.txt
!echo '*****End of test 1 ******'
但是通过尝试此代码,它没有显示任何结果:
import urllib.request
from bs4 import BeautifulSoup
textToSearch = 'python tutorials'
query = urllib.parse.quote(textToSearch)
url = "https://www.youtube.com/results?search_query=" + query
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'}):
if not vid['href'].startswith("https://googleads.g.doubleclick.net/"):
print('https://www.youtube.com' + vid['href'])
所以,我猜class名称不正确!,我在这里要求调试它。
更新:
我制作了一个 google colab 页面(如下所示)来测试这些代码(还有显示此错误的 youtube-dl 代码:
https://colab.research.google.com/drive/1bZQ68gLggTQHCG_5fQQJJTICHA4K3HJ3?usp=sharing
ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests
(caused by <HTTPError 429: 'Too Many Requests'>); please report this
issue on https://yt-dl.org/bug . Make sure you are using the latest
version; see https://yt-dl.org/update on how to update. Be sure to
call youtube-dl with the --verbose flag and include its complete
output.
我理解错误是因为:
google不喜欢一个IP地址有太多请求。
因此尝试将这些标签 (--rm-cache-dir --force-ipv4 --verbose
) 添加到 youtube-dl
命令,如下所示(基于这些引用 1 2 3):
ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by <HTTPError 429: 'Too Many Requests'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
File "/usr/local/lib/python3.6/dist-packages/youtube_dl/extractor/common.py", line 632, in _request_webpage
return self._downloader.urlopen(url_or_request)
File "/usr/local/lib/python3.6/dist-packages/youtube_dl/YoutubeDL.py", line 2238, in urlopen
return self._opener.open(req, timeout=self._socket_timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 564, in error
result = self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 756, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
谢谢。
它一直在通过更改“ytsearch2000:”“Ford Festiva”` 来工作:
到 ' "ytsearch50":"Ford Festiva" 如下所示:
!pip install youtube-dl
# !youtube-dl --default-search gvsearch5:how to develop for android --no-playlist --write-info-json --write-annotation --write-thumbnail --write-sub --skip-download
!youtube-dl --match-title 'ford' "ytsearch50":"Ford Festiva"+"peugeot 405" --write-info-json --write-annotation --write-thumbnail --write-sub -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4'
问题是因为 :
和 "
位置错误!
从 google 抓取一些汽车类型视频的完整代码可以在这里看到:
https://colab.research.google.com/github/CAR-Driving/yoloOnGoogleColab/blob/master/database_creating/Yoututbe_scraping_by_colab.ipynb
我需要通过 Google Colab 中的 this list 等标签从 YouTube 抓取汽车类型视频:
Abarth
AC
Acura
Adam
Adler
AEC
Aero
Aixam
Albion
所以我已经尝试使用这两个代码在 google colab 中找到视频标签(例如标签='Peugeot'):
!pip install youtube-search-python
from youtubesearchpython import SearchVideos
search = SearchVideos("NoCopyrightSounds", offset = 1, mode = "json", max_results = 20)
print(search.result())
和
!pip install youtube-dl
!echo '' > ford_video_list.txt
!chmod 755 ford_video_list.txt
!youtube-dl --match-title 'ford' --add-metadata --write-thumbnail --list-thumbnails --mark-watched --write-info-json 'ford_video_description_json.txt' --write-description 'ford_video_description.txt' --cookies='Search-youtube-url-file.txt' --ignore-errors --skip-download --get-url -f bestvideo+bestaudio/best --default-search "ytsearch2000:" "Ford Festiva" >> ford_video_list.txt
!echo '*****End of test 1 ******'
但是通过尝试此代码,它没有显示任何结果:
import urllib.request
from bs4 import BeautifulSoup
textToSearch = 'python tutorials'
query = urllib.parse.quote(textToSearch)
url = "https://www.youtube.com/results?search_query=" + query
response = urllib.request.urlopen(url)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
for vid in soup.findAll(attrs={'class':'yt-uix-tile-link'}):
if not vid['href'].startswith("https://googleads.g.doubleclick.net/"):
print('https://www.youtube.com' + vid['href'])
所以,我猜class名称不正确!,我在这里要求调试它。
更新: 我制作了一个 google colab 页面(如下所示)来测试这些代码(还有显示此错误的 youtube-dl 代码:
https://colab.research.google.com/drive/1bZQ68gLggTQHCG_5fQQJJTICHA4K3HJ3?usp=sharing
ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by <HTTPError 429: 'Too Many Requests'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
我理解错误是因为:
google不喜欢一个IP地址有太多请求。
因此尝试将这些标签 (--rm-cache-dir --force-ipv4 --verbose
) 添加到 youtube-dl
命令,如下所示(基于这些引用 1 2 3):
ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by <HTTPError 429: 'Too Many Requests'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
File "/usr/local/lib/python3.6/dist-packages/youtube_dl/extractor/common.py", line 632, in _request_webpage
return self._downloader.urlopen(url_or_request)
File "/usr/local/lib/python3.6/dist-packages/youtube_dl/YoutubeDL.py", line 2238, in urlopen
return self._opener.open(req, timeout=self._socket_timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 564, in error
result = self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 756, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
谢谢。
它一直在通过更改“ytsearch2000:”“Ford Festiva”` 来工作:
到 ' "ytsearch50":"Ford Festiva" 如下所示:
!pip install youtube-dl
# !youtube-dl --default-search gvsearch5:how to develop for android --no-playlist --write-info-json --write-annotation --write-thumbnail --write-sub --skip-download
!youtube-dl --match-title 'ford' "ytsearch50":"Ford Festiva"+"peugeot 405" --write-info-json --write-annotation --write-thumbnail --write-sub -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/mp4'
问题是因为 :
和 "
位置错误!
从 google 抓取一些汽车类型视频的完整代码可以在这里看到: https://colab.research.google.com/github/CAR-Driving/yoloOnGoogleColab/blob/master/database_creating/Yoututbe_scraping_by_colab.ipynb