Scrape YouTube videos from a specific channel and search?
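I am using this code to get the video URLs of a YouTube channel. It works fine, but I would like to add an option to search for videos with a specific title within the channel, and to get the URL of the first video found with the search phrase.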
from bs4 import BeautifulSoup
import requests

url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        print(link["href"])
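This is a fine approach, but you will get more leverage from a tool like youtube-dl. Try something like:
youtube-dl "ytsearchall:intitle:'hello world'" --dump-json --flat-playlist
youtube-dl has a large number of features and can meet most video scraping needs with little or no modification.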
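If you go the youtube-dl route, its JSON output still has to be turned into something searchable. Below is a minimal sketch of that step. It assumes the command prints one JSON object per line and that each object carries "id" and "title" keys; the sample lines and the helper name parse_flat_playlist are made up for illustration, so check them against what your youtube-dl version actually prints.

```python
import json


def parse_flat_playlist(lines):
    """Turn JSON-lines text into a list of {"title", "url"} dicts.

    Assumption: each non-empty line is one JSON object with "id" and "title".
    """
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        item = json.loads(line)
        video_id = item.get("id")
        if video_id:
            results.append({
                "title": item.get("title", ""),
                "url": "https://www.youtube.com/watch?v=" + video_id,
            })
    return results


# Hypothetical sample of what the command might print (not real output).
sample_output = [
    '{"id": "EXAMPLE_ID_1", "title": "hello world tutorial"}',
    '',
    '{"id": "EXAMPLE_ID_2", "title": "hello world in Python"}',
]
for video in parse_flat_playlist(sample_output):
    print(video["title"], video["url"])
```

In a real script the lines could come from subprocess.run([...], capture_output=True, text=True).stdout.splitlines(), with the youtube-dl command from above as the argument list.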
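As for implementing your own search, the basics are very simple but may not give you the experience you are after. You would want to collect the titles, probably into a dictionary with the URLs as values, and then iterate over the keys looking for the search text. Exact keyword matching this way is not hard, but it may not be what you expect either, since most search engines use many criteria to give you what you are looking for.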
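The dictionary idea can be sketched as follows. This is only an illustration under stated assumptions: the titles are taken from the sample output in this thread, the URLs are placeholders, and search_titles is a made-up helper name. It matches case-insensitively, which is one small step beyond exact matching but still far from what a real search engine does.

```python
def search_titles(videos, keyword):
    """Return the URLs whose title contains keyword, ignoring case."""
    needle = keyword.casefold()
    return [url for title, url in videos.items() if needle in title.casefold()]


# Hypothetical sample data: title -> URL (the video IDs are placeholders).
videos = {
    "Will More RAM Make your PC Faster?? (2020)": "https://www.youtube.com/watch?v=EXAMPLE_ID_1",
    "The Tiniest Gaming Laptop!": "https://www.youtube.com/watch?v=EXAMPLE_ID_2",
    "Experimental 120FPS Game Streaming!": "https://www.youtube.com/watch?v=EXAMPLE_ID_3",
}

matches = search_titles(videos, "gaming")
# Dictionaries keep insertion order, so the first match is the first one collected.
first_url = matches[0] if matches else None
print(first_url)
```

Keying on the title means two videos with the same title would overwrite each other; a list of (title, URL) pairs avoids that if it matters for your channel.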
Something like this:
from bs4 import BeautifulSoup
from lxml import etree
import urllib.request
import requests

url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
video_titles = []
print("Caching Video Titles ...")
for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        # Fetch each video page and read its title from the page markup.
        youtube = etree.HTML(urllib.request.urlopen(link["href"]).read())
        video_title = youtube.xpath("//span[@id='eow-title']/@title")
        if len(video_title) > 0:
            video_titles.append({"title": video_title[0], "url": link.attrs["href"]})
            print(len(video_titles), ":", video_title[0])
print("Caching Video Titles Done!")
keyword = input("Enter the keyword you wanna search:")
# Print the URL of every video whose title contains the keyword.
for video in video_titles:
    if keyword in video["title"]:
        print(video["url"])
Output:
Caching Video Titles...
1: The ,000 Mac Pro Killer
2: Sony PlayStation - by Alienware - WAN Show June 12, 2020
3: Experimental 120FPS Game Streaming!
4: We Edited This Video on an iPad Pro!
5: The Tiniest Gaming Laptop!
6: I spent two days in my attic to avoid a camera subscription!
7: Stolen iPhones Rat Out New "Owners" - WAN Show June 5, 2020
8: We got the GPU AMD wouldn’t sell…
9: Will More RAM Make your PC Faster?? (2020)
Caching Video Titles Done
Enter the keyword you wanna search: Mac
https://www.youtube.com/watch?v=l_IHSRPVqwQ
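The script above makes one extra HTTP request per video just to read its title. Since Atom entries carry their own title element, a variant could read the titles straight from the feed. The sketch below shows that idea using only the standard library. The sample feed is hand-written with placeholder video IDs, and titles_from_feed is a made-up helper name; it is an alternative to try, not the method used in the answer.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written sample in the shape of an Atom feed (video IDs are placeholders).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>We Edited This Video on an iPad Pro!</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE_ID_1"/>
  </entry>
  <entry>
    <title>The Tiniest Gaming Laptop!</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE_ID_2"/>
  </entry>
</feed>"""


def titles_from_feed(xml_data):
    """Return a list of {"title", "url"} dicts, one per feed entry."""
    root = ET.fromstring(xml_data)
    videos = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.find("atom:link[@rel='alternate']", ATOM)
        if link is not None:  # skip entries without a usable link
            videos.append({"title": title, "url": link.get("href")})
    return videos


for video in titles_from_feed(SAMPLE_FEED):
    print(video["title"], video["url"])
```

With the live feed, passing requests.get(url).content (bytes) to titles_from_feed should work the same way, assuming the feed keeps this structure.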
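In my previous answer, you got all the video titles in the given YouTube channel, as you were looking for. But in the comments between us, you told me you want to run the script via a cron job, which takes a bit more effort, so I am adding another answer.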
from bs4 import BeautifulSoup
from lxml import etree
import urllib.request
import requests
import sys


def fetch_titles(url):
    """Collect the title and URL of every video linked from the feed."""
    video_titles = []
    html = requests.get(url)
    soup = BeautifulSoup(html.text, "lxml")
    for entry in soup.find_all("entry"):
        for link in entry.find_all("link"):
            youtube = etree.HTML(urllib.request.urlopen(link["href"]).read())
            video_title = youtube.xpath("//span[@id='eow-title']/@title")
            if len(video_title) > 0:
                video_titles.append({"title": video_title[0], "url": link.attrs["href"]})
    return video_titles


def main():
    # The keyword must be passed as the first command-line argument.
    if len(sys.argv) < 2:
        print("Error: You should specify a keyword")
        print("eg: python3 ./main.py KEYWORD")
        return
    url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
    keyword = sys.argv[1]
    video_titles = fetch_titles(url)
    for video in video_titles:
        if keyword in video["title"]:
            print(video["url"])
            break  # add this line if you want to print the first match only


if __name__ == "__main__":
    main()
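When you invoke the script from the terminal, you should specify the keyword, like this:
$ python3 ./main.py Mac
where Mac is the keyword and main.py is the Python script file name.
Output:
https://www.youtube.com/watch?v=l_IHSRPVqwQ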