Scrape YouTube videos from a specific channel and search them?

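I'm using this code to get the video URLs of a YouTube channel. It works fine, but I would like to add an option to search the channel for videos with a specific title, and get the URL of the first video found with the search phrase.
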
from bs4 import BeautifulSoup
import requests

url="https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")

for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        print(link["href"])

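This is a fine approach, but you will get much more leverage out of a tool like youtube-dl. Try something like youtube-dl "ytsearchall:intitle:'hello world'" --dump-json --flat-playlist. youtube-dl has a huge number of features and can cover all of your video scraping needs with little or no modification.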

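As for implementing your own search, the basics are simple, but they may not give you the experience you want. You would probably collect the titles, perhaps into a dictionary with the URLs as values, and then iterate over the keys looking for the search text. Exact keyword matching of this kind is not hard, but it may not behave the way you expect, because most search engines weigh many criteria to give you what you are looking for.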
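The dictionary idea described above can be sketched without fetching every video page, because each `entry` in the channel's Atom feed carries its own `title` element. This is only a minimal sketch under assumptions: `SAMPLE_FEED` is made-up data standing in for the text that `requests.get(url).text` would return, and `build_title_index` and `first_match` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample standing in for the real feed text (an assumption for illustration).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Will More RAM Make your PC Faster??</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE_ID_1"/>
  </entry>
  <entry>
    <title>The Tiniest Gaming Laptop!</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE_ID_2"/>
  </entry>
</feed>
"""


def build_title_index(feed_xml):
    """Map each entry's title to the href of its first <link> element."""
    index = {}
    root = ET.fromstring(feed_xml)
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.find("atom:link", ATOM)
        if title and link is not None:
            index[title] = link.get("href")
    return index


def first_match(index, keyword):
    """Return the URL of the first title containing keyword (case-insensitive), or None."""
    needle = keyword.casefold()
    for title, url in index.items():
        if needle in title.casefold():
            return url
    return None


if __name__ == "__main__":
    index = build_title_index(SAMPLE_FEED)
    print(first_match(index, "laptop"))
```

Dictionaries keep insertion order, so "first" here means first in feed order. Two videos with identical titles would overwrite each other in this layout; a list of (title, URL) pairs avoids that.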

Like this, my friend:

from bs4 import BeautifulSoup
from lxml import etree
import urllib.request
import requests

url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")

video_titles = []

print("Caching Video Titles ...")
for entry in soup.find_all("entry"):
    for link in entry.find_all("link"):
        # Fetch the video page and read its title from the page markup.
        # This XPath depends on YouTube's HTML and may need updating.
        youtube = etree.HTML(urllib.request.urlopen(link["href"]).read())
        video_title = youtube.xpath("//span[@id='eow-title']/@title")
        if len(video_title) > 0:
            video_titles.append({"title": video_title[0], "url": link.attrs["href"]})
            print(f"{len(video_titles)}: {video_title[0]}")

print("Caching Video Titles Done!")


keyword = input("Enter the keyword to search for: ")
for video in video_titles:
    # Case-sensitive substring match against each cached title
    if keyword in video["title"]:
        print(video["url"])

Output:

Caching Video Titles ...
1: The ,000 Mac Pro Killer
2: Sony PlayStation - by Alienware - WAN Show June 12, 2020
3: Experimental 120FPS Game Streaming!
4: We Edited This Video on an iPad Pro!
5: The Tiniest Gaming Laptop!
6: I spent two days in my attic to avoid a camera subscription!
7: Stolen iPhones Rat Out New "Owners" - WAN Show June 5, 2020
8: We got the GPU AMD wouldn’t sell…
9: Will More RAM Make your PC Faster?? (2020)
Caching Video Titles Done!
Enter the keyword to search for: Mac
https://www.youtube.com/watch?v=l_IHSRPVqwQ
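
The keyword test in the script above is a case-sensitive substring check, so searching for "mac" would not find "Mac Pro". One possible refinement, and not what a real search engine does, is to split the title and the query into lowercase words and rank videos by how many query words they share. The sketch below assumes the same list-of-dicts layout as `video_titles`; `tokenize` and `rank_videos` are hypothetical names, and the URLs in `SAMPLE_VIDEOS` are placeholders.

```python
import re

# Placeholder data in the same shape as video_titles; the URLs are not real.
SAMPLE_VIDEOS = [
    {"title": "Experimental 120FPS Game Streaming!", "url": "https://www.youtube.com/watch?v=EXAMPLE_A"},
    {"title": "We Edited This Video on an iPad Pro!", "url": "https://www.youtube.com/watch?v=EXAMPLE_B"},
    {"title": "The Tiniest Gaming Laptop!", "url": "https://www.youtube.com/watch?v=EXAMPLE_C"},
]


def tokenize(text):
    """Lowercase the text and split it into ASCII alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.casefold())


def rank_videos(videos, query):
    """Return videos sharing at least one word with the query, best match first."""
    query_words = set(tokenize(query))
    scored = []
    for video in videos:
        title_words = set(tokenize(video["title"]))
        score = len(query_words & title_words)  # number of query words found in the title
        if score > 0:
            scored.append((score, video))
    # list.sort() is stable, so ties keep their original (feed) order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [video for _, video in scored]


if __name__ == "__main__":
    for video in rank_videos(SAMPLE_VIDEOS, "gaming laptop"):
        print(video["title"], video["url"])
```

Taking the first element of the returned list gives the "first video found" behaviour the question asks for, with case differences and word order no longer mattering.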

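My previous answer gets all the video titles of a given YouTube channel, as you asked. But in the comments you told me that you want to run the script from a cron job, which takes a little more work, so I am adding another answer.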

from bs4 import BeautifulSoup
from lxml import etree
import urllib.request
import requests
import sys

def fetch_titles(url):
    video_titles = []
    html = requests.get(url)
    soup = BeautifulSoup(html.text, "lxml")
    for entry in soup.find_all("entry"):
        for link in entry.find_all("link"):
            youtube = etree.HTML(urllib.request.urlopen(link["href"]).read()) 
            video_title = youtube.xpath("//span[@id='eow-title']/@title") 
            if len(video_title)>0:
                video_titles.append({"title":video_title[0], "url":link.attrs["href"]})
    return video_titles

def main():
    if len(sys.argv) < 2:
        print("Error: you must specify a keyword")
        print("e.g.: python3 ./main.py KEYWORD")
        return

    url="https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
    keyword = sys.argv[1]

    video_titles = fetch_titles(url)
    for video in video_titles:
        if keyword in video["title"]:
            print(video["url"])
            break  # stop after the first match; remove this line to print every match


if __name__ == "__main__":
    main()

When you call the script from a terminal, specify the keyword like this:

$ python3 ./main.py Mac

where Mac is the keyword and main.py is the name of the Python script file.

Output:

https://www.youtube.com/watch?v=l_IHSRPVqwQ
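
For a cron job, where nobody is watching the terminal, it may help to report "no match" through the exit status and stderr instead of printing nothing. The following is a sketch of that idea using `argparse`, not a replacement for the script above: `find_first` and `run` are hypothetical names, and the video list is passed in as a parameter so the logic can be tried without network access.

```python
import argparse
import sys


def find_first(videos, keyword):
    """Return the URL of the first video whose title contains keyword, else None."""
    for video in videos:
        if keyword in video["title"]:
            return video["url"]
    return None


def run(argv, videos):
    """Parse argv, print the first matching URL, and return a process exit code."""
    parser = argparse.ArgumentParser(
        description="Print the URL of the first video whose title contains a keyword."
    )
    parser.add_argument("keyword", help="text to look for in video titles")
    args = parser.parse_args(argv)

    url = find_first(videos, args.keyword)
    if url is None:
        # Errors go to stderr so cron can report them separately from normal output.
        print(f"No video title contains {args.keyword!r}", file=sys.stderr)
        return 1
    print(url)
    return 0


# In main.py this could be wired up as (assuming fetch_titles and url from the script above):
#     sys.exit(run(sys.argv[1:], fetch_titles(url)))
```

With this layout a missing keyword makes `argparse` print a usage message and exit with status 2, a keyword with no match exits with status 1, and a match exits with status 0, so the cron entry or a wrapper script can tell the cases apart.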