How to scrape new YouTube videos from a specific channel before anyone else?
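I have a channel from which I want to get newly uploaded videos as quickly as possible. What is the best way to do this? I know of two options:
- Use the YouTube API
- Access the channel directly via its URL
For option 1, I need to call the API to get the video list. Because there is a quota, I think I would run out of the API calls I am allowed to make. I think option 2 is the best choice, because I can request the URL as often as I need.
Are new videos made available through the API first? Or are videos accessed through the URL made available to users at different times, depending on the region they come from? I built a URL crawler myself. I request the URL every minute. Still, someone receives the videos 8 minutes before I do. I don't understand why that is.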
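You can try the RSS feed for the channel you are interested in. It contains the latest videos with UTC timestamps (so the time zone issue you mentioned does not apply).
The RSS link for a channel's videos can be found in the source of the channel page. Open the page source and search for "rssUrl".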
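The manual search described above could be automated roughly as follows. This is only a sketch: it assumes the page source embeds the feed address as a JSON string value under the key "rssUrl", which is not confirmed here, and the `find_rss_url` helper and the sample fragment are hypothetical.

```python
import json
import re
from typing import Optional

# Assumption: the feed address appears as a JSON string value under "rssUrl".
RSS_URL_PATTERN = re.compile(r'"rssUrl"\s*:\s*("(?:[^"\\]|\\.)*")')


def find_rss_url(page_source: str) -> Optional[str]:
    """Return the first "rssUrl" value found in the page source, or None."""
    match = RSS_URL_PATTERN.search(page_source)
    if match is None:
        return None
    # json.loads decodes JSON escapes such as \/ or \u0026 inside the value.
    return json.loads(match.group(1))


# Hypothetical fragment for illustration, not copied from a real page.
sample = (
    '{"title":"x","rssUrl":'
    '"https://www.youtube.com/feeds/videos.xml?channel_id=UCXuqSBlHAE6Xw-yeJA0Tunw"}'
)
print(find_rss_url(sample))
```

If the key is missing, the function returns None rather than raising, so the caller can decide whether to fall back to another approach.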
Just to build on what MadRay wrote, you can do some simple string replacement with one of these URLs.
Using the channel ID:
https://www.youtube.com/feeds/videos.xml?channel_id=UCXuqSBlHAE6Xw-yeJA0Tunw
Using the channel name:
https://www.youtube.com/feeds/videos.xml?user=LinusTechTips
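The string replacement can be wrapped in a small helper. This is a minimal sketch; the function name `feed_url` is made up for illustration, and it uses only the two query parameters shown above.

```python
from urllib.parse import urlencode

FEED_BASE = "https://www.youtube.com/feeds/videos.xml"


def feed_url(channel_id=None, user=None):
    """Build the feed URL from either a channel ID or a channel name."""
    # Exactly one of the two identifiers must be given.
    if (channel_id is None) == (user is None):
        raise ValueError("pass exactly one of channel_id or user")
    if channel_id is not None:
        params = {"channel_id": channel_id}
    else:
        params = {"user": user}
    return f"{FEED_BASE}?{urlencode(params)}"


print(feed_url(channel_id="UCXuqSBlHAE6Xw-yeJA0Tunw"))
print(feed_url(user="LinusTechTips"))
```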
I took the liberty of parsing it for you.
from bs4 import BeautifulSoup
import requests

url = "https://www.youtube.com/feeds/videos.xml?user=LinusTechTips"
response = requests.get(url)
# The "lxml" parser requires the lxml package to be installed.
soup = BeautifulSoup(response.text, "lxml")

# Print the title, link, channel name, and publish time of each feed entry.
for entry in soup.find_all("entry"):
    for title in entry.find_all("title"):
        print(title.text)
    for link in entry.find_all("link"):
        print(link["href"])
    for name in entry.find_all("name"):
        print(name.text)
    for pub in entry.find_all("published"):
        print(pub.text)
Output:
FINALLY Wireless Headphones that Sound GREAT
https://www.youtube.com/watch?v=rei5vMQmD4Q
Linus Tech Tips
2020-01-30T20:04:37+00:00
Don't give Apple your MONEY - Mac Pro Upgrade Adventure
https://www.youtube.com/watch?v=zcLbSCinX3U
Linus Tech Tips
2020-01-29T19:59:56+00:00
We got the Kick-Proof TV from China!
https://www.youtube.com/watch?v=4eSADWuZskk
Linus Tech Tips
2020-01-28T19:46:09+00:00
Everything went wrong... Water Cooled 8K Camera Final Test
https://www.youtube.com/watch?v=OEUCNh5g-2I
Linus Tech Tips
2020-01-27T20:08:27+00:00
I'm Returning my Mac Pro
https://www.youtube.com/watch?v=mIB389tqzCI
Linus Tech Tips
2020-01-26T19:59:45+00:00
The RGB HDMI cable ISN'T as dumb as you'd think...
https://www.youtube.com/watch?v=nva6oPszm60
Linus Tech Tips
2020-01-25T20:06:23+00:00
I am NOT Retiring... yet - WAN Show Jan 24, 2020
https://www.youtube.com/watch?v=cxjhTVR_dJw
Linus Tech Tips
2020-01-25T02:29:50+00:00
The Best VR Headset... got BETTER!?
https://www.youtube.com/watch?v=AGScX_8plYw
Linus Tech Tips
2020-01-23T19:52:00+00:00
I've been thinking of retiring.
https://www.youtube.com/watch?v=hAsZCTL__lo
Linus Tech Tips
2020-01-23T06:35:25+00:00
It’s time to upgrade your GPU - RX 5600 XT
https://www.youtube.com/watch?v=rKn-vWDMkwQ
Linus Tech Tips
2020-01-22T19:59:36+00:00
WE FINALLY DID IT!! - Water Cooling the 8K Camera!
https://www.youtube.com/watch?v=imJ9QgOJHzY
Linus Tech Tips
2020-01-21T19:59:47+00:00
We Water Cooled an SSD!!
https://www.youtube.com/watch?v=lQmI5A27Iv8
Linus Tech Tips
2020-01-20T20:17:22+00:00
Should you buy a CPU??
https://www.youtube.com/watch?v=JISJ_YTI9s0
Linus Tech Tips
2020-01-19T20:19:02+00:00
Apple’s Pro Display XDR – A PC Guy’s Perspective
https://www.youtube.com/watch?v=X089oYPc5Pg
Linus Tech Tips
2020-01-18T19:59:29+00:00
The NSA is Giving Out It's Hacks for Free! - WAN Show Jan 17, 2020
https://www.youtube.com/watch?v=af6FBA-n7eA
Linus Tech Tips
2020-01-18T03:00:04+00:00
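However, remember to send headers with your requests, and be careful not to hit the YouTube backend too many times at once, because your IP will receive a temporary 12-hour suspension. Good luck!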
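Putting the advice about headers and request frequency together, a polling loop might look like the sketch below. It uses only the standard library instead of requests and BeautifulSoup. The User-Agent value, the 60-second default (taken from the once-a-minute polling in the question), and the names `parse_entries`, `unseen`, and `poll` are all assumptions made for illustration. It makes no claim about how quickly the feed itself is updated.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: a placeholder User-Agent; use whatever identifies your client.
HEADERS = {"User-Agent": "feed-poller-example/0.1"}


def parse_entries(feed_xml):
    """Return (entry_id, title, published) tuples from Atom feed XML."""
    root = ET.fromstring(feed_xml)
    return [
        (
            entry.findtext(f"{ATOM}id"),
            entry.findtext(f"{ATOM}title"),
            entry.findtext(f"{ATOM}published"),
        )
        for entry in root.iter(f"{ATOM}entry")
    ]


def unseen(entries, seen_ids):
    """Return entries whose id is not in seen_ids, then record them as seen."""
    fresh = [e for e in entries if e[0] not in seen_ids]
    seen_ids.update(e[0] for e in fresh)
    return fresh


def poll(url, interval_seconds=60.0, max_polls=None):
    """Fetch the feed repeatedly and print entries that appear after the first poll."""
    seen_ids = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        request = urllib.request.Request(url, headers=HEADERS)
        with urllib.request.urlopen(request, timeout=10) as response:
            entries = parse_entries(response.read())
        fresh = unseen(entries, seen_ids)
        # The first poll only records existing uploads, so they are not reported as new.
        if polls > 0:
            for _, title, published in fresh:
                print(published, title)
        polls += 1
        # Wait between requests so the server is not hit too often.
        if max_polls is None or polls < max_polls:
            time.sleep(interval_seconds)
```

Calling `poll("https://www.youtube.com/feeds/videos.xml?user=LinusTechTips")` would then check the feed once a minute until interrupted. Tracking seen IDs in a set keeps each poll cheap and avoids comparing timestamps across requests.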