How to get rid of .webm links in a Python request

Question

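I have this code that gets a video link from a website that randomly serves videos in different formats (such as .webm and .mp4). The links are not always valid, so I would like Python to check whether the link actually contains a video (if that is possible, I don't know), and to redo the process if the link in the output is a .webm.

P.S.: I don't want to replace .webm with .mp4; I want to redo the whole process if the video is a .webm!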

import json
import requests

data = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1').json()
for item in data['response']['items']:
    url = data['response']['items'][0]['url']
    print(url)

The output I get:

https://2ch.hk/b/src/261535136/16424379947461.webm (bad output, a format I don't need)

https://2ch.hk/b/src/263391417/16451696588520.mp4 (good output, the format I need)

Thanks!

Answer 1

You can create a flag that stops the while loop only once a "valid" (useful?) URL has been found:

import requests

redo_process = True
while redo_process:
    data = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1').json()
    for item in data["response"]["items"]:
        # Use the current item, not always the first element of the list
        url = item["url"]
        if url.endswith(".webm"):
            # If we don't want this result, just break the loop and start again
            break
        # This is what we want: clear the flag so the while loop stops, then print once
        redo_process = False
        print(url)
        break

Answer 2

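Keep only the links that end with mp4, using endswith():

# `data` is the parsed JSON response from the request in the question
mp4_links = [item["url"] for item in data["response"]["items"] if item["url"].endswith(".mp4")]
print(mp4_links)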
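
As a hedged aside: calling endswith() on the whole URL assumes the extension is the very last thing in the string. The URLs in the question look that way, but if a link ever carried a query string or an upper-case extension, the check would miss it. A minimal sketch of a more tolerant check, using only the standard library, could look like this. The helper name has_extension and the sample_items list (including the example.com URL) are made up for illustration and are not part of the API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def has_extension(url, extension='.mp4'):
    """Return True if the path part of url ends with the given extension."""
    # urlsplit separates the path from any query string or fragment
    path = urlsplit(url).path
    # Compare case-insensitively so '.MP4' also counts
    return PurePosixPath(path).suffix.lower() == extension.lower()


# Hypothetical items shaped like data['response']['items']
sample_items = [
    {'url': 'https://2ch.hk/b/src/261535136/16424379947461.webm'},
    {'url': 'https://2ch.hk/b/src/263391417/16451696588520.mp4'},
    {'url': 'https://example.com/clip.MP4?download=1'},
]

mp4_links = [item['url'] for item in sample_items if has_extension(item['url'])]
print(mp4_links)
```

For the URLs shown in the question, a plain endswith('.mp4') is probably enough; this variant only matters if the link format changes.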

Answer 3

You can do it like this. It uses a while loop to keep requesting random videos from the API, looping over the videos each response returns, until we come across one that matches the desired condition: url.endswith('.mp4') (see https://docs.python.org/3/library/stdtypes.html#str.endswith).

import json
import requests

url = ''
while not url.endswith('.mp4'):

    data = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1').json()

    for item in data['response']['items']:
        url = item['url']
        if url.endswith('.mp4'):
            break
print(url)

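I fixed the bug where url = data['response']['items'][0]['url'] would always refer to the first URL returned by api.randomtube.xyz. Instead, you want url = item['url'], which refers to the URL of the current item in the loop.

To improve this further, we can put it into a function, which feels better. Let me try to put into words why it feels better:

We get a reusable get_random_mp4_video function. Even if we only need it once, separating it out helps us understand at a glance what is going on when reading the code.
When we return from the function, both loops stop executing automatically, which lets us drop the condition on the while loop. We no longer need to check the URL in two different places, which prevents any problem where we update one condition and forget the other.
It also makes the code easier to read. Looking at what happens at each indentation level, we can spot an infinite loop, another loop, and an mp4 check, which quickly tells us that we loop indefinitely until we get an .mp4 file.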

def get_random_mp4_video():

    while True:
        data = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1').json()

        for item in data['response']['items']:
            url = item['url']
            if url.endswith('.mp4'):
                return url

Example usage that prints the first URL that does not return an error:

while True:
    url = get_random_mp4_video()
    resp = requests.get(url)
    # Check that we don't get a 404 or other error when visiting the URL
    if resp.ok:
        break
print(url)
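
The question also asks whether Python can check that a link really contains a video. The resp.ok check above only tells us the server answered without an error. One possible refinement, sketched here under the assumption that the server sends a meaningful Content-Type header, is to look at the status code and the media type together. The helper name looks_like_video is made up for this example.

```python
def looks_like_video(status_code, content_type):
    """Return True for a successful response whose Content-Type is video/*."""
    # Anything outside 2xx is treated as "no usable video here"
    if not 200 <= status_code < 300:
        return False
    # Content-Type may carry parameters, e.g. "video/mp4; codecs=avc1"
    media_type = content_type.split(';')[0].strip().lower()
    return media_type.startswith('video/')


print(looks_like_video(200, 'video/mp4'))
print(looks_like_video(404, 'video/mp4'))
print(looks_like_video(200, 'text/html; charset=utf-8'))
```

With requests, this could be fed from a HEAD request, which avoids downloading the whole file: resp = requests.head(url, allow_redirects=True), then looks_like_video(resp.status_code, resp.headers.get('Content-Type', '')). Not every server handles HEAD or reports the type correctly, so treat the result as a hint rather than a guarantee.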

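If, say, api.randomtube.xyz goes down or something else happens that keeps us stuck in the loop, the code could be refined further by giving up after a certain number of attempts to prevent an infinite loop, but I'll leave implementing that extra protection to you. You only need to add a basic counter and raise an error when the count gets too high.

Bonus

Returning multiple videos

If you want to return multiple videos without calling get_random_mp4_video many times, which would be inefficient because of all the waiting on calls to api.randomtube.xyz, the code below is one way to do it. I'm using two functions. The outer function is the one you call, telling it how many videos you want. The inner _get_vids function collects all the mp4 videos from a single api.randomtube.xyz request. The outer function keeps calling the inner one until it has enough videos to satisfy the number you asked for.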
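
One way the counter idea might look, as a rough sketch rather than a finished implementation: the function takes a fetch_items callable so the retry logic stays separate from the network call. The names get_mp4_with_retry_limit, fetch_items and max_attempts, and the default of 10 attempts, are assumptions made for this example.

```python
def get_mp4_with_retry_limit(fetch_items, max_attempts=10):
    """Return the first .mp4 URL, giving up after max_attempts fetches.

    fetch_items is a hypothetical callable that performs one API request
    and returns the list of item dicts (each with a 'url' key).
    """
    for _ in range(max_attempts):
        for item in fetch_items():
            url = item['url']
            if url.endswith('.mp4'):
                return url
    # Every attempt came back without an .mp4, so stop instead of looping forever
    raise RuntimeError(f'No .mp4 URL found after {max_attempts} attempts')


# Stand-in for the real API call, returning only .webm links
def only_webm():
    return [{'url': 'https://2ch.hk/b/src/261535136/16424379947461.webm'}]


try:
    get_mp4_with_retry_limit(only_webm, max_attempts=3)
except RuntimeError as error:
    print(error)
```

To use it against the real API you could pass something like lambda: requests.get(API_URL).json()['response']['items'], where API_URL is the address from the question. Adding a timeout to requests.get would also stop a single hanging request from blocking the loop.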

def get_random_mp4_videos(how_many):

    def _get_vids():
        ret = []
        data = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1').json()

        for item in data['response']['items']:
            url = item['url']
            if url.endswith('.mp4'):
                ret.append(url)
        return ret

    vids = []
    while len(vids) < how_many:
        vids.extend(_get_vids())

    # we may have more videos than we need so return the first how_many videos
    return vids[:how_many]
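
Since the API hands out videos at random, repeated requests may well return the same URL more than once (an assumption on my part, I have not verified it). If duplicates matter, a variation that keeps only unique URLs and stops after a fixed number of requests might look like the sketch below. The names collect_unique_mp4_urls, fetch_items and max_attempts are made up for this example; fetch_items stands for one API call returning the list of items.

```python
def collect_unique_mp4_urls(fetch_items, how_many, max_attempts=20):
    """Collect up to how_many distinct .mp4 URLs, in the order first seen.

    May return fewer than how_many URLs if max_attempts fetches are not enough.
    """
    if how_many <= 0:
        return []
    unique_urls = []
    seen = set()  # fast membership test; the list keeps the order
    for _ in range(max_attempts):
        for item in fetch_items():
            url = item['url']
            if url.endswith('.mp4') and url not in seen:
                seen.add(url)
                unique_urls.append(url)
                if len(unique_urls) == how_many:
                    return unique_urls
    return unique_urls


# Stand-in for the real API call: always the same two items
def same_items_every_time():
    return [{'url': 'one.mp4'}, {'url': 'two.webm'}]


print(collect_unique_mp4_urls(same_items_every_time, how_many=2, max_attempts=3))
```

Here the stand-in never yields a second distinct .mp4, so the function returns the single URL it found instead of looping forever.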

Answer 4

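Issues to fix:

endswith(part) on a string returns True if the string ends with the given part, for example the extension .mp4
Use the iteration variable item directly instead of always taking the first element of all items, data['response']['items'][0]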

for item in data['response']['items']:
    url = item['url']
    if url.endswith('.mp4'):
        print(url)

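Instead of the 6 .webm links, this prints 23 of the 29 URLs in total: those that end with the desired extension .mp4.

Refinement

To get only the first .mp4, I would wrap this in a function and return at the first MP4 link found.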

def firstMp4(json):
    for item in json['items']:
        url = item['url']
        if url.endswith('.mp4'):
            return url
    return None  # default if no mp4 found


data = requests.get('https://api.randomtube.xyz/video.get?chan=2ch.hk&board=b&page=1').json()

mp4 = firstMp4(data['response'])
print(f"First mp4 or None: {mp4}")

if mp4 is None:
    print("No MP4 found.")
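
As a possible alternative, the same "first match or None" logic can be written with the built-in next() and a generator expression. This is only a sketch: the name first_mp4 and the sample_items list are made up for illustration, with sample_items shaped like data['response']['items'].

```python
def first_mp4(items):
    """Return the URL of the first .mp4 item, or None if there is none."""
    # next() pulls the first value from the generator, or returns the default
    return next(
        (item['url'] for item in items if item['url'].endswith('.mp4')),
        None,
    )


# Hypothetical items using the two URLs from the question
sample_items = [
    {'url': 'https://2ch.hk/b/src/261535136/16424379947461.webm'},
    {'url': 'https://2ch.hk/b/src/263391417/16451696588520.mp4'},
]

print(first_mp4(sample_items))
print(first_mp4([]))
```

Whether this reads better than the explicit loop is a matter of taste; both stop at the first match.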
