Is there a way to get HTTP Live Streaming (HLS) content with Firefox/Chrome?

Question

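I am scraping video sources with Selenium and BeautifulSoup. Is there a way to extract the m3u8 file (HLS content) with Firefox or Chrome instead of a blob URL?

The following code uses the Selenium Safari web driver to scrape the video source as a playlist string.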

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver
from bs4 import BeautifulSoup
import re
import urllib.request


def get_all_channels(base: str="https://www.telewebion.com/channels"):
    channels_url = urllib.request.urlopen(f"{base}")
    soup_channels_url = BeautifulSoup(channels_url, "lxml")

    # create a list of all channels
    all_channels_list = []
    for a in soup_channels_url.select('.no-featured a'):
        all_channels_list.append(a['href'])
        # all_channels_list.append(a['href'], a.get_text(strip=True))

    # return the list
    return all_channels_list


def get_video_src(url: str, base: str="https://www.telewebion.com"):
    channel_url = f"{base}{url}"

    wd = webdriver.Safari()
    # wd = webdriver.Chrome()
    # wd = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')

    try:
        wd.get(channel_url)
        # wait (up to 6000 seconds) until the target element is visible
        WebDriverWait(wd, 6000).until(EC.visibility_of_element_located(
            (By.CLASS_NAME, "position-relative")))

        html_page = wd.page_source
    finally:
        # always close the browser, even if the wait times out
        wd.quit()

    # parse the rendered page and read the src attribute of the <video> element
    soup = BeautifulSoup(html_page, "lxml")

    video = soup.find_all("video", class_="rmp-object-fit-contain")
    video_src = video[0]['src']

    return video_src

for channel in get_all_channels():
    print(get_video_src(channel))

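The result is the m3u8 playlist (HLS content) string I am interested in, but this is not a scalable solution because it only works where Safari is installed. The Firefox and Chrome Selenium web drivers return a blob string instead. My ultimate goal is to download the Extended M3U (m3u8) playlist (or any other type of video stream), rather than chunks of the video stream, so that it can be used as a Kodi add-on video source.

P.S. The video source is dynamic and is rendered by JavaScript, which loads its content; that is why I used Selenium to drive a browser.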

Answer 1

I don't think you need Selenium to get the channel list or the channel links.

Steps (you can use any programming language you like):

 1. Get all channels:
Make a GET request to this URL to get all the channels.
https://wa1.telewebion.com/v2/channels/getChannels?logo_version=4&thumb_size=240

In the response, "data" is an array of channels. Each channel has an attribute called "descriptor", whose value is used as "channel_desc" in the next request.

 2. Get channel links:
Make a GET request to the URL below to get all the links for a channel returned by the first request.
https://wa1.telewebion.com/v2/channels/getChannelLinks?channel_desc=tv1&device=desktop&logo_version=4

The channel_desc value "tv1" comes from the first call.
In the response, the links under "data" contain all the m3u8 URLs for the tv1 channel.

 3. Now you can use https://github.com/carlanton/m3u8-parser
 to parse the m3u8 file and get the playlist URLs or segment URLs from the master or media manifests.
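
Steps 1 and 2 can be sketched in Python with the standard library only. This is a rough sketch, not a tested client: the two endpoint URLs and the "data"/"descriptor" fields are taken from the steps above, while the helper names (fetch_json, channels_url, channel_links_url, extract_descriptors, find_m3u8_urls, list_all_streams) are made up for illustration. Since the exact layout of the getChannelLinks response is not shown here, the sketch does not assume one and simply walks the whole JSON looking for strings that contain ".m3u8".

```python
import json
import urllib.parse
import urllib.request

# Base path of the two endpoints quoted in the steps above.
API_BASE = "https://wa1.telewebion.com/v2/channels"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def channels_url() -> str:
    """URL for step 1 (list all channels)."""
    query = urllib.parse.urlencode({"logo_version": 4, "thumb_size": 240})
    return f"{API_BASE}/getChannels?{query}"


def channel_links_url(descriptor: str) -> str:
    """URL for step 2 (links of one channel)."""
    query = urllib.parse.urlencode(
        {"channel_desc": descriptor, "device": "desktop", "logo_version": 4}
    )
    return f"{API_BASE}/getChannelLinks?{query}"


def extract_descriptors(payload: dict) -> list:
    """Collect the "descriptor" of every channel in the "data" array."""
    return [c["descriptor"] for c in payload.get("data", []) if "descriptor" in c]


def find_m3u8_urls(node) -> list:
    """Recursively collect every string that contains '.m3u8'.

    Assumption: the response layout is unknown, so walk all of it.
    """
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_m3u8_urls(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_m3u8_urls(item))
    elif isinstance(node, str) and ".m3u8" in node:
        found.append(node)
    return found


def list_all_streams() -> dict:
    """Map each channel descriptor to the m3u8 URLs found for it (needs network)."""
    streams = {}
    for descriptor in extract_descriptors(fetch_json(channels_url())):
        streams[descriptor] = find_m3u8_urls(fetch_json(channel_links_url(descriptor)))
    return streams
```

Calling list_all_streams() performs one request per channel, so no browser or Selenium driver is involved. Whether the API needs extra headers or still responds at these URLs is not something this sketch can guarantee.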

You can read about the m3u8 specification here: https://datatracker.ietf.org/doc/html/draft-pantos-http-live-streaming-08
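
If you would rather stay in Python than use the Java parser linked above, a master playlist can be read with a few lines of code. The sketch below is a simplification and not a full implementation of the specification: it only handles the #EXT-X-STREAM-INF tag, which is followed on the next line by the URI of a variant playlist, and it resolves relative URIs against the playlist's own URL. The function name parse_master_playlist and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin

STREAM_INF = "#EXT-X-STREAM-INF:"


def parse_master_playlist(text: str, base_url: str) -> list:
    """Return (attributes, absolute_uri) pairs, one per variant stream."""
    variants = []
    pending = None  # attributes of the last #EXT-X-STREAM-INF tag seen
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(STREAM_INF):
            pending = line[len(STREAM_INF):]
        elif line and not line.startswith("#") and pending is not None:
            # The first URI line after the tag belongs to that variant.
            variants.append((pending, urljoin(base_url, line)))
            pending = None
    return variants


sample = (
    "#EXTM3U\n"
    "#EXT-X-STREAM-INF:BANDWIDTH=1280000\n"
    "low/index.m3u8\n"
    "#EXT-X-STREAM-INF:BANDWIDTH=2560000\n"
    "https://cdn.example.com/hi/index.m3u8\n"
)
variants = parse_master_playlist(sample, "https://example.com/live/master.m3u8")
```

Each returned pair keeps the raw attribute string (for example BANDWIDTH=1280000) next to the resolved variant URL, which is usually enough to pick a quality level and hand the URL to a player.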
