Scraping the mp3 file from flash player after the conversion

Question

The page has a textarea and a Synthesize button. It looks like this:

        <textarea id="ttstext" name="text" style="font-size: 130%; width: 100%;
        height: 120px; padding: 5px;"></textarea>
        ...
        <div id="audioplayer">
            <script>
                create_playback();
            </script><audio autoplay="" autobuffer="" controls=""></audio>
        </div>
        <input id="commitbtn" value="Synthesize" type="submit">

When I click the Synthesize button, the page's HTML changes as follows (it creates the audio player).

<div id="audioplayer" style="display: block;"><embed width="370" height="20" flashvars="height=20&amp;width=370&amp;type=mp3&amp;file=http://services.abc.xyz.mp3&amp;showstop=true&amp;usefullscreen=false&amp;autostart=true" allowfullscreen="true" allowscriptaccess="always" quality="high" name="mpl" id="mpl" style="undefined" src="/demo/mediaplayer.swf" type="application/x-shockwave-flash"></div>

I want to generate the mp3 file from Python code.

Here is what I have tried so far.

#!/usr/bin/env python
# encoding: utf-8
from __future__ import unicode_literals
from contextlib import closing
from selenium.webdriver import Firefox
from selenium.webdriver.support.ui import WebDriverWait
import BeautifulSoup
import time

url = "http://www..."

def textToSpeech():
  with closing(Firefox()) as browser:
    try:
      browser.get(url)
    except selenium.common.exceptions.TimeoutException:
      print "timeout"
    browser.find_element_by_id("ttstext").send_keys("Hello.")
    button = browser.find_element_by_id("commitbtn")
    button.click()
    time.sleep(10)
    WebDriverWait(browser, timeout=100).until(
      lambda x: x.find_element_by_id('audioplayer'))
    src = browser.page_source
    return src

def getAudio(source):
  soup = BeautifulSoup.BeautifulSoup(source)
  audio = soup.find("div", {"id": "audioplayer"})
  return audio.string


if __name__ == "__main__":
  print getAudio(textToSpeech())

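The key to success is getting the URL of the generated mp3 file. I don't know how to wait until the script has changed the HTML (the contents of <div id="audioplayer">). My code returns None because it reads the result before the change has happened.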

Answer 1

When the content changes, waiting for the element to exist is not enough:

WebDriverWait(browser, timeout=100).until(
      lambda x: x.find_element_by_id('audioplayer'))

Instead, you need to wait until it changes to satisfy some condition, using an ExpectedCondition. Here is something to get you started (untested):

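from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait_text = 'file=http://'
# "myDynamicElement" is a placeholder; use the ID of the element that changes.
element = WebDriverWait(browser, 10).until(
        EC.text_to_be_present_in_element((By.ID, "myDynamicElement"), wait_text)
    )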
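
One caveat with the snippet above: text_to_be_present_in_element compares against the element's visible text, whereas in the question's markup the string 'file=http://' sits in the flashvars attribute of the <embed>. WebDriverWait.until accepts any callable that takes the driver, so a custom condition is one possible way around this. The following is a minimal sketch in Python 3, not tested against the real page; the function name flash_file_url is hypothetical, and the selector assumes the <embed> is created inside the div with id "audioplayer" as shown in the question.

```python
from urllib.parse import parse_qs


def flash_file_url(driver):
    """Wait condition: return the mp3 URL once the Flash <embed> exists, else False."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    embeds = driver.find_elements("css selector", "#audioplayer embed")
    for embed in embeds:
        flashvars = embed.get_attribute("flashvars") or ""
        # flashvars is a query string such as "height=20&type=mp3&file=http://..."
        files = parse_qs(flashvars).get("file")
        if files:
            return files[0]
    return False  # WebDriverWait keeps polling while the result is falsy
```

It would be used as mp3_url = WebDriverWait(browser, 30).until(flash_file_url), which returns the URL directly instead of the page source. This also sidesteps the audio.string lookup in getAudio, which stays None even after the player appears because the div then holds a child tag rather than text.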

You can also see all of the expected conditions here: http://selenium-python.readthedocs.org/en/latest/api.html?highlight=text_to_be_present_in_element#module-selenium.webdriver.support.expected_conditions
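
Since the stated goal is to end up with the mp3 file itself, the last step after obtaining the URL from the file parameter of flashvars would be to download it. A minimal sketch using only the Python 3 standard library follows; the helper name save_audio and the 30-second timeout are assumptions, and it presumes the URL can be fetched without the browser's cookies or session.

```python
import shutil
from urllib.request import urlopen


def save_audio(url, path, timeout=30):
    """Stream the response body at `url` into the file at `path`."""
    with urlopen(url, timeout=timeout) as response, open(path, "wb") as out:
        # Copy in chunks so the whole file is never held in memory at once.
        shutil.copyfileobj(response, out)
    return path
```

For example, save_audio(mp3_url, "hello.mp3") would write the audio next to the script, where mp3_url is whatever URL was read from the page.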

python

selenium

beautifulsoup

web-scraping