Scraping the mp3 file from the Flash player after the conversion
The page has a textarea and a Synthesize button.
It looks like this:
<textarea id="ttstext" name="text" style="font-size: 130%; width: 100%;
height: 120px; padding: 5px;"></textarea>
...
<div id="audioplayer">
<script>
create_playback();
</script><audio autoplay="" autobuffer="" controls=""></audio>
</div>
<input id="commitbtn" value="Synthesize" type="submit">
When I click the Synthesize button, the page's HTML changes as follows (it creates the audio player):
<div id="audioplayer" style="display: block;"><embed width="370" height="20" flashvars="height=20&width=370&type=mp3&file=http://services.abc.xyz.mp3&showstop=true&usefullscreen=false&autostart=true" allowfullscreen="true" allowscriptaccess="always" quality="high" name="mpl" id="mpl" style="undefined" src="/demo/mediaplayer.swf" type="application/x-shockwave-flash"></div>
I want to generate the mp3 file from Python code.
Here is what I have tried so far:
#!/usr/bin/env python
# encoding: utf-8
from __future__ import unicode_literals
from contextlib import closing
from selenium.common.exceptions import TimeoutException
from selenium.webdriver import Firefox
from selenium.webdriver.support.ui import WebDriverWait
import BeautifulSoup
import time

url = "http://www..."


def textToSpeech():
    # Open the page, submit the text, and return the resulting page source.
    with closing(Firefox()) as browser:
        try:
            browser.get(url)
        except TimeoutException:
            print "timeout"
        browser.find_element_by_id("ttstext").send_keys("Hello.")
        button = browser.find_element_by_id("commitbtn")
        button.click()
        time.sleep(10)
        # The div already exists before the click, so this returns immediately.
        WebDriverWait(browser, timeout=100).until(
            lambda x: x.find_element_by_id('audioplayer'))
        src = browser.page_source
    return src


def getAudio(source):
    # Look up the player container in the saved page source.
    soup = BeautifulSoup.BeautifulSoup(source)
    audio = soup.find("div", {"id": "audioplayer"})
    return audio.string


if __name__ == "__main__":
    print getAudio(textToSpeech())
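The key to success is getting the URL of the generated mp3 file.
I don't know how to wait for the script to change the HTML (the inner content of <div id="audioplayer">).
My code returns None because it reads the result before the page has changed.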
When an existing element is being changed, waiting for the element to be present is not enough:
WebDriverWait(browser, timeout=100).until(
    lambda x: x.find_element_by_id('audioplayer'))
Instead, you need to wait until it changes to satisfy some condition, using an ExpectedCondition. Here is something to get you started (untested):
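from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait_text = 'file=http://'
# driver is your WebDriver instance; replace "myDynamicElement" with the id of the element to watch
element = WebDriverWait(driver, 10).until(
    EC.text_to_be_present_in_element((By.ID, "myDynamicElement"), wait_text)
)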
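One caveat: text_to_be_present_in_element compares against the element's text, and in the HTML from the question the 'file=http://' string lives in the flashvars attribute of the embed tag, so a text-based condition may never match. A custom condition is one possible alternative. The sketch below is written for Python 3 and is also untested against the real page; the helper names (file_url_from_flashvars, embed_has_file_url, wait_for_mp3_url) are made up for illustration, and the id "mpl" is taken from the embed tag shown in the question.

```python
from urllib.parse import parse_qs


def file_url_from_flashvars(flashvars):
    """Return the `file` parameter of a flashvars string, or None if absent."""
    values = parse_qs(flashvars or "").get("file")
    return values[0] if values else None


def embed_has_file_url(locator):
    """Build a wait condition that is truthy once flashvars carries a file URL.

    If the element does not exist yet, find_element raises
    NoSuchElementException, which WebDriverWait ignores by default and retries.
    """
    def condition(driver):
        element = driver.find_element(*locator)
        return file_url_from_flashvars(element.get_attribute("flashvars"))
    return condition


def wait_for_mp3_url(driver, timeout=30):
    """Block until the embed with id "mpl" exposes a file URL, then return it."""
    # Imported here so the helpers above can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        embed_has_file_url((By.ID, "mpl")))
```

Calling wait_for_mp3_url(browser) right after button.click() would replace both the time.sleep(10) and the wait on the audioplayer div, since until() returns whatever truthy value the condition produced. Note that parse_qs splits on every &, so a file URL that itself contains an unescaped & would be cut short.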
You can also see all the expected conditions here:
http://selenium-python.readthedocs.org/en/latest/api.html?highlight=text_to_be_present_in_element#module-selenium.webdriver.support.expected_conditions
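Once the page source actually contains the embed tag, the getAudio step from the question also needs a change: audio.string is None there because the div holds child tags rather than plain text, and the mp3 URL sits in the flashvars attribute. A minimal sketch of that extraction, using only the Python 3 standard library instead of BeautifulSoup, could look like this. The class and function names are hypothetical, and the sample markup is a shortened copy of the HTML posted in the question.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class EmbedFlashvarsParser(HTMLParser):
    """Collect the flashvars attribute of every <embed> tag in a document."""

    def __init__(self):
        super().__init__()
        self.flashvars = []

    def handle_starttag(self, tag, attrs):
        if tag == "embed":
            value = dict(attrs).get("flashvars")
            if value:
                self.flashvars.append(value)


def mp3_url_from_source(source):
    """Return the first `file` URL found in an <embed> flashvars, or None."""
    parser = EmbedFlashvarsParser()
    parser.feed(source)
    for flashvars in parser.flashvars:
        files = parse_qs(flashvars).get("file")
        if files:
            return files[0]
    return None


# Shortened version of the markup shown in the question.
sample = (
    '<div id="audioplayer" style="display: block;">'
    '<embed width="370" height="20" '
    'flashvars="height=20&width=370&type=mp3'
    '&file=http://services.abc.xyz.mp3&showstop=true&autostart=true" '
    'name="mpl" id="mpl" src="/demo/mediaplayer.swf" '
    'type="application/x-shockwave-flash"></div>'
)
print(mp3_url_from_source(sample))  # prints http://services.abc.xyz.mp3
```

With the URL in hand, something like urllib.request.urlretrieve(mp3_url, "speech.mp3") should be enough to save the file, assuming the server allows a plain GET without the browser's cookies; the output filename here is only an example.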