Running a deep learning TTS in 2019 using (DeepVoice | WaveNet | etc)

Question

I'm trying to convert a series of sentences in a txt file into WAV files with the clearest voice possible.

According to a 2019 survey, there have been many recent advances using deep learning techniques.

This is good news, because the built-in or commonly used text-to-speech engines (OSX's "say" command, espeak, etc.) sound very robotic.

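The problem is that the GitHub pages and Colab notebook links focus on how to train new models or set up Docker instances, and do not seem to include a minimal example like: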

git clone ...
./speak "How are you doing?" -o hayd.wav

Do you know how to install and run any of the 2019 engines from that article to speak a single sentence?

I will update if/when I find one that works.

Answer 1

I don't know about any of the others on the list, but for WaveNet you can use Google's API. Your code sends the text to Google, and they return the audio. Client libraries are available for C#, Go, Java, Node.js, PHP, Python, and Ruby. If you want to use another language, you can call the REST API. For WaveNet, the first 1 million characters per month are free; after that, usage is billed per 1 million characters. See their pricing page.
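To make the "send text, get audio back" flow concrete, here is a minimal sketch of a REST call using only the Python standard library. It assumes the v1 `text:synthesize` endpoint, authentication with an API key passed as the `key` query parameter, and the voice name `en-US-Wavenet-D`; the function names are made up for illustration, so check all of this against Google's current documentation before relying on it.

```python
import base64
import json
import urllib.request

# Assumed endpoint: the v1 REST method of Cloud Text-to-Speech.
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request_body(text, voice_name="en-US-Wavenet-D", language_code="en-US"):
    """Return the JSON-serialisable body for a text:synthesize call."""
    return {
        "input": {"text": text},
        # The voice name is an assumption; pick any WaveNet voice you prefer.
        "voice": {"languageCode": language_code, "name": voice_name},
        # LINEAR16 requests uncompressed 16-bit PCM, returned with a WAV header.
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


def decode_audio(response_json):
    """Extract the raw audio bytes from the service's JSON response."""
    return base64.b64decode(response_json["audioContent"])


def synthesize_to_wav(text, out_path, api_key):
    """Send `text` to the service and write the returned audio to `out_path`."""
    body = json.dumps(build_request_body(text)).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT + "?key=" + api_key,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    with open(out_path, "wb") as wav_file:
        wav_file.write(decode_audio(payload))
```

With a valid key, `synthesize_to_wav("How are you doing?", "hayd.wav", api_key)` should produce the file the question asks for. Looping over the lines of the txt file and calling it once per sentence covers the batch case.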

If your project is a relatively small one-off and you do not need to do it programmatically (the question does not make this clear), you could use their online demo page with a browser add-on (e.g. Video DownloadHelper or one of many others) to download the results as audio files. Alternatively, you could use the API on the command line.
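For the command-line route, a small wrapper can get close to the `./speak "..." -o hayd.wav` shape from the question. The sketch below assumes the `google-cloud-texttospeech` Python package is installed and that application credentials are already configured; the script name `speak.py`, the `--voice` option, and the default voice are hypothetical choices, not something the answer specifies.

```python
import argparse


def parse_args(argv=None):
    """Parse `speak.py "some text" -o out.wav` style arguments."""
    parser = argparse.ArgumentParser(description="Speak one sentence to a WAV file.")
    parser.add_argument("text", help="sentence to synthesize")
    parser.add_argument("-o", "--output", default="out.wav", help="output WAV path")
    # The default voice is an assumption; any available WaveNet voice works here.
    parser.add_argument("--voice", default="en-US-Wavenet-D", help="voice name")
    return parser.parse_args(argv)


def synthesize(text, voice_name):
    """Return WAV bytes for `text` using the google-cloud-texttospeech client."""
    # Imported lazily so argument parsing works without the package installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US", name=voice_name),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16
        ),
    )
    return response.audio_content


def main(argv=None):
    """Synthesize the sentence given on the command line and save it."""
    args = parse_args(argv)
    with open(args.output, "wb") as wav_file:
        wav_file.write(synthesize(args.text, args.voice))
```

Saved as `speak.py` with a call to `main()` added at the end of the file, this would be run as `python speak.py "How are you doing?" -o hayd.wav`.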

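In my opinion, the WaveNet quality is very good and a huge improvement over previous generations of text-to-speech algorithms. At times you could almost believe the voices are real.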

text-to-speech

deep-learning